Hybrid Beamforming With Deep Learning for Large-Scale Antenna Arrays

Highly directional beamforming makes millimeter-wave communication practical for future wireless networks. Exploiting the multipath characteristics of millimeter-wave propagation, this paper proposes a high-precision multipath channel estimation algorithm based on the signal subspace. At the mobile terminal, an iterative heuristic radio-frequency (RF) combining algorithm based on spatial points is proposed. At the base station, the analog precoder is computed with deep learning (DL) to accelerate the calculation, and the digital precoder is then designed from a multi-user communication model. Simulation results show that the channel estimation algorithm can estimate 4 paths with an error of no more than 0.3 rad, and that the proposed DL algorithm reaches about 87% of the spectral efficiency of the traditional algorithm while taking only 20% of its computation time.


I. INTRODUCTION
With the rapid evolution of wireless networks, the volume of mobile service data and the number of users have grown exponentially, and the demand for mobile data traffic has grown explosively [1], [2]. To meet the demand for large-capacity, high-speed, low-latency services, communication is rapidly expanding into the millimeter-wave band [3]. The bandwidth of a millimeter-wave communication system can reach 800 MHz and the corresponding data rate can reach 10 Gbps, which meets the requirements of the International Telecommunication Union (ITU) for 5G communication systems [4], [5]. To realize these goals, various beamforming technologies have been proposed in the last few years.
Early beamforming technologies process the signal digitally at baseband by manipulating its phase and amplitude [6]. However, this solution is not suitable for the large antenna arrays used in the millimeter-wave band, because it requires dedicated baseband and RF hardware [7], which brings excessive power consumption and cost and thus impedes the large-scale deployment of 5G equipment [8]-[10].
The associate editor coordinating the review of this manuscript and approving it for publication was Yiming Huo.
To overcome the limit on the number of RF chains, many hybrid beamforming algorithms that combine digital and analog precoding have been proposed. In these schemes, the signal first passes through a digital baseband precoder and is then fed through the RF chains to the phase shifters. In [11], the hybrid precoding problem is recast as an optimization problem that minimizes the Euclidean distance between the hybrid precoder and the optimal precoder, and an orthogonal matching pursuit (OMP) algorithm is proposed to perform spatially sparse precoding. The authors of [12] used alternating minimization to solve the hybrid precoding problem and proposed three algorithms for millimeter-wave communication systems. In [13], a two-stage hybrid beamforming algorithm is proposed that uses spatial point iteration to increase the cross-entropy. Although the simulation results show good performance, the computational cost is too high for base stations that serve many connections and simultaneously demand low latency.
On the other hand, recent research on intelligent communication shows that data-driven deep learning (DL) methods have great potential for solving traditionally challenging problems, and various DL-based algorithms have been proposed. For instance, [14] iteratively optimizes autoencoders at the receiver and the transmitter, where the receiver uses supervised learning and the transmitter uses reinforcement learning. This approach does not need the channel gradient and can therefore be trained directly over the actual channel. As an alternative, [15] combines deep learning with traditional communication theory in order to simplify the network model and speed up training. However, such deep-network-based methods have poor robustness to complex environments at both ends of the link.
To overcome the limitations of conventional beamforming methods for large-scale antenna arrays, this paper presents a novel DL-based hybrid beamforming method. At the mobile terminal, the analog precoding design is first converted into a maximization problem according to the maximum cross-entropy formula, a heuristic algorithm is then proposed to solve it, and finally the digital baseband precoding is designed according to the minimum mean square error (MMSE) criterion. At the base station, to simplify the calculation, the analog precoding is produced by a deep learning network that is trained with the gain and loss information fed back by the users. Then, because the base station must communicate with multiple users simultaneously, a digital precoding algorithm based on the orthogonality of multi-user signals is designed. The simulation results show that the channel estimation algorithm can estimate 4 paths with an error below 0.3 rad. Moreover, the proposed DL algorithm takes only 20% of the time of the traditional algorithm while achieving about 87% of its spectral efficiency.
The rest of this article is organized as follows. Section II introduces the mathematical model of the antenna array channel. Section III uses the rotation invariance of the signal subspace of the antenna sub-arrays, together with the orthogonality of the signal and noise subspaces, to estimate the multipath channel. Section IV introduces the iterative hybrid precoding at the mobile station, Section V introduces the deep-learning-based hybrid precoding at the base station, and Section VI gives the simulation results.

II. SYSTEM MODEL
Consider a single-cell, multi-user distributed antenna system with very large antenna arrays. The base station is equipped with $N_{BS}$ antennas and $N_{RF}^{t}$ transmit RF chains, and serves $K$ users. Each user is equipped with $N_{MS}$ antennas and $N_{RF}^{r}$ receive RF chains [16]. The number of data streams required by each user is $f$, with $Kf \le N_{RF}^{t} \le N_{BS}$ and $f \le N_{RF}^{r} \le N_{MS}$. (Here BS and MS denote the base station and mobile station, respectively.) A MIMO millimeter-wave system of this kind is shown in Fig. 1, where the BS uses an array antenna.
FIGURE 1. Diagram of a MIMO mmWave system.
The channel model of the $k$th user can be expressed as in [4].
Here $l$ indexes the signal arrival path and $a_{MS}$ denotes the array response vector of the mobile terminal. In the millimeter-wave band, signal scattering is significant, so multipath effects have to be included in the MS array response. On the BS side, because an array antenna is used, both the direction angle $\phi_l^{bs}$ and the elevation angle $\theta_l^{bs}$ need to be considered. The BS channel vector is therefore written in terms of $p = \cos(\phi_l^{bs})\cos(\theta_l^{bs})$ and $q = \sin(\phi_l^{bs})\cos(\theta_l^{bs})$, where $M$ and $N$ denote the numbers of horizontal and vertical antennas of the array, with $MN = N_{BS}$, and $\lambda$ and $d$ denote the signal wavelength and the antenna spacing, respectively.
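The array responses and the multipath channel described above can be sketched numerically. The following is a minimal illustration, not the paper's exact equations: it assumes the common geometric model in which each path contributes $\alpha_l a_{MS} a_{BS}^{H}$, a cosine-angle convention for the linear array, and unit-norm response vectors. The function names and the omitted overall normalization factor are assumptions made for this example.

```python
import numpy as np

def ula_response(n_ant, phi, d_over_lambda=0.5):
    """Unit-norm response of a uniform linear array (assumed cos-angle convention)."""
    n = np.arange(n_ant)
    return np.exp(1j * 2 * np.pi * d_over_lambda * n * np.cos(phi)) / np.sqrt(n_ant)

def upa_response(m_ant, n_ant, phi, theta, d_over_lambda=0.5):
    """Unit-norm response of an M x N planar array built from p and q in the text."""
    p = np.cos(phi) * np.cos(theta)
    q = np.sin(phi) * np.cos(theta)
    m = np.arange(m_ant)[:, None]   # horizontal antenna index
    n = np.arange(n_ant)[None, :]   # vertical antenna index
    phase = 2 * np.pi * d_over_lambda * (m * p + n * q)
    return (np.exp(1j * phase) / np.sqrt(m_ant * n_ant)).ravel()

def geometric_channel(gains, a_ms_list, a_bs_list):
    """Sum over paths of alpha_l * a_MS,l * a_BS,l^H (no extra normalization)."""
    return sum(g * np.outer(a_ms, a_bs.conj())
               for g, a_ms, a_bs in zip(gains, a_ms_list, a_bs_list))

# Example: one LOS path between a 10-antenna MS and a 10 x 10 BS array.
a_ms = ula_response(10, np.pi / 3)
a_bs = upa_response(10, 10, np.pi / 4, np.pi / 6)
H = geometric_channel([1.0], [a_ms], [a_bs])
```

With a single path the resulting channel matrix has rank one; adding paths with distinct angles raises the rank up to $L$.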
The channel matrix $A_{MS}$ at the MS is defined in terms of $\mathrm{diag}(\alpha_1, \alpha_2, \ldots, \alpha_L)$, the diagonal matrix of complex gains of the $L$ paths; the path gains follow the Rayleigh distribution. The BS applies its precoding to construct the final transmitted signal, in which $s$ is the concatenation of the information streams of all users. In the signal received by the $k$th user, $s_k$ is the information stream of the $k$th user and $n_k \sim \mathcal{CN}(0, \sigma^2 I)$ is the white Gaussian noise in the environment of the $k$th user. At the user, the received signal is first combined in the analog domain by the $N_{MS} \times N_{RF}^{r}$ RF combiner $W_{RF}$, whose phase shifters impose the constraint $|W_{RF}(i,j)|^2 = 1$, and is then processed at baseband by the $N_{RF}^{r} \times f$ digital combiner $W_D$, which yields the final signal $x$. The function of precoding is to maximize the spectral efficiency of each user, i.e., to make the signal $x$ recovered by the user as close as possible to the transmitted signal $s$.
For such a system, the spectral efficiency of the $k$th user under the Gaussian signaling assumption is given by (9), which depends on the covariance of the interference plus noise at user $k$.
For millimeter-wave beamforming design, it is necessary to find the $(W_D, W_{RF}, V_{RF}, V_D)$ that maximizes the spectral efficiency of each user. Ideally, the hybrid precoders at the BS and the MS would be designed jointly so that Gaussian signaling achieves the mutual information of the millimeter-wave channel. However, solving (9) jointly is very complicated. Instead, we divide the design into two subproblems: the BS precoding is held fixed while the MS designs its precoding, and the MS precoding is then held fixed while the BS designs its precoding. After several iterations of this process, the precoders of both sides are obtained.
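To make the spectral-efficiency criterion concrete, the sketch below evaluates the standard log-determinant rate for one user. It is an illustration under stated assumptions rather than a transcription of (9): `V` stands for the combined BS precoder for that user, `W` for the combined MS combiner, and `R_int` for an optional interference covariance after combining; all three names are hypothetical.

```python
import numpy as np

def spectral_efficiency(H, V, W, sigma2, R_int=None):
    """Return log2 det(I + R^{-1} G G^H) with G = W^H H V.

    R is the noise covariance after combining, sigma2 * W^H W, plus an
    optional interference covariance R_int (assumed already combined).
    """
    G = W.conj().T @ H @ V
    R = sigma2 * (W.conj().T @ W)
    if R_int is not None:
        R = R + R_int
    M = np.eye(G.shape[0]) + np.linalg.solve(R, G @ G.conj().T)
    _, logdet = np.linalg.slogdet(M)   # natural-log determinant
    return float(logdet / np.log(2))

# Example: two parallel unit-gain streams at 0 dB SNR carry 1 bit/s/Hz each.
rate = spectral_efficiency(np.eye(2), np.eye(2), np.eye(2), 1.0)
```

In this toy case the matrix inside the determinant is $2I$, so the rate is $\log_2 4 = 2$ bit/s/Hz.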

III. CHANNEL ESTIMATION OF MS
In millimeter-wave communication, a major challenge is to let the terminal quickly capture the millimeter-wave beam and track it reliably in a mobile environment [11]. Millimeter-wave signals attenuate strongly, so the signal must be concentrated in one direction to communicate over longer distances. However, two parties that have not yet established a connection are blind to each other. To obtain the position information of both parties, a common method is for the BS to transmit a pilot signal continuously; the MS determines the position of the BS from the received pilot and then establishes communication with the BS. The design of the pilot signal at the BS is therefore critical, since it strongly affects how efficiently the two parties establish communication.

A. PILOT SIGNAL DESIGN OF BS
As an array antenna, the BS has two pivotal variables: the direction angle $\phi$ and the elevation angle $\theta$. Each of $\phi$ and $\theta$ is quantized to 4 values, $\phi_l, \theta_l \in \{\pi/2, \pi, 3\pi/2, 2\pi\}$, so there are 16 spatial combinations. Because the BS has a total power constraint, it transmits along one direction at a time, and every 16 transmissions form a cycle. In the BS pilot signal design, $\bar{y} \in \mathbb{R}^{1 \times 16}$, $[a_{BS}^{H}]_l$ is the BS channel vector in the $l$th direction, $V$ is the BS precoding, which shapes the signal beam along the $l$th direction, and $s$ is a vector whose $l$th element is 1 while the others are zero. After the MS estimates the coarse channel information and feeds the signal path back to the BS, the BS can further refine the direction of the pilot according to the feedback. After several rounds of communication between the BS and the MS, both parties have an accurate estimate of the communication path.
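The 16-direction pilot cycle can be enumerated in a few lines. This sketch only reproduces the angle grid and the one-hot selector $s$ described above; the names `ANGLE_GRID`, `DIRECTIONS`, and `pilot_selector` are hypothetical, and the beam-shaping precoder $V$ is not modeled.

```python
import itertools
import numpy as np

# The four quantized values used for both the direction and elevation angles.
ANGLE_GRID = [np.pi / 2, np.pi, 3 * np.pi / 2, 2 * np.pi]
# All (phi, theta) pairs: 4 x 4 = 16 spatial combinations.
DIRECTIONS = list(itertools.product(ANGLE_GRID, ANGLE_GRID))

def pilot_selector(l, n_dirs=len(DIRECTIONS)):
    """One-hot vector s whose l-th element is 1 and all others are 0."""
    s = np.zeros(n_dirs)
    s[l] = 1.0
    return s

# One full pilot cycle: the BS sweeps one direction per transmission.
cycle = [(DIRECTIONS[l], pilot_selector(l)) for l in range(len(DIRECTIONS))]
```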

B. THE CHANNEL ESTIMATION AT MS
To estimate the complex gain and the direction angle of each transmitted path from the received signal, the MS first estimates the angles of arrival. Suppose the MS has $N_{MS}$ antennas, which are grouped into two overlapping sub-arrays: the first $(N_{MS} - 1)$ antennas form sub-array 1 and the last $(N_{MS} - 1)$ antennas form sub-array 2, so that shifting sub-array 1 by $d$ makes it coincide with sub-array 2. $L$ statistically independent narrowband signals $s_l(t)$, $l = 1, 2, \ldots, L$, at the same frequency are incident on the array from azimuth angles $\phi_l$, $l = 1, 2, \ldots, L$. Let $X$ and $Y$ denote the signal matrices received by sub-arrays 1 and 2, respectively. The output noises of sub-arrays 1 and 2, denoted $N_1$ and $N_2$, are assumed to be statistically independent white Gaussian noise with zero mean and variance $\sigma^2$, uncorrelated with the signal. With these definitions, $Y$ is obtained from $X$ by a simple rotation, so the two signals can be stacked into a combined signal $Z$ with array matrix $H$, which contains the overlapping signals. To evaluate the correlation of the signals, the covariance matrix $R_Z$ of the combined signal is formed. Because the noises of the two sub-arrays have overlapping terms, the noise covariance is not strictly a scaled identity matrix. Assuming that the angles of arrival are distinct, i.e., $H$ has full column rank, the eigenvalue decomposition of $R_Z$ yields the signal subspace $U_S$. Since $H$ and $U_S$ span the same space, there is a unique nonsingular matrix $T$ that relates them. Splitting $U_S$ into the parts $U_x$ and $U_y$ that correspond to the two sub-arrays gives a matrix $\Psi$ that maps $U_x$ to $U_y$, and the diagonal matrix $\Phi$ of the arrival-path directions is obtained from $\Psi$ through the relation $\Phi = T \Psi T^{-1}$.
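The rotation-invariance procedure above corresponds to the classical least-squares ESPRIT estimator, which can be sketched as follows. This is a minimal illustration under assumptions not taken from the source: a cosine-angle phase convention with half-wavelength spacing, the sample covariance in place of $R_Z$, and the hypothetical function name `esprit_ls`.

```python
import numpy as np

def esprit_ls(samples, n_paths, d_over_lambda=0.5):
    """Least-squares ESPRIT. samples: (n_antennas, n_snapshots). Returns angles (rad)."""
    X = samples[:-1, :]                      # sub-array 1: first N-1 antennas
    Y = samples[1:, :]                       # sub-array 2: last N-1 antennas
    n_sub = X.shape[0]
    Z = np.vstack([X, Y])                    # stacked observation
    R_z = Z @ Z.conj().T / Z.shape[1]        # sample covariance
    _, eigvec = np.linalg.eigh(R_z)          # eigenvalues in ascending order
    U_s = eigvec[:, -n_paths:]               # signal subspace (largest eigenvalues)
    U_x, U_y = U_s[:n_sub, :], U_s[n_sub:, :]
    Psi = np.linalg.lstsq(U_x, U_y, rcond=None)[0]   # solves U_x Psi = U_y
    phases = np.angle(np.linalg.eigvals(Psi))        # eigenvalues of Psi = diag of Phi
    cos_phi = np.clip(phases / (2 * np.pi * d_over_lambda), -1.0, 1.0)
    return np.sort(np.arccos(cos_phi))

# Example: two paths, 10 antennas, 1024 snapshots, very weak noise.
rng = np.random.default_rng(0)
true_angles = np.array([np.pi / 3, 2 * np.pi / 3])
n_idx = np.arange(10)[:, None]
A = np.exp(1j * np.pi * n_idx * np.cos(true_angles)[None, :])
S = (rng.standard_normal((2, 1024)) + 1j * rng.standard_normal((2, 1024))) / np.sqrt(2)
noise = 0.01 * (rng.standard_normal((10, 1024)) + 1j * rng.standard_normal((10, 1024)))
estimated_angles = esprit_ls(A @ S + noise, n_paths=2)
```

The eigenvalues of $\Psi$ carry the phase shift between the two sub-arrays, from which each arrival angle is recovered.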
Both $U_x$ and $U_y$ are perturbed by noise, and the MS is generally small and highly sensitive to noise, so the error of the ordinary least squares solution is relatively large. Since the perturbations of $U_x$ and $U_y$ are assumed to follow the same distribution, the problem is instead posed as a total least squares problem, whose solution gives the optimal $\Psi$, the matrix that maps $U_x$ to $U_y$. Let $U = [U_x \mid U_y]$; the problem is then equivalent to finding a matrix $F$ that is orthogonal to both $U_x$ and $U_y$. $F$ can be derived from the eigendecomposition of $U^H U$, with the eigenvectors sorted in descending order of eigenvalue: $F$ is taken from the noise subspace $E_N$, whose corresponding eigenvalues are 0. $\Psi$ is obtained from the blocks of $F$, and $\Phi$, the diagonal matrix of the arrival angles of the main signal paths, then follows from $\Psi$. Importantly, the influence of noise on the signal space is fully taken into account in (19), which improves the accuracy and reliability of the estimated angles of arrival. Now that the MS knows the angles of arrival of the communication paths, the next step is to estimate the transmission direction at the BS and the complex gain of each path. In the BS pilot design described above, signals with different characteristics are transmitted in different directions, and this feature is used to separate the paths.
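A total-least-squares variant of the estimator can be sketched in the same way. The block construction $\Psi = -E_{12} E_{22}^{-1}$ used here is the textbook TLS-ESPRIT step and is assumed, not confirmed, to match the paper's (19); the function name `esprit_tls` and the phase convention are likewise assumptions.

```python
import numpy as np

def esprit_tls(samples, n_paths, d_over_lambda=0.5):
    """Total-least-squares ESPRIT. samples: (n_antennas, n_snapshots)."""
    X, Y = samples[:-1, :], samples[1:, :]
    n_sub = X.shape[0]
    Z = np.vstack([X, Y])
    R_z = Z @ Z.conj().T / Z.shape[1]
    _, eigvec = np.linalg.eigh(R_z)
    U_s = eigvec[:, -n_paths:]
    U = np.hstack([U_s[:n_sub, :], U_s[n_sub:, :]])   # U = [U_x | U_y]
    _, E = np.linalg.eigh(U.conj().T @ U)
    E = E[:, ::-1]                                    # descending eigenvalue order
    # The last n_paths columns span the noise subspace, so U_x E12 + U_y E22 ~ 0.
    E12 = E[:n_paths, n_paths:]
    E22 = E[n_paths:, n_paths:]
    Psi = -E12 @ np.linalg.inv(E22)
    phases = np.angle(np.linalg.eigvals(Psi))
    cos_phi = np.clip(phases / (2 * np.pi * d_over_lambda), -1.0, 1.0)
    return np.sort(np.arccos(cos_phi))

# Example: same two-path scenario with noticeably stronger noise.
rng = np.random.default_rng(3)
true_angles_tls = np.array([np.pi / 3, 2 * np.pi / 3])
n_idx = np.arange(10)[:, None]
A = np.exp(1j * np.pi * n_idx * np.cos(true_angles_tls)[None, :])
S = (rng.standard_normal((2, 1024)) + 1j * rng.standard_normal((2, 1024))) / np.sqrt(2)
noise = 0.1 * (rng.standard_normal((10, 1024)) + 1j * rng.standard_normal((10, 1024)))
estimated_angles_tls = esprit_tls(A @ S + noise, n_paths=2)
```

Unlike the least-squares version, this formulation treats the perturbations of $U_x$ and $U_y$ symmetrically.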
The channel vector of the MS is defined first, and the final received signal is then written in terms of it. To distinguish the path information corresponding to different angles of arrival, we design the combiner $W^H = [H]_l$, $l = 1, 2, \ldots, L$, where $[H]_l$ keeps the $l$th row of the matrix $H$ unchanged and sets the other rows to zero. Because $|H_l a(\phi_l)| = 1$ and $|H_l a(\phi_i)| = 0$ for all $i \ne l$, where $H_l$ is the $l$th row of $H$, the direction and gain information of the corresponding path can be read from the final received signal. Each calculation yields the complex gain and direction of one path, so after $L$ calculations the complex gains and directions of all $L$ communication paths are obtained. The simulation in Section VI also shows that this method can effectively extract multipath information. The overall process of the channel estimation algorithm is shown in Algorithm 1. At this point the MS has obtained the channel information, and it can design its own precoding based on this information.
Algorithm 1 Estimation of Channel at MS
Required: Subarray signal X and Y

IV. HYBRID BEAMFORMING DESIGN AT MS
So far, the communication path information is known to the MS, and the next step is the hybrid precoding for the MS, i.e., a precoding scheme that maximizes (9). The hybrid design is divided into two stages: first design the RF combiner under the assumption of an optimal digital combiner, and then find the best digital combiner for that RF combiner [16]. The RF design problem at the MS is written as (30), and $W_{RF}$ must be designed to maximize it. A common practice is to analyze $W_{RF}$ element by element. We treat $W_{RF}$ column by column and take the $q$th column $w_q$ as an example to expand (30) into (31), where $W_q$ denotes the sub-matrix of $W$ with $w_q$ removed. Equation (31) is a function of the $q$th column of $W$ through a matrix $D$ that is Hermitian, i.e., $D^H = D$. Defining the term of (31) that depends on $w_q$ as $g(w_q) = w_q^H D w_q$, the contribution of each element of $W$ can be separated, where $w_q(i)$ is the $i$th element of the $q$th column of $W$ and $R$ collects the terms that do not depend on it. The function attains its maximum when $w_q(i)$ is set by the operator $I[\cdot]$, which normalizes a complex number to unit modulus. The elements of $W_{RF}$ are solved iteratively in this way. After the RF combiner is obtained, the digital combiner is designed. The role of the MS digital combiner is to make the received signal $x$ as close to the transmitted signal $s$ as possible, so its design is posed as a minimum mean square error problem, where $p$ is the average power of the signal. This section decouples the MS precoding problem into the analog combiner design and the digital combiner design. This decoupling is reasonable: because $W_{RF}^H W_{RF} \approx M I$, the effective noise after the RF combiner can be regarded as white, with covariance $\sigma^2 M I$.
Under this assumption, the mutual information between the received signal and the signal before digital processing is approximately equal to the mutual information between the received signal and the final signal. In the first stage, the objective is expanded in terms of the elements of $W_{RF}$ to obtain the iterative algorithm, and in the second stage the minimum mean square error (MMSE) criterion is used to design $W_D$. The whole MS precoding algorithm is shown in Algorithm 2.
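The MMSE stage can be illustrated with the standard linear MMSE solution. The sketch assumes an effective model $y = G s + n$ after RF combining, with $E[ss^H] = pI$ and noise covariance `R_n`; the symbol `G` (standing for the effective channel seen at baseband) and the function name are hypothetical and are not taken from the source.

```python
import numpy as np

def mmse_combiner(G, R_n, p=1.0):
    """Return W_D minimizing E||s - W_D^H y||^2 for y = G s + n.

    G   : effective baseband channel (n_rf x n_streams), assumed known
    R_n : noise covariance after RF combining (n_rf x n_rf)
    p   : average signal power, E[s s^H] = p I
    """
    R_yy = p * (G @ G.conj().T) + R_n          # covariance of the observation
    W_H = p * G.conj().T @ np.linalg.inv(R_yy)  # W_D^H = R_sy R_yy^{-1}
    return W_H.conj().T

# Example: with almost no noise the combiner approaches channel inversion.
rng = np.random.default_rng(1)
G = np.eye(3) + 0.1 * (rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
W_D = mmse_combiner(G, 1e-9 * np.eye(3))
```

As the noise level grows, the same formula trades off inversion of the channel against noise amplification.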

Algorithm 2 Hybrid Beamforming Design at MS
Required: Channel $H$
The design of $W_{RF}$:
Initialize $W_{RF}$ to a complex matrix that satisfies $|W_{RF}(i,j)|^2 = 1$ $\forall i, j$
For n = 1 : N (Iteration) do:
For q = 1 : Column($W_{RF}$) do:
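The element-wise update inside the loops can be sketched for a single column. The example below maximizes $g(w) = w^H D w$ over unit-modulus $w$ by coordinate ascent, setting each element to the unit-modulus phase of the sum of its cross terms. It assumes a fixed Hermitian $D$, whereas in the paper $D$ depends on the other columns of $W_{RF}$ and would be recomputed for each column; the function name is hypothetical.

```python
import numpy as np

def unit_modulus_ascent(D, w0, n_iter=20):
    """Coordinate ascent on w^H D w subject to |w(i)| = 1 for every element."""
    w = w0.copy()
    history = [float(np.real(w.conj() @ D @ w))]
    for _ in range(n_iter):
        for i in range(len(w)):
            # Sum of cross terms that multiply conj(w[i]) in the objective.
            eta = D[i, :] @ w - D[i, i] * w[i]
            if abs(eta) > 0:
                w[i] = eta / abs(eta)      # normalize to unit modulus
        history.append(float(np.real(w.conj() @ D @ w)))
    return w, history

# Example: random Hermitian D and a random unit-modulus starting point.
rng = np.random.default_rng(4)
B = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
D = B + B.conj().T
w_init = np.exp(1j * rng.uniform(0, 2 * np.pi, 6))
w_opt, objective = unit_modulus_ascent(D, w_init)
```

Each element update cannot decrease the objective, so the recorded values form a non-decreasing sequence, although only a local maximum is guaranteed.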

V. HYBRID BEAMFORMING DESIGN AT BS
After the MS hybrid beamforming design in the previous section, this section considers the BS precoding design. The MS sends the channel information to the BS. Note that the direction information available to the BS is not exact; the MS and the BS must communicate multiple times to refine it, as mentioned in the pilot design. Following the MS precoding design, the BS design problem can be posed as a similar maximization problem, in which $P$ is the total transmission power of the BS. As at the MS, the BS could also design its RF combiner iteratively, but this is impractical in 5G communications [17]. The huge capacity and low latency of 5G require the BS to respond quickly, whereas the traditional iterative calculation is computationally expensive. Even if the calculation can be simplified in many places, the computing power required by the BS under massive 5G access is extremely high, which hinders the large-scale deployment of 5G equipment. Another reason is that millimeter-wave signals scatter strongly, so the noise and interference on different channels in different environments vary greatly [15], whereas the traditional algorithm assumes that the interference on different channels is identically distributed. Deep learning networks, which retain what they have learned from past data, can address this dilemma. A neural network can be trained on a large amount of past data so that it retains the channel characteristics of different environments, and it can also keep learning from feedback data [18]. Recent experimental results show that DL can learn data features from large amounts of past data and has played a major role in solving many traditional problems.
There is now a substantial body of research on deep learning for wireless channels [19], [20]. Some studies treat the channel directly as a black box [21], while others combine traditional theory with deep learning. This section focuses on using deep learning to simplify traditional algorithm design: building on the ideas of well-established algorithms, deep learning is used to greatly reduce the BS computation.
The first problem in applying deep learning at the BS is that the analog beamformer has a specific architecture built from analog phase shifters, which must satisfy the unit-modulus constraint, so it cannot be defined directly as the unconstrained output of a neural network. Here we use a deep learning network to output the RF combiner $V_{RF}$; refer to Fig. 2 for the communication flow. The neural network replaces the complex traditional iterative RF algorithm. Two neural networks are used, one for training and one for outputting $V_{RF}$. The training network can be deployed at the base station or in a remote computing center, and it pushes updated parameters to the network at the base station at regular intervals. Each base station has its own set of parameters, so each base station adapts progressively to its own environment.
VOLUME 9, 2021
Because of the constant-modulus constraint on the RF combiner, a custom Lambda layer is added after the neural network. Specifically, the output of the last dense layer is defined as the angle $\theta$, and the Lambda layer converts it into $V_{RF}$. Here $\theta$ has a clear physical meaning: each element of $\theta$ corresponds to the phase of one analog beamforming coefficient in $V_{RF}$. Compared with the alternative in which the real and imaginary parts of each complex value are generated first and then normalized onto the unit circle, this method generates the phase directly, so the constant-modulus constraint is satisfied automatically and fewer neurons are needed.
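The phase-to-coefficient mapping performed by the Lambda layer can be mimicked outside any deep learning framework. The sketch below assumes the mapping $V_{RF} = \cos\theta + j\sin\theta$ with no additional scaling, which is a common choice but is not confirmed by the source; `phase_to_rf` is a hypothetical name.

```python
import numpy as np

def phase_to_rf(theta):
    """Map real phases to unit-modulus complex entries, as a Lambda layer would."""
    theta = np.asarray(theta, dtype=float)
    return np.cos(theta) + 1j * np.sin(theta)

# Example: a 2 x 2 block of phases produced by the last dense layer.
theta = np.array([[0.0, np.pi / 2], [-np.pi / 4, 1.0]])
V_RF = phase_to_rf(theta)
```

Every entry has modulus one by construction, and one real output per coefficient suffices, compared with two outputs when real and imaginary parts are generated separately.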
The purpose of the network is to improve communication efficiency. Its loss function is the mean square error over the training set, where $S$ is the size of the training set, $V(\theta_R)$ is the predicted value, and $V_{per}$ is the converged value of the iterative algorithm of Section IV. The loss function does not adopt the maximum information entropy criterion, for two reasons. First, the information entropy involves a matrix determinant and inverse, which are awkward to handle with gradient descent. Second, this loss function is general: replacing the target value with the results of other traditional algorithms makes full use of previously collected data. The neural network is introduced, on the one hand, to simplify the calculation and reduce the response time and, on the other hand, to learn the characteristics of different environments and adapt to local conditions. The detailed setup and simulation of the neural network are given in Section VI. After the neural network outputs the RF combiner $V_{RF}$, the next step is to design the digital precoding $V_D$. The power allocation among the signal streams must be considered first. The resulting $V_D$ satisfies $|h_k V_{RF} V_D^k| = \sqrt{p_k}$ and $|h_k V_{RF} V_D^i| = 0$ for all $i \ne k$, which ensures that the signals of different users do not interfere with each other.
This section discussed the design of the BS hybrid precoding. The BS often has to process many connections at the same time, and a complex traditional algorithm would greatly reduce the system response efficiency, which is inconsistent with the low latency and high access density required by 5G. We therefore use a neural network to simplify the calculation and train it on a large amount of previous data to output the BS RF combiner. Then, the water-filling algorithm calculates the user power allocation, and finally the digital precoding is used to isolate the data of different users. The entire BS precoding design process is shown in Algorithm 3.
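The two conditions on $V_D$ are those of a zero-forcing precoder, which can be sketched as follows. The example assumes one stream per user and an effective channel `H_eff` whose $k$th row is $h_k V_{RF}$; it does not enforce the total power constraint on $V_{RF} V_D$, and the function name is hypothetical.

```python
import numpy as np

def zero_forcing_precoder(H_eff, powers):
    """Return V_D (n_rf x K) such that H_eff @ V_D = diag(sqrt(powers)).

    H_eff  : K x n_rf matrix, row k is h_k V_RF (assumed full row rank)
    powers : length-K array of per-user received powers p_k
    """
    gram = H_eff @ H_eff.conj().T
    right_inverse = H_eff.conj().T @ np.linalg.inv(gram)
    return right_inverse @ np.diag(np.sqrt(powers))

# Example: 3 users served through 4 RF chains.
rng = np.random.default_rng(2)
H_eff = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
powers = np.array([0.5, 0.3, 0.2])
V_D = zero_forcing_precoder(H_eff, powers)
```

The product `H_eff @ V_D` is diagonal, so each user receives only its own stream, with amplitude $\sqrt{p_k}$.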
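The water-filling step mentioned above can be illustrated with the classical allocation that maximizes $\sum_k \log_2(1 + g_k p_k)$ under a total power budget. The per-user gain-to-noise ratios `gains` and the function name are assumptions for this sketch; the source does not specify how the gains are obtained.

```python
import numpy as np

def water_filling(gains, total_power):
    """Classical water-filling over users with gain-to-noise ratios `gains`."""
    g = np.asarray(gains, dtype=float)
    order = np.argsort(g)[::-1]            # strongest user first
    g_sorted = g[order]
    p = np.zeros_like(g)
    # Try serving the k strongest users, dropping the weakest until all get power.
    for k in range(len(g), 0, -1):
        water_level = (total_power + np.sum(1.0 / g_sorted[:k])) / k
        alloc = water_level - 1.0 / g_sorted[:k]
        if alloc[-1] > 0:                  # weakest active user still gets power
            p[order[:k]] = alloc
            break
    return p

# Example: the weakest of three users is switched off.
allocation = water_filling([2.0, 1.0, 0.1], total_power=1.0)
```

In this example the water level is 1.25, so the two stronger users receive 0.75 and 0.25 while the third receives nothing.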

VI. SIMULATION
In this section, simulations are presented to show the performance of the proposed algorithms and to compare them with existing algorithms.
First, the channel estimation algorithm of Section III is simulated and verified. As noted there, signal scattering is severe in millimeter-wave communication and the influence of noise cannot be ignored; to reduce the influence of noise on both sub-array outputs, the total least squares method is used. In the following simulation, the MS has 10 antennas, the signal length is $|s| = 1024$, and the antenna spacing is $d = \lambda/2$. The noise is white Gaussian noise with signal-to-noise ratio SNR $\in [-10, 10]$, and the signal arrival angles are drawn from $\{\pi/6, \pi/3, \pi/2, 2\pi/3, 5\pi/6\}$. To verify the algorithm, the total number of propagation paths is set to $L = 1, 2, 3, 4, 5$ in turn. Each configuration is repeated 300 times and the results are averaged.
The simulation result is shown in Fig. 3. The label "path $k$" means that the signal has $k$ transmission paths in total, and the error is the average over the $k$ paths. It can be seen from the figure that when the signal has only the direct line-of-sight (LOS) path, the algorithm accurately identifies the angle of incidence. As the number of paths increases, the error grows. On the one hand, the superposition of signals from different paths cancels out many details; on the other hand, the matrix inversion introduces approximation errors. In practice, the BS and the MS often have to communicate several times to determine the communication channel between them.
The following simulation evaluates the influence of the BS neural network algorithm on communication efficiency. Throughout the simulation, a planar array of $N_{BS} = 10 \times 10$ antennas with uniform half-wavelength spacing is deployed at the BS, and a linear array of $N_{MS} = 10$ antennas with uniform half-wavelength spacing is deployed at the MS. The number of paths is $L = 6$, that is, the channel includes one direct LOS path and 5 NLOS paths. The direction angles $\phi \in [0, \pi]$ and elevation angles $\theta \in [0, \pi/2]$ are independent and uniformly distributed, and the path gain is set to 1 for $l = 1$ and to $10^{-0.5}$ for $l = 2, 3, 4, 5$. Two recent hybrid beamforming algorithms are selected for comparison: the HBF algorithm based on spatial iteration [13] and the decomposition-based HBF algorithm in [15].
In the proposed DL algorithm, the parameter settings are listed in Table 1. The learning rate is initialized to 0.001 and the Adam optimizer is used. The neural network consists of 4 Dense layers, 2 Normalization layers, a Flatten layer, and a final Lambda layer. The last Dense layer outputs the phase of $V_{RF}$, and the Lambda layer converts it into a complex number. The training input is the noisy pilot channel information (PNR), and the output is the RF combiner $V_{RF}$. The mean square error between the network output and the converged result of the iterative algorithm proposed in Section IV is used as the loss function. Fig. 4 shows the relationship between the number of initial training iterations of the proposed neural network and the error rate, together with the point-based iterative algorithm of Section IV for comparison. Both methods take the same 100 sets of channel data as input, and their computation time is compared on the same computer. The figure shows that the neural network greatly reduces the computation. For example, to reach an error rate of 0.05, the neural network takes 4.85 s, whereas the iterative algorithm takes 23.47 s to converge. Note also that the neural network can be trained in advance on past data. In real-time use, the network only runs inference, which consists of simple operations and takes much less time than training. This analysis indicates that a suitable neural network can greatly simplify the operation of 5G equipment while retaining the strengths of well-established traditional algorithms. Fig. 5 compares the neural network with the traditional algorithms. Three different pilot signal-to-noise ratios are used to train the neural network model.
The performance of the neural network is better than or similar to that of the traditional algorithms at higher signal-to-noise ratios. When the pilot training signal has a poor signal-to-noise ratio, the neural network is slightly worse than the traditional algorithm, which is largely due to insufficient training data. The neural network achieves performance comparable to the traditional algorithm with much lower computation time. In summary, the proposed DL network is more robust to channel estimation errors than the traditional algorithms. After training on a large amount of data, the network has learned the characteristics of millimeter-wave propagation channels and the relationship between noisy channels and ideal RF combiners.
Considering the sparsity of the millimeter-wave channel and the complexity of iterative estimation, the traditional channel model can no longer adequately express the characteristics of the channel. The neural network, in contrast, can gradually learn the complex channel characteristics of its environment as data continue to arrive, so its output accuracy can keep improving and the user experience improves accordingly. Although the neural network proposed in this article is designed for simple cases, it extends to most complex cases. For example, for multi-user RF chains, the output dimension is expanded from $N_{MS} \times N_{BS}$ for a single user to $K N_{MS} \times N_{BS}$ for multiple users, and the loss function is changed accordingly. Similarly, in actual use, the loss function can be changed to a function of the signal gain and transmission efficiency fed back by the MS, so that the neural network can learn the characteristics of the communication environment of the BS.
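As a quick check of the timing claim, the ratio of the two run times quoted for the 0.05 error-rate point of Fig. 4 (4.85 s for the neural network and 23.47 s for the iterative algorithm) can be worked out directly. The symbols $t_{\mathrm{DL}}$ and $t_{\mathrm{iter}}$ are introduced here only for this calculation.

```latex
\frac{t_{\mathrm{DL}}}{t_{\mathrm{iter}}}
  = \frac{4.85\ \mathrm{s}}{23.47\ \mathrm{s}}
  \approx 0.207
```

This is consistent with the statement that the DL algorithm takes about 20% of the time of the traditional algorithm.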

VII. CONCLUSION
This article discusses the design of hybrid beamforming in millimeter-wave multi-user communication systems. To handle the strong scattering and multiple propagation paths of millimeter waves, a high-precision multipath channel estimation algorithm is proposed, and the MS hybrid precoding design is then introduced. To support the very high user access density and low latency of 5G, a neural network is used to simplify the design of the BS RF combiner, and the digital precoding design takes the multi-user parallel communication channel model into account. The simulations in Section VI show the effectiveness of the proposed algorithms.