High Angular Resolution Method Based on Deep Learning for FMCW MIMO Radar

In this article, we investigate the feasibility of a high angular resolution method based on deep learning for estimating the angle of arrival (AoA) using an automotive frequency-modulated continuous-wave (FMCW) multiple-input multiple-output (MIMO) radar. The deep learning approach takes advantage of the 2-D signal structure from the range-Doppler (RD) maps to determine AoA of targets. To achieve this, we use a neural network architecture that is based on the frequency-representation module of the DeepFreq model. We call it FRNet12, since it uses 12 virtual antennas, which are derived from three transmit and four receive antennas using our FMCW MIMO radar. Furthermore, we propose a cascaded neural network system to further improve the performance of the FRNet12 model. This system consists of an extrapolation neural network (ETPNet) and FRNet18. The ETPNet extrapolates six additional samples from the 12 virtual antenna inputs, and the output is then used to train the FRNet18. The cascaded system provides an improvement of 33% in angular resolution compared to FRNet12. Additionally, it maintains the probability of resolution (PoR) at nearly 100%, even when two targets with different amplitudes are within the theoretical angular resolution region. The proposed method is verified using simulation and measurement data from a 77-GHz FMCW MIMO radar with three transmit and four receive antennas. The results of this research demonstrate the potential of applying deep learning to estimate the AoA in an automotive radar system, and the proposed cascaded neural network system represents a significant improvement over the FRNet12 model.


I. INTRODUCTION
With the emergence of self-driving cars and advanced driver assistance systems, automotive radars have become the eyes of autonomous vehicles [1], [2]. State-of-the-art automotive radar sensors can simultaneously and precisely identify the range, velocity, azimuth, and elevation angle of multiple objects. However, a weakness of radar sensors is their angular resolution, which is determined by the number and positioning of physical antennas. Multiple-input multiple-output (MIMO) antenna systems are used to enlarge the virtual antenna aperture. The transmit (TX) and receive (RX) antennas of a MIMO radar can be combined into virtual receivers to increase the angular resolution for detecting the angle of arrival (AoA) of the targets [3], [4], [5]. A MIMO radar system can be implemented using time-division multiple access (TDMA) [6], [7], [8], [9], [10], frequency-division multiple access (FDMA) [11], [12], [13], [14], [15], or code-division multiple access (CDMA) [16], [17].
In a MIMO radar system, the number of virtual receivers $N_{\mathrm{VRX}}$ is the product of the number of TX antennas $N_{\mathrm{TX}}$ and RX antennas $N_{\mathrm{RX}}$. For detecting the AoA of targets, a uniform linear array (ULA) is widely used. This type of array typically has virtual antennas spaced half a wavelength apart and is commonly employed in MIMO radar systems, as seen in [6], [13], [14], [18], [19], [20], [21], [22], [23]. The theoretical angular resolution of MIMO radars using such a ULA can be calculated as [6]
$$\theta_{\mathrm{res}} = \frac{\lambda}{N_{\mathrm{VRX}}\, d \cos\theta} \tag{1}$$
where $d = \lambda/2$ is the RX virtual element spacing, $\lambda$ denotes the wavelength, and $\theta$ denotes the AoA of the received signal relative to the antenna array boresight direction, which is perpendicular to the axis of the linear receiver antenna array. For $d = \lambda/2$ at boresight ($\theta = 0$), (1) reduces to $2/N_{\mathrm{VRX}}$ rad. The angular resolution of a radar system refers to its ability to distinguish between two separate targets located at different angles relative to the antennas. A higher angular resolution means that the radar system can distinguish the AoA of two targets more precisely. To enhance the angular resolution, a typical approach is to increase the number of physical antennas and the size of the aperture, but this solution is costly and requires hardware changes.
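As a quick numerical check of the resolution figures quoted in this article, the following minimal sketch evaluates the usual ULA expression $\lambda/(N d \cos\theta)$ for 12 virtual receivers and for 18 channels (12 virtual plus 6 extrapolated). The function name and its arguments are illustrative and are not taken from the original work.

```python
import math

def angular_resolution_deg(n_elements, d_over_lambda=0.5, theta_deg=0.0):
    """Theoretical ULA angular resolution lambda / (N * d * cos(theta)), in degrees."""
    theta = math.radians(theta_deg)
    return math.degrees(1.0 / (n_elements * d_over_lambda * math.cos(theta)))

res_12 = angular_resolution_deg(12)  # 12 virtual receivers
res_18 = angular_resolution_deg(18)  # 12 virtual + 6 extrapolated channels
improvement = (res_12 - res_18) / res_12  # relative reduction of the resolution cell

print(f"N=12: {res_12:.2f} deg, N=18: {res_18:.2f} deg, improvement: {improvement:.0%}")
```

At boresight with half-wavelength spacing, the two values come out close to the 9.55° and 6.36° used later for training, and the relative reduction is one third.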
There has been a long history of developments in AoA estimation techniques [24]. The fast Fourier transform (FFT) is a widely used technique for estimating the spectrum and AoA of targets. However, the probability of resolution (PoR) using this method is low. Besides FFT, many algorithms have been developed, and among them, subspace-based estimation methods such as multiple signal classification (MUSIC) [25], [26] and estimation of signal parameters via rotational invariance techniques (ESPRIT) [27] are well known for their high-resolution capabilities. Moreover, various modifications of the MUSIC and ESPRIT algorithms have been proposed to accommodate different conditions [28], [29], [30], [31], [32]. However, these algorithms face challenges in real-time implementation due to their high computational complexity, and they require knowledge of the number of frequencies or targets beforehand, which is not always available in automotive radar applications. Furthermore, when the signal-to-noise ratio (SNR) decreases, the performance of these algorithms deteriorates significantly. To determine the number of frequencies, information-theoretic criteria such as the Akaike information criterion (AIC) [33] or minimum description length (MDL) [34], [35] could be used, along with model order selection rules based on maximum a posteriori probability (MAP) [36]. Nevertheless, these additionally required algorithms would further increase computational complexity.
Another approach in AoA estimation is the use of deep learning, a specialized branch of machine learning. In contrast to traditional neural networks that only have a few hidden layers, deep neural networks (DNNs) often have tens to hundreds of hidden layers. Researchers have recently introduced deep learning techniques to solve the problems of AoA estimation [37], [38], [39], [40], [41], [42]. However, these approaches require the estimation of the spatial covariance matrix, and they have not been validated with measurement data, which makes them difficult to evaluate in real-world applications. The research in [43] highlights the recent advancements and studies in the area of deep learning for AoA estimation in the context of automotive radar over the past few years. Recently, the DeepFreq model was introduced as a deep learning approach to estimate AoA in a data-driven manner [44]. The model consists of two modules: a frequency-representation module and a frequency-counting module. It is an improved version of the pseudo-spectrum neural network (PSNet) model from [45] and produces a highly accurate 1-D frequency representation. DeepFreq does not rely on the spatial covariance matrix and operates directly on the generated complex sinusoidal signal data. This eliminates the need for preprocessing and still results in state-of-the-art performance.
In this article, we propose a high angular resolution deep learning method to estimate the AoA of targets in a frequency-modulated continuous-wave (FMCW) MIMO radar system. Our method leverages the 2-D signal structure from the range-Doppler (RD) map. The neural network architecture used in this approach is based on the frequency-representation module of the DeepFreq model and is referred to as FRNet12, as it utilizes $N_{\mathrm{VRX}} = 12$ virtual antennas, derived from the combination of three transmit and four receive antennas in our FMCW MIMO radar prototype. FRNet12 offers higher PoR and lower SNR requirements than traditional techniques like FFT and MUSIC. Additionally, the method does not require the prediction of the number of frequency components, further enhancing its efficiency. To further improve the performance of the FRNet12 model, we propose a cascaded neural network system that includes an extrapolation neural network (ETPNet) and FRNet18. The ETPNet extrapolates six additional samples from the 12 virtual antenna inputs, and the output is then used to train the FRNet18. The cascaded neural network system improves the theoretical angular resolution for the MIMO radar system using ULA receivers to
$$\theta_{\mathrm{res}} = \frac{\lambda}{(N_{\mathrm{VRX}} + N_{\mathrm{ETP}})\, d \cos\theta} \tag{2}$$
where $N_{\mathrm{ETP}} = N_{\mathrm{VRX}}/2 = N_{\mathrm{TX}} N_{\mathrm{RX}}/2 = 6$ represents the number of extrapolated virtual receiver channels. Compared to FRNet12, the cascaded network system consisting of ETPNet and FRNet18 not only improves angular resolution by 33%, but also maintains a nearly 100% PoR even when two targets with different amplitudes are present within the new theoretical angular resolution region. This idea was inspired by [19], but our approach does not require model order selection or an estimation of the number of targets. The potential of using deep learning for AoA estimation without estimating the spatial covariance matrix is also explored in [46] and [47]. However, their approach uses a simpler single-stage feedforward neural network (FNN) architecture compared to our two-stage architecture consisting of ETPNet, which uses an FNN architecture, and FRNet12, which incorporates an auto-encoder and 1-D convolutional neural network (1D-CNN) architecture. While our approach is more complex and takes longer to train, it outperforms using FRNet12 alone, demonstrating the advantage of a multistage approach. In summary, this article presents two major contributions as follows.
1) A deep learning method that utilizes the 2-D signal structure from the RD map of an FMCW MIMO radar system to estimate the AoA of the targets. The neural network structure is built upon the frequency-representation module of the DeepFreq model and is referred to as FRNet12.
2) A cascaded neural network system consisting of two networks, ETPNet and FRNet18, which further improves the performance of FRNet12 by increasing the angular resolution by 33% and maintaining a nearly 100% PoR even when two targets with different amplitudes approach the new theoretical angular resolution in (2).
The organization of this article is as follows. Section II presents a review of the fundamental principles of FMCW MIMO radar. In Section III, we outline the data generation process, the FRNet12 and ETPNet architecture, and the training procedure. Section IV analyzes the simulations and measurements using a 77-GHz FMCW MIMO radar system with three TX and four RX antennas. Finally, Section V provides the concluding remarks based on the results obtained.

II. BACKGROUND OF FMCW MIMO RADAR
In this section, we briefly review the basic operation of FMCW MIMO radar [14]. In an FMCW radar, a transmitted chirp is described by the signal model $s_T(t_f)$, where $A$ is the amplitude, $f_0$ is the start frequency, $B$ is the bandwidth, $T_{\mathrm{sw}}$ is the time duration of a single up-chirp, $\phi_0$ is the initial phase, and $t_f$ represents the time within the up-chirp, referred to as fast-time. If a target is illuminated by the radar beam, a portion of the signal energy is reflected back to the radar's receiver. The intermediate frequency (IF) signal, obtained after passing through a mixer and low-pass filter, is the result of multiplying the transmitted and received signals. The analog-to-digital converter (ADC) samples $N$ points during $T_{\mathrm{sw}}$ for each receive channel. To determine the velocity of targets, the transmitting antennas continuously transmit $N_p$ chirps in the slow-time $t_s$. Each RX channel then receives an IF signal matrix $S_{\mathrm{IF}} \in \mathbb{R}^{N \times N_p}$. For a radar system with $N_{\mathrm{RX}}$ receive antennas, this process forms the raw radar cube data $S \in \mathbb{R}^{N \times N_p \times N_{\mathrm{RX}}}$. The range profile can be determined by taking the first FFT along the fast-time with $N$ sampling points, while the velocity profile is estimated by taking another FFT along the slow-time with $N_p$ chirps. The result of this 2-D FFT is referred to as the RD maps $X_{\mathrm{RD}} \in \mathbb{C}^{Z_R \times Z_D \times N_{\mathrm{RX}}}$, where $Z_R$ and $Z_D$ are the zero-padded FFT lengths along $N$ and $N_p$, respectively. Finally, the AoA of targets can be calculated by taking the third FFT along the receiving channels, that is, in the spatial domain.
A MIMO radar system can be established using one of three distinct approaches: TDMA, FDMA, or CDMA. Implementing CDMA techniques in radar systems significantly increases hardware complexity, making it an impractical solution, especially for millimeter-wave systems. TDMA is the most straightforward method of separating signals from multiple transmit antennas, using sequential activation with orthogonality in time. However, TDMA cannot utilize simultaneous transmission from all transmit antennas and requires motion compensation for dynamic scenarios [7], [8], [9], [10]. Doppler-division multiple access (DDMA), on the other hand, is one of the possible FDMA techniques that can utilize simultaneous transmission from all TX antennas. Fig. 1 illustrates the comparison between TDMA and DDMA using three TX antennas over a period of $3t_p$, where $t_p = T_{\mathrm{sw}} N_p$ is the transmission time for $N_p$ chirps, or one burst. With three TX antennas, TDMA needs three bursts to estimate range, velocity, and AoA of the targets simultaneously. In contrast, DDMA requires only one burst to estimate these parameters and does not require motion compensation for moving targets [14]. Therefore, in this work, we apply the DDMA technique for our FMCW MIMO radar.
The application of the DDMA technique and the separation method using the binary mask algorithm for FMCW MIMO radar are explained in detail in [14]. In brief, the DDMA FMCW radar system with $N_{\mathrm{TX}}$ and $N_{\mathrm{RX}}$ antennas operates as follows. After the mixer, low-pass filter, and ADC, each RX channel receives a signal $\tilde{S}_{\mathrm{IF}} \in \mathbb{R}^{N \times N_p}$ while the $N_{\mathrm{TX}}$ antennas transmit simultaneously. With $N_{\mathrm{RX}}$ antennas, this creates the raw ADC radar cube data $S \in \mathbb{R}^{N \times N_p \times N_{\mathrm{RX}}}$. By applying a 2-D FFT of $S$ with a window function, the RD maps $\tilde{X}_{\mathrm{RD}} \in \mathbb{C}^{Z_R \times Z_D \times N_{\mathrm{RX}}}$ are generated. The Hann window is applied in range and Doppler to reduce the sidelobes. The main difference between the RD maps $\tilde{X}_{\mathrm{RD}}$ generated by the DDMA FMCW radar and the RD maps $X_{\mathrm{RD}}$ created by the conventional FMCW radar lies in the Doppler domain, where the targets in $\tilde{X}_{\mathrm{RD}}$ are shifted. As a result, each of the $M$ targets in $X_{\mathrm{RD}}$ appears once per TX antenna, so that $\tilde{X}_{\mathrm{RD}}$ contains $M \times N_{\mathrm{TX}}$ target peaks. The 2-D cell averaging constant false alarm rate (2-D CA-CFAR) [48], [49] is then applied to determine the adaptive threshold for the binary mask algorithm [13], [14]. The processed RD maps $J \in \mathbb{C}^{Z_R \times Z_D \times N_{\mathrm{VRX}}}$, where $N_{\mathrm{VRX}} = N_{\mathrm{TX}} N_{\mathrm{RX}}$, are produced through a separation method [14] and serve as input to the deep learning approach. The signal processing chain in this work is depicted in Fig. 2.
Fig. 1. Sketch of linear FMCW ramps visualizing the difference between the TDMA and DDMA approach for activating three different TX antennas. DDMA leverages the simultaneous transmission over all TX antennas, while TDMA employs a sequential activation of only one TX during the transmission time $t_p$. Additionally, TDMA requires three bursts to simultaneously estimate range, velocity, and AoA of the targets, while DDMA needs only one burst.
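The windowed 2-D FFT that turns a raw radar cube into RD maps can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing chain of [14]: the function name is hypothetical, the Doppler axis is centered with `fftshift` by choice, and the DDMA separation, CA-CFAR, and binary mask steps are omitted.

```python
import numpy as np

def range_doppler_maps(cube, z_r, z_d):
    """Windowed 2-D FFT of a real cube (N, N_p, N_RX) -> complex maps (z_r, z_d, N_RX)."""
    n, n_p, _ = cube.shape
    win_range = np.hanning(n)[:, None, None]      # Hann window along fast-time
    win_doppler = np.hanning(n_p)[None, :, None]  # Hann window along slow-time
    windowed = cube * win_range * win_doppler
    range_fft = np.fft.fft(windowed, n=z_r, axis=0)   # zero-padded range FFT
    rd = np.fft.fft(range_fft, n=z_d, axis=1)         # zero-padded Doppler FFT
    return np.fft.fftshift(rd, axes=1)                # assumption: zero Doppler in the center

# Usage with placeholder noise data: N = 512 samples, N_p = 32 chirps, N_RX = 4 channels.
rng = np.random.default_rng(0)
cube = rng.standard_normal((512, 32, 4))
rd_maps = range_doppler_maps(cube, z_r=1024, z_d=64)
print(rd_maps.shape)
```

The zero-padded lengths `z_r` and `z_d` play the role of $Z_R$ and $Z_D$; the values 1024 and 64 are arbitrary choices for the example.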

A. Data Preparation and Frequency Representation Neural Network Architecture
In the deep learning approach, we need to train a neural network to estimate the AoA of targets. To accomplish this, we must first model the input data from the $N_{\mathrm{VRX}}$ RD maps $J$ in the spatial domain. This model is then used to generate the input data for training the neural network. When there are $M$ unknown point-targets at the same RD bin, the vector $x$ from the $N_{\mathrm{VRX}}$ virtual receivers can be modeled by its entries
$$x[n] = \sum_{m=1}^{M} \alpha_m e^{j(2\pi f_m n + \varphi_m)} + z[n] \tag{4}$$
where $\alpha_m$, $f_m$, and $\varphi_m$ are the unknown amplitude, spatial frequency, and initial phase of the $m$th complex sinusoid, $z[n]$ is complex white Gaussian noise, and $n \in \{0, 1, \ldots, N_{\mathrm{VRX}} - 1\}$.
Our FMCW radar prototype features three TX and four RX antennas. To generate the data for training and testing our neural network, we created $10^5$ samples $x$ from (4) while varying $\alpha_m$, $f_m$, $\varphi_m$, and $z[n]$. The number of virtual receivers used in this process was $N_{\mathrm{VRX}} = 12$. We used $9 \cdot 10^4$ samples for training and $10^4$ samples for testing the neural network. The procedure for generating these data samples is described as follows.
1) Since $M$ is unknown, the value of $M$ is chosen uniformly at random between 1 and 5. The performance of the neural network is influenced by both $M$ and $N_{\mathrm{VRX}}$. In cases where $N_{\mathrm{VRX}} > 12$, $M$ can be increased. However, for our approach with $N_{\mathrm{VRX}} = 12$, a maximum of $M = 5$ was found to provide better training results compared to values of $M > 5$. In practical radar applications, the probability of having five targets with both the same range and velocity in the RD map is low, whereas it is more probable to have multiple targets with the same range but different velocities. Typically, the case of two or three targets having both the same range and the same velocity may occur. Therefore, in the AoA estimation with $N_{\mathrm{VRX}} = 12$, a maximum of $M = 5$ is considered a suitable choice for training the neural network.
2) The frequencies $f_m$ in (4) play an important role in AoA estimation, as they indicate the angle of the targets. The $f_m$ are randomly generated, but with a minimum separation determined by (1). With $N_{\mathrm{VRX}} = 12$, the minimum angular separation between targets is around 9.55°. Training with a minimum angular separation of less than 9.55° would result in a decrease in performance [44].
3) The phases $\varphi_m$ are chosen uniformly at random from 0 to $2\pi$.
4) The SNR is defined as $\alpha_m^2/(2\sigma^2)$ [50]. To ensure a wide range of SNR values, the amplitudes $\alpha_m$ are chosen according to the variance $\sigma^2$ of the complex Gaussian noise $z[n]$ such that the SNR varies from 0 to 40 dB.
5) The input power needs to be normalized between 0 and 1 in order to improve the efficiency of the training process.
6) The output $y$ of the neural network is a pseudo-spectrum or frequency representation [44], [45]. $y$ is the superposition of narrow Gaussian pulses, where the width of the pulses is determined by the variance $\sigma_g^2$ and the mean of each pulse is $f_m$. $K$ is the length of $y$, and $\sigma_g^2$ is also a hyperparameter for training the neural network.
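The data generation steps above can be sketched in a few lines. This is a hedged illustration, not the authors' generator: the spatial-frequency range $[-0.5, 0.5)$, the minimum frequency spacing of $1/N_{\mathrm{VRX}}$ (the boresight equivalent of $2/N_{\mathrm{VRX}}$ rad for $d = \lambda/2$), the pulse width `SIGMA_G`, the frequency grid, and the normalization by the peak magnitude are all assumptions, and the function names are hypothetical.

```python
import numpy as np

N_VRX = 12      # virtual receivers (from the text)
K = 512         # pseudo-spectrum length (from the text)
SIGMA_G = 0.01  # hypothetical pulse width; the text only calls it a hyperparameter

def draw_frequencies(m, min_sep, rng, max_tries=1000):
    """Rejection-sample m spatial frequencies in [-0.5, 0.5) with a minimum wrap-around spacing."""
    for _ in range(max_tries):
        f = np.sort(rng.uniform(-0.5, 0.5, size=m))
        gaps = np.diff(np.concatenate([f, [f[0] + 1.0]]))
        if gaps.min() >= min_sep:
            return f
    raise RuntimeError("could not place frequencies with the requested spacing")

def make_sample(rng, max_targets=5, sigma=1.0):
    """Return (network input of length 2*N_VRX, target pseudo-spectrum of length K, frequencies)."""
    m = int(rng.integers(1, max_targets + 1))           # step 1: number of targets
    f = draw_frequencies(m, 1.0 / N_VRX, rng)           # step 2: separated frequencies
    phase = rng.uniform(0.0, 2.0 * np.pi, size=m)       # step 3: random phases
    snr_db = rng.uniform(0.0, 40.0, size=m)             # step 4: SNR = alpha^2 / (2 sigma^2)
    alpha = np.sqrt(2.0 * sigma**2 * 10.0 ** (snr_db / 10.0))
    n = np.arange(N_VRX)
    tones = alpha[:, None] * np.exp(1j * (2.0 * np.pi * f[:, None] * n + phase[:, None]))
    noise = sigma * (rng.standard_normal(N_VRX) + 1j * rng.standard_normal(N_VRX))
    x = tones.sum(axis=0) + noise
    x = x / np.abs(x).max()                             # step 5: one possible normalization
    grid = np.arange(K) / K - 0.5                       # assumed frequency grid of y
    y = np.exp(-((grid[None, :] - f[:, None]) ** 2) / (2.0 * SIGMA_G**2)).sum(axis=0)  # step 6
    return np.concatenate([x.real, x.imag]), y, f

rng = np.random.default_rng(0)
net_in, y, f = make_sample(rng)
print(net_in.shape, y.shape, len(f))
```

Repeating `make_sample` $10^5$ times and splitting the result 90/10 would mirror the training and test sizes quoted in the text.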
The neural network architecture employed in this approach is based on the frequency-representation module of the DeepFreq model as described in [44]. It consists of an encoder, a series of 1-D CNN layers, and a decoder. A similar structure can be found in previous works, such as [51], [52], [53]. However, the hyperparameters used in this approach differ from those used in the frequency-representation module of the DeepFreq model, as the latter uses an input of 50 complex samples, while this approach uses input from 12 virtual antennas. The neural network architecture is depicted in Fig. 3. This network is referred to as FRNet12, as it utilizes $N_{\mathrm{VRX}} = 12$ virtual antennas. The input model in (4) is complex, so the real and imaginary parts need to be separated and concatenated. The input is then passed through an encoder, consisting of a single linear layer followed by a parallel linear transformation, mapping the input to an intermediate feature space. The linear layer has an input size of $2N_{\mathrm{VRX}} = 24$ and an output size of 2048. The parallel linear transformation converts these 2048 outputs into 64 feature channels, each with a size of 32. These feature channels are processed by 22 1-D convolutional layers with localized filters of length 3, along with batch normalization and rectified linear units (ReLUs). Finally, a decoder using upsampling and 1-D transposed convolution generates the pseudo-spectrum with a length of $K = 512$. The network's training loss was minimized using the Adam optimizer [54] with a batch size of 256 and L2 regularization to reduce overfitting [55]. The learning rate was initialized at $7.45 \cdot 10^{-4}$ and the loss was measured using the mean-squared error (MSE). All these parameters are hyperparameters that were tuned to achieve the minimum test loss.
After completing the training of the FRNet12, the AoA of the targets can be estimated through the following steps.
2) The input $J$ is generated from the separation method utilizing 2-D CA-CFAR and the binary mask algorithm [14]. During this process, the indices containing the target locations are determined. After obtaining $F_1$, $F_2$ is created as a reduced version of $F_1$ using the target location indices, with $F_2 \in \mathbb{C}^{Z \times N_{\mathrm{VRX}}}$, where $Z$ represents the total number of target indices. The reduction from $Z_R \cdot Z_D$ to $Z$ avoids unnecessary calculations when the neural network is applied: the network only processes the region of the RD map that contains the targets.
3) Since $F_2$ contains complex data, it must be separated into its real and imaginary parts, which are then concatenated to form $F_3$. The output of FRNet12 for $F_3$ is used to form $Y$, which represents the range-angle (RA) map containing the AoA of the targets, with $Y \in \mathbb{R}^{Z_R \times K}$. Fig. 4 shows the process of creating the output $Y$ from the input $J$ using the deep learning approach with FRNet12.
Fig. 4. Process of the deep learning approach to estimate the output $Y$, which is the RA map containing the AoA of targets, from the input $J$. The blue color is complex data and the light green color is real data.
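The data flow from the processed RD maps to the RA map can be illustrated with a small sketch. Several details here are assumptions rather than the authors' procedure: $F_1$ is taken to be $J$ flattened over all RD cells, cells sharing a range bin are merged into $Y$ with an element-wise maximum, and the trained network is replaced by a hypothetical FFT-based stand-in so that the example runs without any trained weights.

```python
import numpy as np

def fft_stand_in(F3, k=512):
    """Hypothetical placeholder for a trained network: magnitude of a zero-padded spatial FFT."""
    n = F3.shape[1] // 2
    x = F3[:, :n] + 1j * F3[:, n:]
    return np.abs(np.fft.fftshift(np.fft.fft(x, n=k, axis=1), axes=1))

def estimate_ra_map(J, target_mask, net, k=512):
    """J: complex (Z_R, Z_D, N_VRX); target_mask: bool (Z_R, Z_D); net: (Z, 2*N_VRX) -> (Z, k)."""
    z_r, z_d, n_vrx = J.shape
    Y = np.zeros((z_r, k))
    F1 = J.reshape(z_r * z_d, n_vrx)                  # assumed definition of F1: all RD cells
    idx = np.flatnonzero(target_mask.ravel())         # indices of cells that contain targets
    if idx.size == 0:
        return Y
    F2 = F1[idx]                                      # reduced complex data, shape (Z, N_VRX)
    F3 = np.concatenate([F2.real, F2.imag], axis=1)   # real-valued network input, (Z, 2*N_VRX)
    spectra = net(F3)                                 # one pseudo-spectrum per target cell
    rows = idx // z_d                                 # range bin of each target cell
    np.maximum.at(Y, rows, spectra)                   # assumption: merge Doppler cells by maximum
    return Y

# Usage: one synthetic target at range bin 10, Doppler bin 3, spatial frequency 0.25.
J = np.zeros((64, 16, 12), dtype=complex)
J[10, 3] = np.exp(1j * 2 * np.pi * 0.25 * np.arange(12))
mask = np.zeros((64, 16), dtype=bool)
mask[10, 3] = True
Y = estimate_ra_map(J, mask, fft_stand_in)
print(Y.shape, int(Y[10].argmax()))
```

Restricting the network call to the masked cells is what keeps the cost proportional to $Z$ rather than $Z_R \cdot Z_D$.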

B. Cascaded Neural Network System
The FRNet12 can enhance the PoR when the angular separation of the targets approaches the theoretical angular resolution in (1), as demonstrated in Section IV. However, its performance degrades if the angular separation of the targets is less than $2/N_{\mathrm{VRX}}$ rad. Training the FRNet12 with a minimum angle between targets of less than $2/N_{\mathrm{VRX}}$ rad would result in a decrease in performance [44]. To address this issue, we propose the use of a cascaded neural network system, which outperforms the FRNet12. The cascaded neural network system can improve the PoR as the angular separation of the targets approaches the new theoretical angular resolution according to (2). As previously discussed in the introduction, this network consists of an ETPNet followed by the FRNet18. The ETPNet extrapolates $N_{\mathrm{ETP}}$ samples from the $N_{\mathrm{VRX}}$ data samples. In this implementation, we have set $N_{\mathrm{ETP}} = N_{\mathrm{VRX}}/2$. Choosing $N_{\mathrm{ETP}} > N_{\mathrm{VRX}}/2$ would result in a decrease in the performance of the ETPNet, because the extrapolation error grows with the number of extrapolated samples. We found that $N_{\mathrm{ETP}} = N_{\mathrm{VRX}}/2$ achieves the best balance between the number of extrapolated samples and the resulting extrapolation error of the ETPNet. The following steps outline the process for generating the data for training the ETPNet.
The ETPNet architecture is depicted in Fig. 5. It is an FNN that utilizes linear layers with a hyperbolic tangent (Tanh) activation function. The input size is $2N_{\mathrm{VRX}} = 24$, which is the concatenation of the real and the imaginary data. The network consists of eight hidden layers, each with 112 neurons, and the output layer has a size of $2(N_{\mathrm{VRX}} + N_{\mathrm{ETP}}) = 36$, representing the real and imaginary extrapolated data. The MSE is used as the loss metric, which is minimized through training using the Adam optimizer with an initial learning rate of $5 \cdot 10^{-4}$. The batch size is set at 256, and L2 regularization is applied to mitigate overfitting. These hyperparameters were selected through a tuning process to achieve the minimum test loss. After finishing the training process, the weights are saved for training the FRNet18 model.
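The layer sizes of the ETPNet can be made concrete with a plain NumPy forward pass. This is only a shape-level sketch with random, untrained weights: the initialization, the linear (activation-free) output layer, and the function names are assumptions, and no training loop is shown.

```python
import numpy as np

N_VRX, N_ETP = 12, 6          # virtual and extrapolated channels (from the text)
HIDDEN, N_HIDDEN_LAYERS = 112, 8

def init_etpnet(rng):
    """Random weights for 24 -> 8 x 112 (Tanh) -> 36; the initialization scheme is an assumption."""
    sizes = [2 * N_VRX] + [HIDDEN] * N_HIDDEN_LAYERS + [2 * (N_VRX + N_ETP)]
    return [(rng.standard_normal((a, b)) / np.sqrt(a), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def etpnet_forward(params, x):
    """Forward pass: Tanh after every hidden layer, linear output layer (assumption)."""
    h = x
    for w, b in params[:-1]:
        h = np.tanh(h @ w + b)
    w, b = params[-1]
    return h @ w + b

rng = np.random.default_rng(0)
params = init_etpnet(rng)
batch = rng.standard_normal((5, 2 * N_VRX))      # five inputs of concatenated real/imag parts
out = etpnet_forward(params, batch)              # five outputs of length 2 * (N_VRX + N_ETP)
n_params = sum(w.size + b.size for w, b in params)
print(out.shape, n_params)
```

With these sizes the network has fewer than $10^5$ trainable parameters, which is small compared with the convolutional FRNet models.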
The data generation process for training the FRNet18 is the same as that for training the FRNet12, with the exception of the frequencies $f_m$ specified in (4). The frequencies $f_m$ are randomly generated such that the minimum separation between them follows the constraint specified in (2), which is 6.36° when $N_{\mathrm{VRX}} = 12$ and $N_{\mathrm{ETP}} = 6$. In addition, the encoder of the FRNet18 has a slight modification compared to the structure of the FRNet12. Since the input of the FRNet18 is the output of the ETPNet, the input size for the single linear layer in the encoder is $2(N_{\mathrm{VRX}} + N_{\mathrm{ETP}}) = 36$, rather than $2N_{\mathrm{VRX}} = 24$ as in the case of the FRNet12. To train the FRNet18, the training data are fed into the previously trained ETPNet, and the output of the ETPNet is then used to train the FRNet18. After finishing the training for FRNet18, the AoA of the targets is estimated in a similar manner as depicted in Fig. 4, with the only difference being that the FRNet12 is replaced by the cascaded neural network system. The signal $F_3$ is fed into the ETPNet, and the output of the ETPNet is then passed on to the FRNet18 and subsequent layers in the neural network architecture.

A. Simulations
In this section, we present four different types of simulations to compare the performance of conventional digital beamforming using FFT, MUSIC, FRNet12, and a cascaded neural network system using ETPNet and FRNet18. We also compare the performance of the ETPNet with another extrapolation method using least squares (LS). The LS method predicts future samples of $x[n]$ from (4) with the linear prediction equation (6), where $a[p]$ are the linear prediction coefficients, $L$ is the model order, and $\hat{M}$ is the estimated number of frequency components, that is, the estimate of the number of point-targets $M$ in (4). $\hat{M}$ can be estimated using AIC, MDL, or MAP. In this work, we select the MAP criterion as it demonstrated better performance than AIC and MDL [36] for sinusoidal signals.
To estimate $a[p]$, we need to form a matrix $X$ and then a vector from the available samples of $x$. By using (6), we can extrapolate $N_{\mathrm{ETP}} = N_{\mathrm{VRX}}/2$ future values for $x$. The LS method can enhance the performance of AoA estimation compared to conventional FFT as demonstrated in [19]. To compare the performance of ETPNet and LS + MAP, we consider the following combinations: ETPNet + FFT, LS + MAP + FFT, ETPNet + FRNet18, and LS + MAP + FRNet18.
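A least-squares linear prediction of this kind can be sketched as follows. The sketch assumes a common forward-prediction form, $x[n] \approx \sum_{p=1}^{L} a[p]\, x[n-p]$, solved with `numpy.linalg.lstsq`; the sign convention, the exact layout of the matrix and vector, and the function name are assumptions and may differ from the formulation referenced in the text. The model order is passed in directly instead of being estimated with MAP.

```python
import numpy as np

def ls_extrapolate(x, order, n_extra):
    """Fit forward linear-prediction coefficients by least squares and append n_extra samples."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    # Each row holds the `order` samples preceding x[k], most recent first.
    X = np.array([x[k - order:k][::-1] for k in range(order, n)])
    target = x[order:]                       # the samples to be predicted from each row
    a, *_ = np.linalg.lstsq(X, target, rcond=None)
    out = list(x)
    for _ in range(n_extra):
        past = np.array(out[-order:][::-1])  # most recent `order` samples, newest first
        out.append(past @ a)                 # one-step prediction, fed back recursively
    return np.array(out)

# Usage: two noise-free complex sinusoids, 12 samples extended by 6 (as in N_ETP = N_VRX / 2).
n = np.arange(18)
x_full = np.exp(1j * 2 * np.pi * 0.10 * n) + 0.5 * np.exp(1j * (2 * np.pi * -0.20 * n + 0.3))
x_ext = ls_extrapolate(x_full[:12], order=2, n_extra=6)
print(np.max(np.abs(x_ext - x_full)))
```

For noise-free data with the correct model order the recursion reproduces the sinusoids essentially exactly; with noise or a wrong order the error accumulates with every extrapolated sample, which is the sensitivity to order estimation discussed in the text.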
In the first simulation, we simulate the PoR and root-mean-square error (RMSE) of two targets with identical amplitudes as a function of SNR and the difference between the AoA of the two targets, denoted as $\Delta\theta = |\theta_1 - \theta_0|$. The phases of the two simulated targets are random and uniformly distributed, while the noise is modeled as white Gaussian noise. The SNR values range from 0 to 40 dB with a step size of 1 dB, while $\Delta\theta$ varies from 0° to 17.5° with a step size of 0.5°. Two estimated AoAs $\hat{\theta}_1$ and $\hat{\theta}_0$ are considered detected if they lie within a tolerance band of $\pm|\theta_1 - \theta_0|/2$ around the true values and if the estimated amplitudes fall within ±50% of the actual values. Each SNR and $\Delta\theta$ point is simulated with $N_{\mathrm{tr}} = 1000$ trials, and the PoR is calculated as
$$\mathrm{PoR} = \frac{N_D}{N_{\mathrm{tr}}} \cdot 100\% \tag{8}$$
where $N_D$ is the total number of detections for two targets in the $N_{\mathrm{tr}}$ trials. The RMSE is calculated using (9), where $\hat{\theta}_{m,n_{\mathrm{tr}}}$ is the estimated AoA in each trial and $\theta_m$ is the actual AoA. Both are given in degrees, with $M = 2$ in this simulation. Fig. 6(a) and (b) shows the contour plots for the PoR and the RMSE simulation results using seven different approaches, respectively. The upper right side of the curves in Fig. 6(a) represents the PoR region with values greater than 95%. The upper right side of the curves in Fig. 6(b) represents the RMSE region with values smaller than 1.5°. Table I compares the SNR requirements of the seven approaches to achieve PoR ≥ 95% and RMSE ≤ 1.5° at the theoretical angular resolutions $\Delta\theta = 9.55°$ in (1) and $\Delta\theta = 6.36°$ in (2). FFT has the worst performance among the approaches, while the LS + MAP + FFT approach is better than FFT but worse than ETPNet + FFT. This demonstrates that ETPNet is superior to LS + MAP. LS + MAP needs to estimate the number of frequency components or the number of targets $M$ to work properly. Since the performance of order selection rules using MAP depends on SNR [36], the performance of LS + MAP degrades as the SNR decreases. The MUSIC approach also requires prior knowledge for proper operation and can use AIC, MDL, or MAP to estimate the number of frequency components in applications. A significant decline in the performance of MUSIC is observed with a decrease in SNR. The deep learning approaches, utilizing FRNet12 and ETPNet + FRNet18, require 10 dB less SNR to achieve PoR ≥ 95% and 7 dB less SNR to achieve RMSE ≤ 1.5° compared to MUSIC at $\Delta\theta = 9.55°$. However, FRNet12 cannot achieve PoR ≥ 95% and RMSE ≤ 1.5° at $\Delta\theta = 6.36°$. Compared to MUSIC at $\Delta\theta = 6.36°$, ETPNet + FRNet18 requires 15 dB less SNR for PoR ≥ 95% and 9.5 dB less SNR for RMSE ≤ 1.5°. The deep learning approaches using FRNet12 and ETPNet + FRNet18 do not require prior knowledge. The curves from LS + MAP + FRNet18 and ETPNet + FRNet18 once again confirm that the performance of ETPNet is better than that of LS + MAP.
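The detection rule and the two metrics of the first simulation can be expressed compactly. The following is a sketch under assumptions: estimates and true values are paired after sorting by angle, a trial with a number of estimates other than two counts as unresolved, and the RMSE averages the squared error over both targets and all trials; the function names are illustrative.

```python
import numpy as np

def is_resolved(theta_est, amp_est, theta_true, amp_true):
    """Two targets count as resolved if both angles and both amplitudes are within tolerance."""
    if len(theta_est) != 2:
        return False
    t_true, a_true = np.asarray(theta_true, float), np.asarray(amp_true, float)
    t_est, a_est = np.asarray(theta_est, float), np.asarray(amp_est, float)
    i_true, i_est = np.argsort(t_true), np.argsort(t_est)   # pair targets by sorted angle
    t_true, a_true = t_true[i_true], a_true[i_true]
    t_est, a_est = t_est[i_est], a_est[i_est]
    tol = abs(t_true[1] - t_true[0]) / 2.0                   # tolerance band of +/- delta_theta / 2
    angle_ok = np.all(np.abs(t_est - t_true) <= tol)
    amp_ok = np.all(np.abs(a_est - a_true) <= 0.5 * a_true)  # amplitudes within +/- 50 %
    return bool(angle_ok and amp_ok)

def probability_of_resolution(flags):
    """PoR in percent: number of resolved trials divided by the number of trials."""
    return 100.0 * float(np.mean(flags))

def rmse_deg(theta_est_trials, theta_true):
    """RMSE in degrees over all targets and trials (averaging convention is an assumption)."""
    err = np.asarray(theta_est_trials, float) - np.asarray(theta_true, float)
    return float(np.sqrt(np.mean(err**2)))

# Usage with three hand-made trials for true angles 0 and 8 degrees and unit amplitudes.
truth, amps = [0.0, 8.0], [1.0, 1.0]
trials = [([0.5, 8.2], [1.0, 1.0]), ([4.0], [1.0]), ([-1.0, 9.5], [0.9, 1.6])]
flags = [is_resolved(t, a, truth, amps) for t, a in trials]
print(flags, probability_of_resolution(flags))
```

In this toy example only the first trial is resolved: the second returns a single merged peak and the third violates the amplitude tolerance.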
In the second simulation, we again evaluate the PoR and RMSE of the two targets as a function of SNR and the difference $\Delta\theta$ between the AoA of the two targets. One target is fixed with an SNR of 20 dB, while the other target's SNR varies from 0 to 40 dB. The PoR and RMSE are calculated as described in the first simulation, and each SNR and $\Delta\theta$ point is simulated with 1000 trials. Fig. 7(a) and (b) depicts the PoR and RMSE results of this simulation.
With the third simulation, we demonstrate the performance of all approaches using two metrics: false-negative rate (FNR) and RMSE as functions of SNR. The SNR is varied from 0 to 40 dB with a step size of 1 dB. For each SNR point, 1000 trials are conducted with each target having the same SNR value. In each trial, the number of targets $M$ is randomly chosen from 1 to 5, and the AoAs of the targets are also chosen randomly, with a minimum separation between the AoAs determined by (2), which is $\Delta\theta = 6.36°$. A false-negative detection occurs when the estimated AoA of a target falls outside the range $\theta \pm |\Delta\theta/2|$ around its true AoA $\theta$. The RMSE is again calculated using (9). Fig. 8(a) and (b) shows that the performance of all approaches depends on the SNR, with the FNR and RMSE decreasing as the SNR increases. When the SNR ≥ 35 dB, the performance of LS + MAP + FFT is comparable to that of ETPNet + FFT; however, for SNR < 35 dB, ETPNet + FFT demonstrates better performance. Among the approaches, FRNet12 outperforms FFT, LS + MAP + FFT, ETPNet + FFT, and MUSIC. At SNR ≥ 15 dB, LS + MAP + FRNet18 is better than FRNet12. At high SNR, LS + MAP + FRNet18 approaches ETPNet + FRNet18, but at low SNR, ETPNet + FRNet18 shows superior performance. This further confirms that ETPNet is better than LS + MAP.
In the fourth simulation, we again evaluate the performance of the proposed approaches by measuring the FNR and the RMSE. This simulation involves 1000 trials, where in each trial, the number of targets $M$ is randomly selected between 1 and 5 and each target is assigned a random SNR from 0 to 40 dB as well as an arbitrary AoA. The minimum separation between the AoAs is determined by (2), which is $\Delta\theta = 6.36°$. The FNR and RMSE are calculated in the same manner as in the third simulation. The results of all approaches are summarized in Table III. It is evident that the ETPNet + FRNet18 approach yields the best results, with the lowest FNR and RMSE compared to the other approaches.
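The false-negative criterion used in the third and fourth simulations can be sketched as below. The matching rule (a target counts as detected if any estimate lies within $\pm\Delta\theta/2$ of it) and the normalization by the total number of true targets are assumptions about how the FNR is tallied; the function name is illustrative.

```python
import numpy as np

def false_negative_rate(theta_true_trials, theta_est_trials, min_sep_deg):
    """Fraction of true targets with no estimate inside +/- min_sep_deg / 2 of the true AoA."""
    tol = min_sep_deg / 2.0
    missed = 0
    total = 0
    for truth, est in zip(theta_true_trials, theta_est_trials):
        est = np.asarray(est, dtype=float)
        for theta in truth:
            total += 1
            # A target is missed when there are no estimates or the closest one is too far away.
            if est.size == 0 or np.min(np.abs(est - theta)) > tol:
                missed += 1
    return missed / total

# Usage: two trials with a minimum separation of 6.36 degrees (tolerance of 3.18 degrees).
truth_trials = [[0.0, 10.0], [-20.0]]
est_trials = [[0.5], [-19.0, 30.0]]
fnr = false_negative_rate(truth_trials, est_trials, min_sep_deg=6.36)
print(fnr)
```

Here the target at 10° is missed while the other two are matched, so one of three targets is a false negative; the spurious estimate at 30° does not affect this metric.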
It is important to note that in the third and fourth simulations, we assume that up to five targets randomly appear in the simulation scenario, with the same range and velocity but different AoA. In this case, only one target is visible due to the overlap of these targets in an RD map. However, when one to five or more targets randomly appear with differences in range and velocity, the targets appear as separate objects in the RD map. Deep learning approaches can easily resolve their AoAs, as this is similar to determining the AoA for a single target when no overlapped target is present in the RD map. If more than five targets appear with the same range and velocity but different AoA in an RD map, the performance of the deep learning approach is degraded since it was trained with a maximum of five targets for $N_{\mathrm{VRX}} = 12$. To recognize more than five targets, the deep learning approach needs to be trained with $N_{\mathrm{VRX}} > 12$. For example, the frequency-representation module in the DeepFreq model was trained with 50 complex inputs and can recognize up to ten targets. The purpose of the third and fourth simulations is to evaluate the performance of each approach compared to each other, using a reasonable metric.

B. Measurements
This section presents seven different measurements to compare the performance of conventional digital beamforming using FFT, LS + MAP + FFT, ETPNet + FFT, MUSIC, FRNet12, LS + MAP + FRNet18, and the cascaded neural network system with ETPNet + FRNet18, using data from an FMCW radar with three TX and four RX antennas operating at 77 GHz in a real-world scenario. For all the measurements, the radar transmits a burst with $N_p = 32$ fast chirps. Each chirp has a duration of $T_{\mathrm{sw}} = 32$ µs and a bandwidth of B = 1 GHz, sweeping between 76.5 and 77.5 GHz. The ADC sampling frequency is 16 MHz and the number of IF samples is N = 512. The FMCW radar uses the DDMA technique with three TX antennas transmitting simultaneously, and the signal processing follows the chain shown in Fig. 2 [14]. It should be noted that the training of the deep learning approaches is based entirely on simulated data assuming point targets and is independent of the measurements.
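As a quick consistency check of the stated waveform parameters, the short sketch below derives the acquisition time, the range resolution, and the beat frequency of a point target. It uses only the standard FMCW relations; the variable names are illustrative and the 14 m example range is taken from the outdoor measurement described later.

```python
C = 299_792_458.0  # speed of light in m/s

bandwidth_hz = 1e9        # B = 1 GHz
chirp_duration_s = 32e-6  # T_sw = 32 us
fs_hz = 16e6              # ADC sampling frequency
n_samples = 512           # IF samples per chirp

# Time needed to collect N samples; this should match the chirp duration.
acquisition_time_s = n_samples / fs_hz

# Standard FMCW range resolution for a sweep of bandwidth B.
range_resolution_m = C / (2.0 * bandwidth_hz)

def beat_frequency_hz(range_m):
    """IF (beat) frequency of a static point target at the given range."""
    slope = bandwidth_hz / chirp_duration_s  # chirp slope in Hz/s
    return 2.0 * range_m * slope / C

print(f"acquisition time: {acquisition_time_s * 1e6:.1f} us")
print(f"range resolution: {range_resolution_m * 100:.1f} cm")
print(f"beat frequency at 14 m: {beat_frequency_hz(14.0) / 1e6:.2f} MHz")
```

With these values, 512 samples at 16 MHz span exactly one 32 µs chirp, and the 1 GHz sweep gives a range resolution of about 15 cm.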
The first measurement involved three static targets: two metal poles and a corner reflector with a radar cross-section (RCS) of 10.5 dBsm, placed in an anechoic chamber as shown in Fig. 9(a). The two metal poles were positioned at the same range of 2.7 m, with a distance of 38 cm between them. The left pole and the right pole were placed at angles of 0° and 8°, respectively, while the corner reflector was placed at a range of 3.4 m and an angle of 16° for reference. In the RA maps obtained with the FFT, LS + MAP + FFT, ETPNet + FFT, and MUSIC methods, the two targets at the same range merged into a single peak, making it impossible to separate them, as shown in Fig. 9(b)-(e). However, using FRNet12, LS + MAP + FRNet18, and ETPNet + FRNet18, as illustrated in Fig. 9(f)-(h), the two targets could be identified and separated. FRNet12 was capable of separating targets with an angular separation θ ≈ 9.5°, since FRNet12 was trained with the theoretical angular resolution given by (1). With θ < 9.5°, the performance of FRNet12 decreased.
In the second measurement, the setup of the first measurement was modified by moving the right metal pole closer to the left, resulting in a reduced distance of 31 cm between the two targets at the same range. The left and right poles were placed at 0° and 6.5°, respectively, as shown in Fig. 10(a). The approaches using FFT, LS + MAP + FFT, ETPNet + FFT, and MUSIC were unable to separate the two targets, as illustrated in Fig. 10(b)-(e). Although FRNet12 could recognize the targets, it estimated the difference in AoA incorrectly as θ ≈ 11°. The disadvantage of FRNet12 is that its performance decreases when the difference in AoA is less than the theoretical angular resolution in (1), as observed in Fig. 10(f). LS + MAP + FRNet18 and ETPNet + FRNet18 could separate the two targets with θ ≈ 6.5°, even with the right metal pole moved closer to the left, as shown in Fig. 10(g) and (h).
The third measurement involved three static targets, three metal poles positioned at the same range of 2.9 m, with a distance of 49 cm between adjacent poles and placed at angles of −2°, 7.6°, and 17.6°, as shown in Fig. 11(a). The FFT, LS + MAP + FFT, ETPNet + FFT, and MUSIC methods were unsuccessful in separating the three targets, merging them into a single peak in the RA map. The results of the deep learning approaches using FRNet12, LS + MAP + FRNet18, and ETPNet + FRNet18 are shown in Fig. 11.
The fifth and sixth measurements were based on the first and second measurements, with two targets at the same range. However, in the fifth and sixth measurements, the two targets had different RCS. In the fifth measurement, a corner reflector with an RCS of 10.5 dBsm and a metal pole were positioned at the same range of 2.9 m, with a distance of 51 cm between them, at angles of −10° and 0°, respectively. Another target was placed at a range of 4 m and an angle of −5° for reference. This setup is depicted in Fig. 12(a). The approaches using FFT, LS + MAP + FFT, ETPNet + FFT, and MUSIC could not separate the two targets at the same range, as they merged into a single peak. The three approaches using FRNet12, LS + MAP + FRNet18, and ETPNet + FRNet18 could identify and separate the targets with θ ≈ 10°, as shown in Fig. 12(b)-(d). While FRNet12 could separate the targets with the correct AoA, it hardly captured the difference in magnitude, whereas LS + MAP + FRNet18 and ETPNet + FRNet18 could distinguish it. It should be noted that although FRNet12 and ETPNet + FRNet18 were not specifically trained for amplitude recognition, the extrapolation technique used in ETPNet does improve the accuracy of the amplitude. However, it should be acknowledged that the amplitude estimation from ETPNet + FRNet18 was not completely accurate.
In the sixth measurement, the setup from the fifth measurement was altered by shifting the right metal pole closer to the left, reducing the distance between the two targets at the same range to 35 cm. The corner reflector and the metal pole were positioned at −10° and −3°, respectively, as depicted in Fig. 12(e). Although FRNet12 was able to recognize the targets, it misidentified the difference in AoA as θ ≈ 10°, since FRNet12 was trained with θ ≈ 9.5°. This is shown in Fig. 12(f). LS + MAP + FRNet18 was unable to identify the metal pole in the RA map under these conditions, as shown in Fig. 12(g). However, even with the metal pole moved closer, ETPNet + FRNet18 was capable of distinguishing the two targets with θ ≈ 7°, as demonstrated in Fig. 12(h). Moreover, ETPNet + FRNet18 was able to capture the magnitude difference, though not with complete precision.
The seventh measurement was performed with two static targets, two corner reflectors with an RCS of 10.5 dBsm, in an outdoor environment with more clutter than an anechoic chamber. The two corner reflectors were placed at the same range of 14 m, with a distance of 33 cm between them. For this setup, the estimated angular separation was θ ≈ arctan(0.33/14) ≈ 1.35°. The setup is shown in Fig. 13(a). Fig. 13(b)-(e) depicts the RA map results using FFT, LS + MAP + FFT, ETPNet + FFT, and MUSIC, respectively. None of these approaches could separate the two targets at the same range. Fig. 13(f)-(h) shows the RA map results using FRNet12, LS + MAP + FRNet18, and ETPNet + FRNet18, which could separate the two targets at the same range. However, since the estimated angular separation θ ≈ 1.35° was smaller than the theoretical angular resolutions in (1) and (2), the deep learning approaches showed the best estimation result based on the training model. FRNet12 provided an AoA result of 11°, while LS + MAP + FRNet18 and ETPNet + FRNet18 yielded an AoA result of 6.5°. This measurement demonstrated the ability to separate two targets at the same range when the targets were far away from the radar and their AoA separation was less than the theoretical angular resolution.
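The arctan estimate used for the seventh measurement can be applied to the other two-target setups as a sanity check. The sketch below is an illustration only: it treats the stated pole or reflector spacing as a cross-range offset at the stated range, which is an approximation, and the dictionary keys are just labels for the measurements described above.

```python
import math

def angular_separation_deg(spacing_m: float, range_m: float) -> float:
    """Angle subtended by two targets spacing_m apart at distance range_m."""
    return math.degrees(math.atan(spacing_m / range_m))

# (cross-range spacing in m, range in m) as stated for each measurement
setups = {
    "first": (0.38, 2.7),
    "second": (0.31, 2.7),
    "fifth": (0.51, 2.9),
    "sixth": (0.35, 2.9),
    "seventh": (0.33, 14.0),
}

separations = {
    name: angular_separation_deg(d, r) for name, (d, r) in setups.items()
}
for name, sep in separations.items():
    print(f"{name}: {sep:.2f} deg")
```

Under this approximation, the computed separations agree closely with the angles quoted for the indoor setups (about 8°, 6.5°, 10°, and 7°) and with the 1.35° quoted for the outdoor setup.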
In measurement scenarios where there are more than five targets at the same range and velocity, the performance of the deep learning approaches decreases. FRNet12 and ETPNet + FRNet18 can resolve three or four targets accurately, but they are unable to estimate all of the targets in such scenarios. To enable deep learning approaches to identify more than five targets at the same range and velocity, the radar system requires additional virtual receivers ($N_{\mathrm{VRX}} > 12$), and the deep learning approaches need to be retrained. However, it is worth noting that if there are more than five targets at different ranges and velocities in the RD map, FRNet12 and ETPNet + FRNet18 can easily resolve them. This is because the process is similar to determining the AoA for a single target when there are no overlapping targets in the RD map.

V. CONCLUSION
In this article, we have proposed and investigated the use of deep learning methods for AoA estimation in an FMCW MIMO radar to enhance the PoR and angular resolution. Our proposed methods utilize the DDMA technique with three TX antennas transmitting simultaneously to generate $N_{\mathrm{VRX}}$ processed RD maps, which are then fed into the trained neural network FRNet12 for AoA estimation. Our simulation and measurement results show that the deep learning approaches achieve a PoR of nearly 100% at the theoretical angular resolution in (1), outperforming conventional beamforming using FFT. Compared to MUSIC, our approaches require less SNR and do not require prior knowledge. Furthermore, we introduce an improved version of FRNet12 using the ETPNet cascaded with FRNet18, which improves angular resolution by 33% while maintaining a PoR of nearly 100% when two targets approach the new theoretical angular resolution in (2). Our proposed methods demonstrate the significant potential of deep learning to enhance the accuracy and angular resolution of AoA estimation in an FMCW MIMO radar.
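The 33% figure can be checked with a short worked calculation. Equations (1) and (2) are not reproduced in this section, so the block below assumes the common form for a uniform array with half-wavelength spacing, in which the resolution in radians is 2 divided by the number of array elements. Under that assumption, the values of 9.55° and 6.36° quoted in the text are recovered for 12 and 18 elements:

```latex
\[
\Delta\theta_{N} \approx \frac{2}{N}\ \text{rad}, \qquad
\Delta\theta_{12} = \frac{2}{12}\cdot\frac{180^\circ}{\pi} \approx 9.55^\circ, \qquad
\Delta\theta_{18} = \frac{2}{18}\cdot\frac{180^\circ}{\pi} \approx 6.366^\circ
\]
\[
\frac{\Delta\theta_{12} - \Delta\theta_{18}}{\Delta\theta_{12}}
  = 1 - \frac{12}{18} = \frac{1}{3} \approx 33\%
\]
```

Here the 18 elements correspond to the 12 virtual antennas plus the six samples extrapolated by ETPNet.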

Fig. 2. Overview of signal processing chain in this article. $S \in \mathbb{R}^{N \times N_p \times N_{\mathrm{RX}}}$, $X_{\mathrm{RD}} \in \mathbb{C}^{Z_R \times Z_D \times N_{\mathrm{RX}}}$, $J \in \mathbb{C}^{Z_R \times Z_D \times N_{\mathrm{VRX}}}$.

Fig. 3. $F_3$ serves as the input of the FRNet12. 4) The FRNet12 generates an output $Y_1 \in \mathbb{R}^{Z \times K}$. 5) Given the information of the target location indices, $Y_2 \in \mathbb{R}^{Z_R \times Z_D \times K}$ can be reconstructed from $Y_1$. 6) Finally, the mean is taken over dimension $Z_D$ of $Y_2$

1) Generating $10^5$ samples $x'$ following (4) without adding the noise term $z[n]$, where $n = 0, 1, \ldots, N_{\mathrm{VRX}} + N_{\mathrm{ETP}} - 1$, $N_{\mathrm{VRX}} = 12$, and $N_{\mathrm{ETP}} = N_{\mathrm{VRX}}/2 = 6$. Using $9 \cdot 10^4$ of the generated samples for training and the remaining $10^4$ for testing purposes.
2) Adding noise $z[n]$ to the first $N_{\mathrm{VRX}} = 12$ samples of $x'$ by selecting amplitudes $\alpha_m$ that are in accordance with the variance $\sigma^2$ of the Gaussian noise, such that the

Fig. 5. ETPNet architecture. The input of the ETPNet consists of complex numbers, which are separated into their real and imaginary parts and then concatenated. The colors in this figure indicate the classification of the input and output data, both of which are real numbers.

Fig. 6. Simulation results with seven different approaches for two targets with identical amplitude as a function of SNR varying from 0 to 40 dB and difference θ varying from 0° to 17.5° between the AoA of the two targets from the first simulation. Each SNR and θ point is simulated with 1000 trials. (a) PoR: the upper right side of the curves is the PoR region which is larger than 95%. (b) RMSE: the upper right side of the curves is the RMSE region which is smaller than 1.5°.

Fig. 7. Simulation results with seven different approaches for two targets as a function of SNR and difference θ between the AoA of the two targets from the second simulation. In this simulation, one target is fixed with SNR at 20 dB and another target has varying SNR from 0 to 40 dB. θ varies from 0° to 17.5°. Each SNR and θ point is simulated with 1000 trials. (a) PoR: the upper right side of the curve is the PoR region which is larger than 95%. (b) RMSE: the upper right side of the curve is the RMSE region which is smaller than 1.5°.

Fig. 8. Simulation results with seven different approaches with a random number of targets as a function of SNR varying from 0 to 40 dB from the third simulation. At each SNR point, 1000 trials are performed with each target having the same SNR value. In each trial, the number of targets is random between 1 and 5 and the AoAs of the targets are arbitrary with a minimum separation θ = 6.36°. (a) FNR. (b) RMSE.

Fig. 9. Measurement scenario and RA map results for seven different approaches. (a) Measurement scenario consists of two metal poles placed at a range of 2.7 m, with a distance of 38 cm between them. The left pole and right pole are placed at angles of 0° and 8°, respectively, and a corner reflector is placed at a range of 3.4 m and an angle of 16° for reference. RA map results are shown for (b) FFT, (c) LS + MAP + FFT, (d) ETPNet + FFT, (e) MUSIC, (f) FRNet12, (g) LS + MAP + FRNet18, and (h) ETPNet + FRNet18. The results demonstrate that FFT, LS + MAP + FFT, ETPNet + FFT, and MUSIC methods fail to separate the two targets at the same range, as they merge into a single peak. In contrast, FRNet12, LS + MAP + FRNet18, and ETPNet + FRNet18 can separate and identify the AoA of the targets. However, the performance of LS + MAP + FRNet18 and ETPNet + FRNet18 is better than that of FRNet12.
In Fig. 11(b)-(d), all three deep learning approaches were able to separate the three targets. ETPNet + FRNet18 outperformed LS + MAP + FRNet18, and ETPNet performed better than LS + MAP since it did not require target estimation. The fourth measurement adjusted the third measurement by moving the two right metal poles closer to the left, with the left and middle poles separated by 38 cm and the middle and right poles separated by 40 cm. The metal poles were placed at angles of −2°, 5.5°, and 14°, as shown in Fig. 11(e).

Fig. 10. Measurement scenario modified from the measurement scenario in Fig. 9(a) and RA map results for seven different approaches. (a) Measurement scenario consists of two metal poles placed at a range of 2.7 m, with a distance of 31 cm between them. The left pole and right pole are placed at angles of 0° and 6.5°, respectively, and a corner reflector is placed at a range of 3.4 m and an angle of 16° for reference. RA map results are shown for (b) FFT, (c) LS + MAP + FFT, (d) ETPNet + FFT, (e) MUSIC, (f) FRNet12, (g) LS + MAP + FRNet18, and (h) ETPNet + FRNet18. The outcomes reveal that FFT, LS + MAP + FFT, ETPNet + FFT, and MUSIC methods are unable to distinguish the two targets at the same range, as they combine into a single peak. On the other hand, FRNet12, LS + MAP + FRNet18, and ETPNet + FRNet18 can recognize and separate the AoA of the targets. However, the performance of LS + MAP + FRNet18 and ETPNet + FRNet18 is superior to that of FRNet12.

Fig. 11. (a) Measurement scenario shows three metal poles at the same range of 2.9 m, separated by a distance of 49 cm and placed at angles of −2°, 7.6°, and 17.6°. The methods using FFT, LS + MAP + FFT, ETPNet + FFT, and MUSIC fail to separate the three targets, resulting in the merging of the targets into a single one in the RA map. Other approaches can separate the three targets. RA map results are shown for (b) FRNet12, (c) LS + MAP + FRNet18, and (d) ETPNet + FRNet18. (e) Measurement scenario shows three metal poles at the same range of 2.9 m, with the left and middle poles separated by 38 cm and the middle and right poles separated by 40 cm, and placed at angles of −2°, 5.5°, and 14°. RA map results are shown for (f) FRNet12, (g) LS + MAP + FRNet18, and (h) ETPNet + FRNet18. The results show that ETPNet + FRNet18 performs better than both FRNet12 and LS + MAP + FRNet18.

Fig. 12. (a) Measurement scenario involves a corner reflector with an RCS of 10.5 dBsm and a metal pole positioned at a distance of 51 cm from each other and a range of 2.9 m. The corner reflector is located at an angle of −10° while the metal pole is at an angle of 0°. Another metal pole is placed at a range of 4 m and an angle of −5° for reference. The methods using FFT, LS + MAP + FFT, ETPNet + FFT, and MUSIC fail to separate the two targets, resulting in the merging of the targets into a single one in the RA map. Other approaches can separate the targets. RA map results are shown for (b) FRNet12, (c) LS + MAP + FRNet18, and (d) ETPNet + FRNet18. (e) Measurement scenario shows the right metal pole moved closer to the left, resulting in a reduced distance of 35 cm between the targets. The corner reflector and the metal pole are positioned at −10° and −3°, respectively. RA map results are shown for (f) FRNet12, (g) LS + MAP + FRNet18, and (h) ETPNet + FRNet18. The results demonstrate that ETPNet + FRNet18 outperforms FRNet12 and LS + MAP + FRNet18 in the measurement scenario with different RCS targets at the same range.

Fig. 13. (a) Measurement scenario in an outdoor environment with two corner reflectors placed at the same range of 14 m. The distance between the two corner reflectors is 33 cm. RA map results are shown for (b) FFT, (c) LS + MAP + FFT, (d) ETPNet + FFT, (e) MUSIC, (f) FRNet12, (g) LS + MAP + FRNet18, and (h) ETPNet + FRNet18. The deep learning approaches FRNet12, LS + MAP + FRNet18, and ETPNet + FRNet18 showed the best estimation result based on the training model. This measurement demonstrated the capability to distinguish between two targets at the same range when the targets were far away from the radar and had an AoA separation below the theoretical angular resolution.

TABLE I COMPARISON OF SEVEN DIFFERENT APPROACHES IN THE FIRST SIMULATION RESULT PRESENTED IN FIG. 6. IN THIS SIMULATION, TWO TARGETS HAVE IDENTICAL AMPLITUDE, THE SNR VARIES FROM 0 TO 40 dB, AND θ VARIES FROM 0° TO 17.5°

TABLE II COMPARISON OF SEVEN DIFFERENT APPROACHES IN THE SECOND SIMULATION RESULT PRESENTED IN FIG. 7. IN THIS SIMULATION, ONE TARGET IS FIXED AT AN SNR OF 20 dB AND ANOTHER TARGET HAS VARYING SNR FROM 0 TO 40 dB. θ VARIES FROM 0° TO 17.5°

While LS + MAP + FFT performs better than FFT, it is still outperformed by ETPNet + FFT. Neither the FFT-based approaches nor MUSIC are able to achieve a PoR ≥ 95% and an RMSE ≤ 1.5° for either θ = 9.55° or θ = 6.36°. Furthermore, FRNet12 is unable to meet the requirement of PoR ≥ 95% and RMSE ≤ 1.5° for θ = 6.36°. The results from LS + MAP + FRNet18 and ETPNet + FRNet18 indicate that ETPNet performs better than LS + MAP.