Robust DoA Estimation Using Denoising Autoencoder and Deep Neural Networks

As one of the most critical technologies in array signal processing, direction of arrival (DoA) estimation has received a great deal of attention in many areas. Traditional methods perform well when the signal-to-noise ratio (SNR) is high and the receiving array is perfect, conditions quite different from those in some real applications (e.g., the marine communication scenario). To obtain satisfactory DoA estimation performance when the SNR is low and the array is inaccurate (mutual coupling exists), this paper introduces a scheme consisting of a denoising autoencoder (DAE) and deep neural networks (DNN), referred to as the DAE-DNN scheme. The DAE is used to reconstruct a clean "repaired" input from its corrupted version to increase robustness, and then to divide the input into multiple parts in different sub-areas. The DNN is used to learn the mapping between the received signals and the refined angle grids in each sub-area; the outputs of all sub-areas are then concatenated to perform the final DoA estimation. By simulations in different SNR regimes, we study the performance of DAE-DNN with respect to the number of snapshots, the batch size, the learning rate, and the number of epochs. Our results demonstrate that the proposed DAE-DNN scheme outperforms traditional methods in accuracy and robustness.


I. INTRODUCTION
DIRECTION of arrival (DoA) estimation is a longstanding problem, yet still one of the most important in array signal processing. DoA estimation has a variety of applications such as mobile communications [1] [2], vehicular communications [3], airborne radar recognition [4], source localization [5], unmanned aerial vehicles [6], and sonar navigation [7], to name just a few. Recently, as the demand for high-quality DoA estimation increases, high-accuracy DoA estimation techniques have received much attention. To meet the demand of both civil and military use in target detection, various DoA estimation techniques have been proposed [8]-[11]. Roughly speaking, these approaches fall into three categories. The first category is the spectral estimation technique, where the DoA is estimated by numerical search. Popular approaches in this class include the Capon and Bartlett methods [8]. However, it is challenging to achieve high angular resolution with this approach, since one must compute the spatial spectrum and estimate the DoA by a numerical search for local maxima of the spectrum. The second category is the subspace-based technique, such as MUltiple SIgnal Classification (MUSIC) [9] and Estimation Of Signal Parameters via Rotational Invariance Techniques (ESPRIT) [10], where the DoA estimation problem is described by a matrix model. Due to its dependence on the array characteristics and the computational burden of the eigenvalue decomposition (EVD) and singular value decomposition, this approach is not very effective in coping with inaccuracies of the receiving array. The third category is the probabilistic model-based technique, such as Maximum-Likelihood (ML) [11] and maximum a posteriori estimation [12]. This approach is effective in handling deterministic signals, but it is in general difficult to obtain prior information on the signal model, especially for low-elevation targets.
The more prior information on the target signal model or noise distribution we have, the better performance one can achieve. Also, when the number of sources is large and the range of angles is wide (i.e., when the search grid is large), the computational complexity of ML estimation grows exponentially with the number of sources.
In recent years, various methods have been proposed to solve these problems; however, the gains are limited, especially in low SNR regimes. Modeling the mutual coupling matrix of the uniform linear array (ULA) as a banded symmetric Toeplitz matrix, MUSIC-Like and ESPRIT-Like algorithms have been proposed to estimate the DoA [13] [14]. However, these methods lead to array aperture loss. In [15], a new DoA estimation algorithm based on the orthogonality of a specific eigenvector has been proposed, but it requires the array response to be quiescent. In [16], a reweighted regularized sparse recovery algorithm has been proposed for DoA estimation with unknown mutual coupling of the ULA, assuming the number of signals is known and the SNR is high (around 10 dB). In [17], an estimation method based on the spatial spectrum with a fixed eigenvalue order has been proposed to estimate the DoA under high SNR (above 0 dB), without considering mutual coupling or the number of sources. For these reasons, when the number of signals is unknown, the receiving array is inaccurate, and the SNR is low, conventional approaches are not very effective or comprehensive, in particular for super-resolution DoA estimation.
In this paper, we propose an entirely new DoA estimation technique based on deep learning to obtain high-resolution DoA estimates in complex environments, such as low SNR regimes, an unknown number of sources, an inaccurate array (mutual coupling), or a combination of these problems. For this purpose, two objectives have been identified. The first objective is to reduce the noise of the noisy input signals. It has been shown in [18] that denoising autoencoders (DAEs) can achieve higher detection accuracy than basic autoencoders. Specifically, the DAE [19] can learn a more robust nonlinear representation of signals against noise and fluctuations, has powerful generalization capability, and produces state-of-the-art performance on many challenging feature learning tasks, such as wireless sensor networks [20], autonomous fault detection and analysis [21] [22] [23], unmanned aerial vehicle networks [24], biomedical science [25], etc. Motivated by these successful applications and the excellent properties of the DAE, we adopt the DAE to deal with target signals corrupted by noise. Our goal is to provide a robust signal reconstruction in the presence of perturbations or disturbances in the sensor data while capturing the nonlinear correlations embedded in multiple signals. The other objective is to capture the DoA features of the inaccurate array when the number of sources is unknown. Although machine learning has shown great potential in DoA estimation, e.g., the radial basis function (RBF) network [26], which can reduce the computational complexity and perform well in high-resolution settings, its performance degrades rapidly when the number of sources is unknown and the array is inaccurate. Recently, as a key enabler for future wireless communications [27] [28] [29], deep neural networks (DNN) have been successfully applied to various wireless systems such as multiple-input and multiple-output (MIMO) [30], wireless scheduling [31], and active user detection [32].
In these works, the DNN is used to learn a desired nonlinear function (e.g., classification or decision) through the training process. Benefiting from its powerful representation of the mapping relationships between high-dimensional random vector elements, the DNN has shown new ways of obtaining useful feature representations that provide better performance than traditional feature extractors. Here, the DNN is used to capture abstractions of the DoA features, obtain distributed representations, and further improve the generalization of the whole system when the array is inaccurate and the number of sources is unknown.
In a nutshell, we exploit the DAE to increase robustness at lower SNR, and the DNN to improve detection accuracy when the number of sources is unknown and the receiving array is inaccurate. To be specific, in the proposed DoA estimation scheme, henceforth referred to as DAE-DNN, the DAE extracts useful features from the corrupted received signal and learns a more robust mapping between the received signal and each sub-area. In each sub-area, the DNN further learns the complicated mapping between the received signals with different inter-signal angles and the refined angle grids.
Two key ingredients of the DAE-DNN technique accomplish this mission: 1) the DAE, which divides the angle range into several sub-areas, and 2) the DNN, which obtains a more refined DoA estimate in each sub-area. In the DAE stage, we extract the main features of the data from the corrupted input via the encoder and decoder. After this, the DNN is applied to refine the DoA estimation result in each sub-area. According to the universal approximation theorem, a DNN with deeply stacked hidden layers can well approximate the desired function [33]. In our context, this means that a properly designed DAE-DNN system with multiple hidden layers can handle the whole DoA estimation process, resulting in an accurate DoA estimate.
The main contributions of this paper are as follows: 1) We propose a DoA estimation system composed of a DAE algorithm and a DNN framework. Rather than transferring the original covariance matrix of the received data to the frequency domain to obtain the input data, the DAE algorithm processes the DoA sensor array output in the time domain directly. 2) We obtain high-resolution DoA estimation in very noisy and reverberant environments with array inaccuracies. After denoising the corrupted data at the receiver, a high-resolution DoA estimate is produced by training procedures based on one-versus-all classification, which decides the existence of a signal near the refined angle grids. 3) We provide a performance analysis of the proposed system under different parameters and cases. By simulations, we identify the optimal snapshot number, batch size, learning rate, and epoch number of the DAE-DNN system. We also examine the root mean square error (RMSE) of the DoA in the low SNR regime to assess the accuracy of the DoA estimation. Through extensive simulations and empirical evaluations, we demonstrate the efficiency and robustness of the proposed DAE-DNN. The rest of this paper is organized as follows. In Section II, we present the DoA model with multiple signal sources, the DAE model, and the DNN model. In Section III, we discuss the DAE-DNN estimation system. We describe the architecture of the DAE-DNN system and its corresponding training strategies in detail in Section IV. We provide the performance analysis of DAE-DNN, including snapshot number, batch size, learning rate, epoch number, and a comparison to traditional methods, in Section V. Finally, we conclude the paper in Section VI.
Notations: throughout the paper, matrices are denoted by bold capital letters, while vectors and scalars are denoted by boldface small letters and small letters, respectively. (·)^*, (·)^T, (·)^H, and ‖·‖_2 represent the conjugate, transpose, conjugate transpose, and l2-norm operators of a matrix, respectively. Also, E[·] and H[·] denote the expectation and entropy operators, and re[·] and im[·] represent the real and imaginary parts of a complex-valued quantity, respectively.

A. DOA MODEL WITH MULTIPLE SIGNAL SOURCES
We consider a ULA with M antenna elements where the element spacing is d = λ/2 (λ is the wavelength). We assume that the plane waves impinging on the receiving array are parallel, since the sources are located in the far field, and that the sources are uncorrelated. Let θ_k be the kth angle of arrival of the K independent signals impinging on the M-element array from directions {θ_1, θ_2, ..., θ_K} (see Fig. 1). The medium through which the wave propagates is assumed to be homogeneous, isotropic, and non-dispersive. The steering vector a(θ_k) can be expressed as

a(θ_k) = [1, e^{-j2πd sin θ_k/λ}, e^{-j2πd·2 sin θ_k/λ}, ..., e^{-j2πd·(M-1) sin θ_k/λ}]^T, k = 1, 2, ..., K.

Array inaccuracies will cause deviations in the mapping from θ_k to a(θ_k). In this paper, a(θ_k) and its perturbed variant a(θ_k, e) are assumed to be unit-norm vectors (i.e., ‖a(θ_k)‖_2 = ‖a(θ_k, e)‖_2 = 1). When sampled at equally spaced instants {t_1, t_2, ..., t_I} (I is the number of snapshots), the received signal X(t) = [x(t_1), ..., x(t_I)] ∈ C^{M×I} can be formulated as

X(t) = A(θ) S(t) + N(t),

where A(θ) = [a(θ_1), ..., a(θ_K)] ∈ C^{M×K} is the steering matrix of the array, S(t) ∈ C^{K×I} contains the incident signal waveforms, and N(t) ∈ C^{M×I} is zero-mean Gaussian noise. More information is obtained as I increases. The covariance matrix R_x(t) of the received signals is

R_x(t) = E[X(t) X(t)^H] = A(θ) R_S A(θ)^H + σ² I,

where R_S is the covariance matrix of the signals S(t), σ² is the noise variance, and I is the identity matrix.
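As a concrete illustration, the signal model above can be sketched in a few lines of NumPy (a minimal sketch, not the authors' code; the unit-norm steering vectors follow the assumption in the text, while unit-power circular Gaussian signals and the function names are our own choices):

```python
import numpy as np

def steering_vector(theta_deg, M, d_over_lambda=0.5):
    """Steering vector a(theta) of an M-element ULA with spacing d = lambda/2,
    normalized to unit l2 norm as assumed in the text."""
    theta = np.deg2rad(theta_deg)
    m = np.arange(M)
    a = np.exp(-2j * np.pi * d_over_lambda * m * np.sin(theta))
    return a / np.linalg.norm(a)

def received_covariance(thetas_deg, M, I, snr_db, rng=None):
    """Sample covariance R_x = X X^H / I for K far-field sources in white noise:
    X = A S + N with A the M x K steering matrix."""
    rng = rng or np.random.default_rng(0)
    A = np.stack([steering_vector(t, M) for t in thetas_deg], axis=1)  # M x K
    K = A.shape[1]
    # unit-power circularly symmetric Gaussian signal waveforms
    S = (rng.standard_normal((K, I)) + 1j * rng.standard_normal((K, I))) / np.sqrt(2)
    sigma = 10.0 ** (-snr_db / 20.0)  # noise amplitude for the requested SNR
    N = sigma * (rng.standard_normal((M, I)) + 1j * rng.standard_normal((M, I))) / np.sqrt(2)
    X = A @ S + N
    return X @ X.conj().T / I
```

With M = 10 antennas and I = 500 snapshots (the values used later in the simulations), `received_covariance([10.0, -20.0], 10, 500, 0)` returns a 10 × 10 Hermitian matrix.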

B. DENOISING AUTOENCODER AND DEEP NEURAL NETWORKS 1) Denoising Autoencoder
In contrast to the conventional autoencoder, used primarily in mid or high SNR regimes [34], the DAE recovers an input signal from a corrupted version, yielding a more robust representation of the input. Two ideas explain this approach. First, as a higher-level representation of the autoencoder, the DAE is stable and robust when the input is corrupted. Second, in the denoising phase, we expect that more useful features can be extracted from the input distribution. Formally, the initial input r is corrupted by Gaussian noise, and the corresponding corrupted version, denoted r_0, is generated according to a stochastic mapping r_0 ∼ q_D(r_0|r). The corrupted input r_0 is then mapped to the hidden representation d_0 via the encoder f_ϑ, and the reconstructed value d is obtained via the decoder g_ϑ':

d_0 = f_ϑ(r_0) = f(w r_0 + b),  d = g_ϑ'(d_0) = g(w' d_0 + b'),

where w, w' are weight matrices and b, b' are bias vectors in the parameter sets ϑ = {w, b} and ϑ' = {w', b'}, and f(·) and g(·) are nonlinear activation functions of the encoder and the decoder. The reconstruction error is measured by a loss function L(r, d). Well-known loss functions include the squared error loss L_2(r, d) = ‖r − d‖² and the cross-entropy loss L_H(r, d) = −Σ_i [r_i log d_i + (1 − r_i) log(1 − d_i)]. The framework of the denoising autoencoder is shown in Fig. 2.
In the training phase of Fig. 2, the parameter sets ϑ and ϑ' are trained to minimize L(r, d) over a training set, that is, to make d as close as possible to the uncorrupted input r. The reconstructed value d is obtained by applying the deterministic mapping f_ϑ to a corrupted input r_0 rather than to the initial input r. Parameters are initialized at random and then optimized by stochastic gradient descent. Note that each time a training example r is presented, a different corrupted version r_0 is generated according to r_0 ∼ q_D(r_0|r). Therefore, the training forces the DAE to learn a far more robust mapping than the identity: a mapping that extracts features useful for denoising [35].
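The corruption and encode/decode pass described above can be sketched as follows (a minimal sketch under our own illustrative choices: additive Gaussian corruption, a tanh encoder, and a linear decoder; the function names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(r, noise_std=0.1):
    """Stochastic corruption r0 ~ q_D(r0 | r): here additive Gaussian noise,
    regenerated each time an example is presented."""
    return r + noise_std * rng.standard_normal(r.shape)

def dae_forward(r0, w, b, w2, b2):
    """Encoder d0 = f(w r0 + b), decoder d = g(w' d0 + b').
    tanh encoder and linear decoder are illustrative choices."""
    d0 = np.tanh(w @ r0 + b)
    return w2 @ d0 + b2

def l2_loss(r, d):
    """Squared-error reconstruction loss L2(r, d) = ||r - d||^2,
    measured against the *uncorrupted* input r."""
    return float(np.sum((r - d) ** 2))
```

The key point is that `l2_loss` compares the reconstruction against the clean `r`, not the corrupted `r0`, which is what prevents the network from learning the identity mapping.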

2) Deep Neural Networks
We consider the basic model of a DNN, a linear input-output model given by

f_dnn(x, w) = Σ_{i=1}^{n} w_i x_i,

where x and w represent the inputs and weights of the neural network, respectively, and n is the number of inputs. Note that the classification decision based on f_dnn(x, w) depends on the sign of its value.
In the deep learning field, the DNN is considered one of the most promising models since it can handle many non-convex and non-linear mappings and learn the characteristics of data in a high-dimensional space. In the data processing of a DNN, there are multiple neurons in each hidden layer, and an output is obtained by applying a nonlinear function to a weighted sum of these neurons. The activation function is used in the DNN to realize the recognition and representation operations. We usually choose the Sigmoid function or the ReLU function as the nonlinear operation, which are defined as

Sigmoid(x) = 1/(1 + e^{−x}),  ReLU(x) = max(0, x).
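These building blocks can be written directly (a trivial sketch; the function names are ours):

```python
import numpy as np

def f_dnn(x, w):
    """Linear input-output model: weighted sum over the n inputs.
    The classification decision follows the sign of this value."""
    return float(np.dot(w, x))

def sigmoid(x):
    """Sigmoid(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """ReLU(x) = max(0, x)."""
    return np.maximum(0.0, x)
```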

III. DEEP NEURAL NETWORKS FRAMEWORK FOR DOA ESTIMATION
In this section, we present the DAE-DNN architecture. In contrast to previous works focusing on the high SNR scenario, the proposed DAE-DNN is designed to handle DoA estimation in the low SNR regime. Among the various types of autoencoder, the DAE is robust to noise such as Gaussian noise and salt-and-pepper noise [35]. As shown in Fig. 3, the proposed DAE-DNN consists of three parts. The first part, in the green rectangle at the top, is the data preprocessing phase for the arrival angle θ of the impinging plane waves, where the covariance matrix R_rr(t) of the signals is computed from the steering vector a(θ_k). The second part is the DAE phase, shown in the red rectangle in the middle. In this phase, the DAE denoises the input data and divides the whole range of θ into J spatial sub-areas. The last part, in the blue rectangle at the bottom, consists of multilayer classifiers, which determine in which sub-area each impinging signal is located.

A. DENOISING AUTOENCODER
The main purpose of the DAE is to reconstruct a clean, repaired input from a corrupted version of the training data. In the encoding phase, useful features are extracted into a better, higher-level representation. In other words, the input data is compressed into a low-dimensional vector by extracting the principal-component features of the uncorrupted version. In the decoding phase, the low-dimensional vector is recovered toward the original data via the decoder. This step helps to reduce the interference of noise and the distribution divergences of the input data.
In the preprocessing stage, due to the influence of noise, a part of the input components is erased while the others are left uncorrupted; the corrupted version r_0 is generated by the stochastic mapping r_0 ∼ q_D(r_0|r). Accordingly, we should focus on the corrupted components rather than all components of the input data. To achieve this, we assign different weights to the reconstructed values: β_1 for the corrupted components and β_2 for the uncorrupted components, where β_1 and β_2 are hyper-parameters. For the training data r and reconstructed value d, the squared loss function is

L_{2,β}(r, d) = β_1 Σ_{i∈C(r_0)} (r_i − d_i)² + β_2 Σ_{i∉C(r_0)} (r_i − d_i)²,

where C(r_0) is the set of indices of the components of r_0 that are corrupted. The cross-entropy loss is weighted analogously:

L_{H,β}(r, d) = −β_1 Σ_{i∈C(r_0)} [r_i log d_i + (1 − r_i) log(1 − d_i)] − β_2 Σ_{i∉C(r_0)} [r_i log d_i + (1 − r_i) log(1 − d_i)],

which is called the emphasized DAE. The special case β_1 = 1, β_2 = 0 is called full emphasis, where only the prediction error of the corrupted elements is considered. Here, we randomly set part of the elements of the original input to zero and leave the remaining elements uncorrupted, resulting in a "corrupted" version r_0 of the original input r. Compared with r, the "blank" elements of r_0 remove much of the information in r. Benefiting from the DAE phase, we can fill in the missing information by learning the mapping between r and d, so that the extracted features better reflect the features of the original input r.
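The emphasized squared loss can be sketched as follows (a minimal sketch; the function name and the default weights are ours, and the set C(r_0) is passed in explicitly as the list of corrupted indices):

```python
import numpy as np

def emphasized_l2(r, d, corrupted_idx, beta1=1.0, beta2=0.5):
    """Emphasized squared loss: weight beta1 on the corrupted components C(r0)
    and beta2 on the uncorrupted ones. beta1=1, beta2=0 is 'full emphasis'."""
    mask = np.zeros(r.shape, dtype=bool)
    mask[corrupted_idx] = True
    err = (np.asarray(r) - np.asarray(d)) ** 2
    return float(beta1 * err[mask].sum() + beta2 * err[~mask].sum())
```

With beta1 = beta2 this reduces to the ordinary squared error loss from the previous subsection.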
Assume that the numbers of layers in the encoder and the decoder are both equal to L_1. Let the layer index be l_1 (0 < l_1 ≤ L_1); we generally have |r_{l_1}| < |r_{l_1−1}| and |r_{L_1−l_1}| = |r_{L_1+l_1}|, where |r_{l_1}| is the dimension of r_{l_1}. In the encoding phase, the hidden representation is given by

r_{l_1} = f_{l_1}(w_{l_1,l_1−1} r_{l_1−1} + b_{l_1}),

where l_1 and l_1 − 1 are the layer indices, w_{l_1,l_1−1} ∈ R^{|r_{l_1}|×|r_{l_1−1}|} is the weight matrix from the (l_1 − 1)th layer to the l_1th layer, b_{l_1} ∈ R^{|r_{l_1}|×1} is the additive bias vector in the l_1th layer, and f_{l_1} is the element-wise activation function in the l_1th layer.
In the decoding process, the decoder index j satisfies 1 ≤ j ≤ J, and the decoding phase has the same form of hidden representation as (13) in each sub-area. We define the potential scope of the incident signals as [θ^[0], θ^[J]]. Figure 4 shows the relationship between the angle scope and the sub-areas of DAE-DNN. If some signals are located in the jth sub-area, the output of the jth decoder (at layer 2L_1) equals the input r, while the outputs of the other decoders are zero. We then take the total outputs of the DAE as the input of the multi-classifier in the next phase to obtain the final estimate. For this reason, an additional requirement is imposed on the DAE structure for DoA estimation, namely additivity:

U^[j](r_1 + r_2) = U^[j](r_1) + U^[j](r_2),

where U^[j](·) is the function of the jth DAE task. From (15), when multiple signals located in different sub-areas impinge on the array simultaneously, the additive property of the DAE ensures that input data from different angle ranges are routed to the corresponding decoder outputs. Thus, we use linear activation functions f_{l_1}(x) = x.

B. MULTILAYER CLASSIFIERS
In the multilayer classifier phase, a list of one-versus-all classifiers is applied in the DAE-DNN system. The final estimation of the DoA parameter θ is composed of the J DAE outputs, where each sub-area output contains a fixed number of angle values. Each angle value is used as a refined angle grid. According to the principle of the one-versus-all method, the final output at each angle value represents the probability of a signal being located in its neighborhood. Furthermore, the DoA of signals from off-grid directions can be estimated by interpolation between two adjacent refined grids. As shown in the blue rectangle at the bottom of Fig. 3, there are J parallel classifiers, one per sub-area, which use the outputs of the DAEs as inputs. We obtain the DoA estimate from the refined angle grids of all sub-areas: only the grids close to the actual signal directions generate positive-valued outputs, while all others are zero.
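The off-grid interpolation step can be sketched as follows (a minimal sketch under the assumption, stated above, that only the one or two refined grids adjacent to the true direction carry positive outputs; the function name is ours):

```python
import numpy as np

def interpolate_doa(grid, probs):
    """Off-grid DoA estimate from a one-versus-all spectrum: a weighted
    average of the two adjacent refined grids with the largest outputs."""
    probs = np.asarray(probs, dtype=float)
    i = int(np.argmax(probs))
    # pick the stronger of the two neighbors of the peak grid
    j = i + 1 if (i + 1 < len(probs) and (i == 0 or probs[i + 1] >= probs[i - 1])) else i - 1
    w = probs[i] + probs[j]
    return float((probs[i] * grid[i] + probs[j] * grid[j]) / w)
```

For example, equal outputs of 0.5 on the grids at 10° and 11° yield an estimate of 10.5°.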
The feed-forward computations of the classifiers are given by

y_{l_2} = g_{l_2}(h_{l_2,l_2−1} y_{l_2−1} + p_{l_2}),

where l_2 and l_2 − 1 are the layer indices in the multilayer classifiers, y_{l_2} is the output vector in the l_2th layer of the jth classifier, h_{l_2,l_2−1} is the fully connected feed-forward weight matrix from the (l_2 − 1)th layer to the l_2th layer, p_{l_2} is the additive bias vector in the l_2th layer, and g_{l_2} is the element-wise activation function for the inputs of the l_2th-layer nodes.
All the outputs of the J parallel classifiers, based on the J decoder outputs, are concatenated as

y = [y_1^T, y_2^T, ..., y_J^T]^T,

where y_j is the estimate of the spatial spectrum associated with the DNN input r. Note that the y_j^T are concatenated in order from classifier 1 to classifier J, and only the values of the refined grids close to the actual signal directions are positive in y_j, while all the others are zero.

IV. DOA ESTIMATION BASED ON DAE-DNN
Based on the framework discussed in Section III, we present the basic structure of training and testing for the proposed DAE-DNN. In Fig. 5, the classifiers make a further estimation of the refined grid where the impinging signal is located. Generally, the DNN approach can realize non-linear mappings and learn the data's characteristics in a high-dimensional space. In this paper, a new characteristic space is exploited to achieve DoA estimation. To avoid the neural networks getting trapped in local minima, we use two completely separate training procedures.

A. DAE TRAINING
As the results in Section II indicate, the covariance matrix R_rr of the receiving array contains all the information on the directions of arrival {θ_1, θ_2, ..., θ_K}. Instead of taking the covariance matrix R_rr as the input of the DAE directly, in this paper we use only the upper-right matrix elements (the colored area in (19)). The lower-left elements are unnecessary because they are conjugate replicas of the upper-right ones, and the diagonal elements are not included since they are associated with the unknown noise variance.
Furthermore, we reformulate the complex data of (19) into real data via (20) and (21), preserving the original information as much as possible; the result can be used as the input of the parallel multilayer classifiers.
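The construction of the real-valued input vector from the covariance matrix can be sketched as follows (a minimal sketch; the final normalization and the function name are our own additions, not from the paper):

```python
import numpy as np

def covariance_to_input(R):
    """Real-valued DAE input from the covariance matrix: take the strictly
    upper-triangular entries (the lower triangle is their conjugate and the
    diagonal carries the unknown noise variance), then stack real and
    imaginary parts. For M antennas: M(M-1)/2 complex -> M(M-1) real values."""
    M = R.shape[0]
    iu = np.triu_indices(M, k=1)   # strictly upper-triangular indices
    u = R[iu]
    r = np.concatenate([u.real, u.imag])
    return r / np.linalg.norm(r)   # normalization is an assumption, not from the paper
```

For M = 10 this yields a length-90 vector, matching the input layer size o = M(M − 1) = 90 used in the simulation settings.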
We construct the training dataset r with the signal direction varying in the range [θ^[0], θ^[J]], where [j] (1 ≤ j ≤ J) denotes the range of spectrum grid points covered by the jth decoder. Let the whole set of grids be δ_1, δ_2, ..., δ_V (V is the total number of spaced spectrum grids), so each spaced spectrum of the DAE contains V/J = V_0 grids. The relationship between the spectrum grids and the decoder scopes is shown in Fig. 6. If there is an impinging signal from the direction δ_v, the corresponding covariance vector r(δ_v) is generated, which causes the output of the associated decoder to be d(δ_v), as expected, while the outputs of the other decoders are zero.
We denote the dataset for DAE training as Γ = [r(δ_1), ..., r(δ_V)], and the corresponding DAE label set as ∆ ∈ R^{J·|r|×V}, which consists of the vectors d(δ_v) (see (23)).
The data-label pair (Γ, ∆) is used as the input and expected output to train the DAE. Whether or not the signals lie in the same sub-area, the data-label pair (Γ, ∆) contains the key information of the signals. The loss function between the actual output and the expected one is expressed as

L = Σ_{v=1}^{V} ‖d̂(δ_v) − d(δ_v)‖²,

where d̂(δ_v) is the actual output of the DAE and d(δ_v) is the expected output when r(δ_v) is the input.
To obtain the updated weight matrices w_{l_1,l_1−1} and bias vectors b_{l_1}, the backpropagation gradient algorithm is applied to the loss function:

w'_{l_1,l_1−1} = w_{l_1,l_1−1} − μ_1 ∂L/∂w_{l_1,l_1−1},  b'_{l_1} = b_{l_1} − μ_1 ∂L/∂b_{l_1},

where w'_{l_1,l_1−1} and b'_{l_1} are the variables after the current update, w_{l_1,l_1−1} and b_{l_1} are the variables before the update, and μ_1 is the learning rate. Algorithm 1 presents the operational procedure of the proposed DAE algorithm for sub-area spatial estimation.

B. PARALLEL MULTILAYER CLASSIFIERS TRAINING
After the DAE training phase, the outputs generated by the J decoders are taken as the inputs of the parallel multilayer classifiers, and the spatial spectrum is estimated simultaneously in each sub-area. Different from the input r of the DAE, which contains the features of the whole range of possible directions, the input of the multilayer classifiers d^[j] (j = 1, 2, ..., J) only contains the features of its own sub-area's direction range. In other words, compared with r, the d^[j] have more concentrated distributions. Based on this, we further set the activation function as the hyperbolic tangent function and then obtain a refined DoA estimate, in which κ_{−1} is the last element of κ.

Algorithm 1 DAE Algorithm for Sub-area Spatial Estimation.
Input: Impinging signal angles {θ_1, θ_2, ..., θ_K}.
Output: Estimated spatial sub-areas d_K.
1: Step 1: Get the input r of the DAE from the impinging directions {θ_1, θ_2, ..., θ_K}.
2: The steering vector a(θ) is generated from {θ_1, θ_2, ..., θ_K} by (1); the array output X(t) and covariance matrix R_xx(t) are then obtained by (2) and (5). We also obtain r̄ and r from (20) and (21).
3: Step 2: Train the DAE with the data-label pair (Γ, ∆).
4: Generate a set of training sequences for the DAE, including the input r and outputs d^[j]. Set the learning rate and the loss rate, as well as the weight matrices and bias vectors. Set the error threshold τ = 10^{−6}.
5: while error ≥ τ do
6:   Train the DAE on the given sequences according to the proposed learning policy via the backpropagation gradient algorithm;
7:   Update the weight matrices w_{l_1,l_1−1} and bias vectors b_{l_1} of each layer of the DAE by (25) and (26).
8: end while
9: Step 3: Obtain the spatial gains of the filters α^[j].
10: for v = 1, 2, ..., V do
11:   Calculate the conjugate transpose r̄(δ_v)^H from the input r̄(δ_v).
12: end for
13: for j = 1, 2, ..., J do
14:   Form the complex-valued d̄^[j] with the first half of d^[j] as the real part and the second half as the imaginary part.
15:   Obtain the spatial gains of the filters α^[j].
16: end for
17: return the DAE scheme and the sub-area spatial estimation d_K corresponding to the impinging signal angles {θ_1, θ_2, ..., θ_K} according to α^[j].
In the whole DAE-DNN estimation system, there are two training processes: DAE training and multilayer classifier training. When the DAE training is finished, the values of its weight matrices and bias vectors are fixed. We then train the whole end-to-end DAE-DNN framework, whose input and output are the vector r and the reconstructed spectrum y, respectively. The weight matrices and bias vectors of the classification neural networks must be trained to estimate the different directions of multiple signals in different sub-areas.
To achieve this goal, we collect another training dataset with multiple signals at the same time.
As mentioned in Section II, there are K independent signals to be detected at the receiver; we denote the inter-signal angles as c = {c_m}, m = 1, 2, ..., K − 1. We then obtain the input vectors r(θ, c_1, ..., c_{K−1}), which represent the K signals from the directions θ, θ + c_1, ..., θ + c_{K−1} with θ^[0] ≤ θ < θ^[J] − c_{K−1}. Meanwhile, as discussed in the previous section, the reconstructed spectrum of the multilayer classifiers only has positive values on the grids adjacent to the actual signal directions. Thus, we can obtain the estimate of each signal via linear interpolation between two adjacent refined grids.
The weight matrices W_{l_2,l_2−1} and bias vectors p_{l_2} are then updated as

W'_{l_2,l_2−1} = W_{l_2,l_2−1} − μ_2 ∂L/∂W_{l_2,l_2−1},  p'_{l_2} = p_{l_2} − μ_2 ∂L/∂p_{l_2},

where W'_{l_2,l_2−1} and p'_{l_2} are the values of the variables after the current update, W_{l_2,l_2−1} and p_{l_2} are the variables before the update, and μ_2 is the learning rate. Algorithm 2 presents the operational procedure of the parallel multilayer classifiers based on the DNN. Considering that the response function of the array is perturbed by noise and array inaccuracies, the mapping from signal direction to covariance vector is θ → r_e(θ), where e is the error. When we use the perturbed vector r_e(θ) as an input to the DAE, the associated label vector is still located in the jth sub-area, even in an environment with strong noise. After this, in the parallel multilayer classifiers, we take the signal from direction δ_i, embedded in the output of the jth decoder, as the input. If the associated spectrum label contains a spectrum peak (the one or two grids closest to δ_i), we can obtain the DoA estimate of δ_i by interpolation. Therefore, the whole DAE-DNN system (the DAE together with the parallel multilayer classifiers) actually forms an inverse mapping from the perturbed output r_e(θ) back to the input θ when array inaccuracies and environmental noise both exist. This derived inverse mapping r_e(θ) → θ under noise also applies to the test data and is expected to yield accurate DoA estimates even in the low SNR regime.

V. SIMULATION AND RESULTS
In this section, we present simulation results to demonstrate the effectiveness of the proposed method. For comparison, we test traditional DoA detection methods at low SNR levels. The simulations are implemented in TensorFlow [36], whose embedded tools are used to compute the gradients directly.

A. SIMULATION SETTINGS
We consider a ULA with M elements used to estimate K signals impinging from the spatial scope [θ^[0], θ^[J]]. The inter-element spacing of the ULA is half a wavelength, and the potential space is divided into J sub-areas with equal spatial scopes of V_0 grids each. The covariance vectors r in the training datasets of both the DAE and the multilayer classifiers are obtained from I snapshots. The simulation parameters of the DAE-DNN system and the values of the training data are shown in Table I.
On each direction grid, we compute one covariance vector from one group of snapshots. For minibatch training, the dataset is shuffled in each epoch; the training parameters are a batch size of B, a learning rate of μ_1, and epoch_1 epochs. We set the sizes of the input layer, hidden layer, and output layer to o = M(M − 1) = 90, o/2 = 45, and oJ = 90 × 9 = 810, respectively.
In the DNN training phase, once the DAE parameters are fixed after its training process, we collect another dataset to train the classifiers with more than one signal. In this setting, we choose K signals. The inter-signal angle c is sampled from the dataset, covering all scenarios of adjacent signals separated by double the width of the spatial spectrum grid. If one signal direction (denoted by θ) is sampled at an interval of 1° from −90° to 90° − c, then the other signal direction is θ + c. The covariance vectors are then collected and used for training with a batch size of B and a learning rate of μ_2; the order of the vectors is shuffled during each of the epoch_2 training epochs. To obtain a tradeoff between the expressive power of the classifiers (which improves with deeper networks) and the undertraining risk (which grows with more network parameters), we set the number of hidden layers to L_2 − 1 = 2. In each classifier, we set the sizes of the hidden layer and output layer to 2/3 × o = 60 and 4/9 × o = 40, respectively. We initialize all the weights and biases of the DNN randomly from a uniform distribution between −0.1 and 0.1.
In this paper, we consider mutual coupling as the main source of array inaccuracy, since it occurs in almost all types of arrays and leads to considerable deterioration of conventional algorithms [37]. The mutual coupling model used in the simulations is

C = I_M + ρ E_mc,

where the parameter ρ ≥ 0 indicates the strength of the array inaccuracies, ζ = 0.5e^{jπ/3} is the mutual coupling coefficient between adjacent sensors, I_M is the M × M identity matrix, and E_mc is a Toeplitz matrix with parameter vector e. The perturbed array response function is then

ã(θ) = (I_M + ρ E_mc) a(θ),

where a(θ) is the ideal response of the ULA.
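A minimal sketch of this perturbation model follows. Since the parameter vector e is not spelled out here, the choice e = [0, ζ, ζ², ...] (coupling decaying with sensor separation, zero on the diagonal) is an assumption for illustration, as are the function and variable names:

```python
import numpy as np

def perturbed_steering(theta_deg, M=10, rho=0.5,
                       zeta=0.5 * np.exp(1j * np.pi / 3)):
    """Half-wavelength ULA steering vector with mutual coupling.

    The ideal response a(theta) is multiplied by (I_M + rho * E_mc),
    where E_mc is a symmetric Toeplitz matrix built from the assumed
    parameter vector e = [0, zeta, zeta^2, ...].
    """
    theta = np.deg2rad(theta_deg)
    a = np.exp(1j * np.pi * np.arange(M) * np.sin(theta))  # ideal ULA response
    e = np.concatenate(([0.0], zeta ** np.arange(1, M)))   # assumed coupling taper
    idx = np.abs(np.arange(M)[:, None] - np.arange(M)[None, :])
    E_mc = e[idx]                                          # Toeplitz, zero diagonal
    return (np.eye(M) + rho * E_mc) @ a

a_perturbed = perturbed_steering(30.0, rho=0.5)
```

Setting rho = 0 recovers the ideal, unperturbed array response.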

B. ANALYSIS OF THE PARAMETERS AND DOA ESTIMATION
In Fig. 7, we plot the RMSE performance of the DoA estimation against the SNR for the proposed scheme with different snapshot numbers I. Recall that the snapshot number I of the receiving array is a controlling parameter in DAE-DNN. We consider I = 100, 400, 500, 600, and 800 in the simulation, in ascending order. Note that the dB scale of SNR is defined as SNR = 10 log_10(P_signal/P_noise), where P_signal and P_noise are the powers of the signal and noise, respectively. We observe that the RMSE of the DoA estimation decreases with SNR and gradually stabilizes once the SNR is large enough. Meanwhile, the simulation results demonstrate that the RMSE performance can be enhanced by adopting a larger snapshot number. However, as the snapshot number continues to increase, the accuracy gain becomes marginal. This is attributed to the fact that a larger snapshot number yields a more accurate covariance estimate, with diminishing returns.

Additionally, Fig. 8 (a)-(c) displays the RMSE performance against the strength of the array-inaccuracy parameter ρ for different SNR levels. It demonstrates that as I increases, the RMSE of the DAE-DNN scheme decreases, with better robustness to the array inaccuracies. However, a large snapshot number results in complex computation and long processing time. Thus, in this paper, we choose I = 500 as a trade-off between them.

Figure 9 exhibits the RMSE performance of the DoA estimation at different SNR levels with different batch sizes. In our simulations, we test various batch sizes (16, 32, 64, and 128 bits), setting the length of the training sequence to 16 bits initially. From Fig. 9, we see that the RMSE decreases with SNR for each batch-size setting. When the SNR is higher than -8dB, the RMSE performance first decreases and then increases as the batch size varies from 16 bits to 32 bits, 64 bits, and 128 bits. More details of the RMSE performance at different SNR levels with different batch sizes are shown in Fig. 10 (a)-(d).
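The two quantities reported throughout these plots can be computed as in this minimal sketch; the helper names are illustrative, but the SNR definition matches the one given above:

```python
import numpy as np

def snr_db(p_signal, p_noise):
    """SNR in dB, as defined in the text: 10 log10(P_signal / P_noise)."""
    return 10.0 * np.log10(p_signal / p_noise)

def rmse_deg(theta_hat, theta_true):
    """Root-mean-square error of DoA estimates, in degrees."""
    err = np.asarray(theta_hat, dtype=float) - np.asarray(theta_true, dtype=float)
    return np.sqrt(np.mean(err ** 2))

# Equal signal and noise power corresponds to 0 dB
level = snr_db(1.0, 1.0)
error = rmse_deg([10.4, -29.7], [10.0, -30.0])
```

In the simulations, rmse_deg would be averaged over many Monte-Carlo trials at each SNR level.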
As is clear from the figure, "Batch size=32 bits" gives lower RMSE and is also more stable compared to the other batch sizes (16 bits, 64 bits, and 128 bits). This suggests that in the training procedure of the DAE-DNN scheme, a too-small batch size reduces the convergence rate, while a too-large batch size enlarges the required number of epochs. Therefore, we choose "Batch size=32 bits" to balance the RMSE performance and the stability of the proposed DAE-DNN scheme.
In Fig. 11, we show the RMSE performance of the DoA estimation as a function of SNR for the proposed DAE-DNN system, where the learning rate is set to 0.0001, 0.0005, 0.001, 0.005, and 0.01, respectively. Here, the length of the training sequence is initialized to 32 bits. When the learning rate is 0.0005 or 0.001, the RMSE of the DoA estimation decreases significantly as SNR increases. However, the RMSE performance in the cases of "learning rate = 0.0001, 0.005, and 0.01" shows a slow convergence speed in the low SNR range and becomes worse and less stable as SNR increases. We conclude from this group of simulation results that selecting an appropriate learning rate is a significant issue for boosting the performance of the DAE-DNN scheme for DoA estimation. When the learning rate is smaller than 0.001, it takes quite a long time to attain good DoA estimation. In contrast, a learning rate larger than 0.001 leads to worse DoA performance. Considering this tradeoff, we set the learning rate to 0.001 in the simulations. More details are shown in Fig. 12 (a)-(c), from which we observe that "learning rate = 0.001" gives the best performance.

Figure 13 provides an RMSE comparison of the DoA estimation against the SNR for various epoch numbers. As the SNR increases, the RMSE performance improves, similar to the results for the other parameters above. In the cases of "Epoch=300" and "Epoch=2000", the RMSE decreases between -10dB and 5dB and increases between 5dB and 10dB. This is because too small or too large an epoch number results in under-fitting or over-fitting, respectively, whose effect differs across DNN-based systems and schemes. Furthermore, compared to the "Epoch=500" case, the RMSE performance of "Epoch=1000" is degraded in the range of -10dB to 10dB. More details are shown in Fig. 14 (a)-(d): when the SNR is larger than -5dB, all of the epoch settings achieve satisfactory RMSE performance.
This is an expected result, since the DAE-DNN scheme is designed to ensure robustness at low SNR when array inaccuracies exist. Figure 15 depicts the RMSE performance against different ρ at different SNR levels (e.g., -10dB<SNR<10dB). As ρ increases, all RMSE values fluctuate around the mean value of each SNR. When SNR > 0dB, the DAE-DNN scheme performs well, achieving an RMSE below 0.5. In the low SNR regime (e.g., -5dB<SNR<0dB), the DAE-DNN system maintains its robustness even as ρ increases. In the lower SNR regime (e.g., -10dB<SNR<-5dB), however, the sensitivity to SNR increases, causing severe degradation of the RMSE performance. The results in Fig. 15 show that the RMSE performance is dominated by the value of SNR rather than by ρ. Furthermore, Fig. 17 shows the relationship between RMSE and SNR of the DAE-DNN scheme when ρ is fixed. As the SNR increases, DAE-DNN shows better robustness. Even when the SNR is as low as -5 dB, DAE-DNN still demonstrates a small RMSE.
Furthermore, to compare the performance of the proposed scheme and previous schemes in complex scenarios (low SNR or mutual coupling when the signal number is unknown), we plot the estimated RMSE against different mutual coupling strengths and SNR levels in Fig. 16 and Fig. 17, respectively. Fig. 16 compares the performance of ESPRIT, classical MUSIC, ROOT-MUSIC, I-MUSIC, and DAE-DNN when SNR=0dB, M=10, K=2, I=500, and the sensor spacing equals half of the wavelength. DAE-DNN exhibits satisfying RMSE performance over the whole range of ρ, while MUSIC and its derivative algorithms are best only when ρ<0.2. We should mention here again that the performance of ESPRIT, MUSIC, and its derivative algorithms appears to be dominated by the quality of the array response function. When ρ increases, these traditional methods perform worse than DAE-DNN due to the lack of accurate receiving-array information.
For performance comparison, we choose the classical MUSIC-like method [15] since it has low computational complexity. We also choose RBF [26] since it is a typical machine-learning method. As a method similar to DAE-DNN, DOA-AI [34] obtains satisfying performance when the SNR is high. In Fig. 17, we plot the RMSE performance of MUSIC-like [15], RBF [26], DOA-AI [34], and DAE-DNN against different SNR levels (e.g., -10dB<SNR<10dB) when ρ = 0.5. As the SNR increases, all RMSE values become stable. Considering the inaccurate array and the unknown signal number in these complex scenarios, both MUSIC-like and RBF show unsatisfying DoA estimation performance over the whole SNR range. Additionally, the method in [34] obtains high-resolution DoA estimation in the complex scenarios and exhibits comparable performance to the proposed DAE-DNN scheme. However, it shows less robustness when the SNR is lower than 0dB.

C. COMPUTATIONAL COMPLEXITY ANALYSIS
In this subsection, we focus on the computational complexity of the proposed scheme. In essence, the proposed DAE-DNN scheme mainly consists of the following operations: matrix multiplications, element-wise operations, and convolution operations. Letting the numbers of units in the layers be n_1, n_2, ..., n_L, with L the total number of layers, the computational complexity of the DAE-DNN scheme is O(Σ_{l=2}^{L} n_{l−1} n_l). To compare the proposed scheme with existing techniques, we consider MUSIC [9] and the MUSIC-based schemes [15], where we assume that the ULA contains M sensors and there exist K independent signals. In Table 2, we compare the computational complexity of the various schemes, where N is the number of iterations. Since the EVD computation is unnecessary, the DAE-DNN scheme has lower computational complexity than the MUSIC-based schemes and exhibits a computational-complexity level similar to that of RBF [26] and DOA-AI [34]. Due to the DAE phase in the proposed scheme, the computational complexity of DAE-DNN is somewhat higher than that of DOA-AI [34]. However, the proposed scheme achieves enhanced robustness in the low SNR regime and provides superior DoA estimation performance when the array is inaccurate and the signal number is unknown.
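The complexity expression Σ_{l=2}^{L} n_{l−1} n_l can be evaluated directly, since the per-layer matrix-vector products dominate the forward pass. As a hypothetical example, the sketch below reuses the DAE layer sizes stated earlier (90, 45, 810):

```python
def dnn_cost_order(layer_sizes):
    """Order-of-magnitude cost of one forward pass through a fully
    connected network: the sum over layers of n_{l-1} * n_l, i.e. the
    size of each weight matrix applied along the way."""
    return sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))

# Hypothetical DAE dimensions from the text: input 90, hidden 45, output 810
cost = dnn_cost_order([90, 45, 810])   # 90*45 + 45*810
```

This count grows only linearly in the number of layers, with no EVD term, which is the source of the complexity advantage over MUSIC-based schemes noted above.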


VI. CONCLUSION
We proposed a novel DoA estimation scheme, DAE-DNN, based on a denoising autoencoder and deep neural networks for multi-signal estimation in an array-inaccuracy scenario. This method differs from the usual DoA estimation methods in that it exploits the DoA sensor-array output in the time domain directly. We have shown that the performance of DAE-DNN is satisfactory over the whole SNR range. Another advantage of DAE-DNN is its better robustness when array inaccuracies exist. Previous methods depend on accurate array information to find DoAs, so the effect of array inaccuracies is often not considered; the performance of such estimators can be degraded by array inaccuracy even in the high-SNR case. To reduce these degradations, we apply the training data to the properly designed DAE-DNN scheme, which learns the nonlinear mapping between the receiving-array data and the estimates. Simulations demonstrate that the proposed DAE-DNN scheme is effective for high-resolution DoA estimation. We believe that DAE-DNN represents a new way of processing multiple signals and could improve DoA estimates not only for weak signals with strong noise but also for complex signals having multiple harmonics. The training phase of the DAE-DNN scheme costs some time and a certain amount of computation; however, with the improvement of hardware computing power, this computational burden will be effectively relieved. Therefore, we also believe there are many interesting applications of the proposed approach, such as mmWave channel estimation and MIMO detection.