Simulated Annealing Assisted Sparse Array Selection Utilizing Deep Learning

This paper proposes a simulated annealing (SA) assisted deep learning (DL) based sparse array selection approach. Conventional DL-based antenna selectors are primarily data-driven techniques. As a result, the required dataset is generated by enumerating all possible combinations of selecting M sensors from an N-element uniform array, which is computationally expensive. To circumvent this limitation, a simulated annealing algorithm is proposed as an initializer that assists dataset generation. The SA algorithm sequentially samples and optimizes the subarrays that constitute the training data samples while retaining specific array characteristics. Because most of the generated array configurations already possess the desired properties, dataset annotation is simplified and the computation cost of the overall data generation and annotation process is reduced considerably. Simulation examples show that the dataset generated by the proposed method improves the DL-based array selector's accuracy compared to one generated by a conventional random sampler. Moreover, the realized sparse arrays exhibit better sparse array configuration characteristics and enhanced DOA estimation performance.


I. INTRODUCTION
The design of optimum sparse arrays via machine learning (ML) for direction-of-arrival (DOA) estimation and beamforming (BF) has recently received tremendous attention [1]-[15]. This is because ML-based array design approaches have low computation complexity compared to their combinatorial and convex optimization counterparts [6]-[9]. For instance, [10] proposed a support vector machine (SVM) in connection with an artificial neural network (ANN) to predict sparse linear array configurations for adaptive beamforming. The approach employs sensing-environment features extracted from the Capon beamformer to select the array that maximizes the adaptive beamforming signal-to-interference-plus-noise ratio (SINR). Furthermore, a deep learning-based antenna selection approach was proposed in [11] to predict planar, or two-dimensional (2D), sparse arrays using the covariance matrix as input.
Structurally, the ML-based selectors are designed so that once the optimal sensors are selected, any DOA estimation technique can be applied. Typical methods include subspace methods, such as MUSIC [12], and compressive sensing methods, such as orthogonal matching pursuit (OMP) [13], [14]. As such, a further application of [11] to compressive sensing techniques and multiple-source scenarios was presented in [11]. Furthermore, transfer learning over the source and geometry domains was also evaluated on the exact formulation of [10] in [12]. Finally, refs. [17]-[18] present the application of the technique to linear arrays and a performance analysis under various DOA estimation situations.
A closer comparison of [11], [15]-[19] shows that these DL-based methods build on the benchmark work in [11]. Since the main limitation of [11] is the use of enumeration and exhaustive search to generate and label the dataset, [15]-[19] exhibit the same limitation. Typically, antenna selection problems involving 2D arrays yield combinatorial solution sets of the order of 10^6 and higher [18]. As such, enumerating all combinations and annotating such a large solution set is costly and often limits the size of the 2D array per problem [16]. To overcome these limitations, a random sampling approach was suggested in [11], [15]-[16] to replace the enumeration method. The technique randomly samples a portion of all combinations instead of using every possible combination. However, it often leads to sub-optimal solutions because the randomly sampled batches may not statistically represent the distribution of subarrays in the full combination set [19].
Notwithstanding the limitation above, ML-based optimization approaches remain desirable for a wide range of applications in wireless communication beyond the design of sparse arrays. For example, ML-based techniques are becoming instrumental in resource allocation for unmanned aerial vehicles (UAVs) [4]-[5] and in beam selection and management in MIMO systems. This is because, once trained, the model requires less matrix-multiplication time to converge to a near-optimal solution than combinatorial or convex optimization-based algorithms [6]-[9]. Moreover, ML-based techniques are robust to uncertainties, and they can transfer features between two models or tasks [16], [20].
On the other hand, most conventional antenna selection techniques employ heuristic or population-based optimization methods, most commonly simulated annealing (SA) and the genetic algorithm (GA) [21]. Although practical, these methods are computationally expensive and prone to local minima [6]; however, they have proven effective on small, well-defined sets of problems. Therefore, to avoid enumerating all possible combinations, we propose an SA-based algorithm to assist in generating the training dataset. Although GA can outperform SA in specific settings, we opted for SA because, unlike GA, it starts with a single solution and iteratively improves it, which simplifies the selection of instances at the outset [21].

This paper presents a simulated annealing-assisted DL-based antenna selection approach for 2D sparse array selection. The proposed technique is a hybrid two-stage approach comprising dataset generation and sparse array selection using features extracted from the DOA estimation environment [19]. The first stage generates the training samples: the M-element 2D subarrays associated with the antenna selection problem are sequentially and randomly sampled, and the sampled subarrays are then optimized to spread the sensors while maintaining the maximum aperture. The second stage consists of training-data labeling and sparse array selection. It employs a simple search algorithm to sift through the subarrays rendered in stage one and designate the best subarrays as class labels, or ground truth. The resulting dataset is used to train a convolutional neural network (CNN) model for sparse subarray multiclassification [11]. Numerical results show that the proposed method improves the DL-based antenna selector's accuracy and reduces the computation costs of training data generation and annotation.
Furthermore, the results show that the rendered 2D sparse arrays have improved DOA estimation resolution compared to the parent 2D array and other 2D sparse arrays.
In general, the main contributions of this paper are summarized as follows.
i) We propose a deep-learning-based antenna selection approach with a simulated annealing initializer. The proposed hybrid approach offers improved sparse array estimation accuracy and reduced computation complexity compared to conventional deep learning-based methods and the traditional SA-based antenna optimization approach.

ii) We present detailed theoretical and numerical simulation results demonstrating the superiority of the proposed antenna selection method in terms of realized antenna characteristics, DOA estimation performance, and computation complexity.

The remainder of this paper is outlined as follows. In section II, we consider the preliminaries of antenna selection and the corresponding conventional dataset generation approach. The proposed simulated annealing-based antenna selection technique is discussed in section III. Then, numerical simulation experiments are carried out in section IV to test the advantages of the proposed antenna selection approach under various scenarios. Finally, section V concludes the paper.
Throughout the paper, we use lower-case and upper-case bold characters to denote vectors and matrices, respectively; e.g., I_K represents the K x K identity matrix. Operators (.)^T and (.)^H stand for the transpose and the conjugate transpose of a vector or matrix, respectively. vec(.) denotes the vectorization operator and diag(.) represents a diagonal matrix. Moreover, ⊙ and E[.] denote the Khatri-Rao product and the statistical expectation operator, respectively.

II. PRELIMINARIES
In this section, we briefly introduce the conventional deep learning-based antenna selection approach.

A. PROBLEM FORMULATION
The problem of selecting a subarray with M elements from a uniform array with N elements yields Q possible combinations, where

Q = N! / (M! (N - M)!).   (1)

From a machine learning perspective, (1) is the number of possible classes. Assume that H contains all the possible classes in (1), and that h ∈ H is a sparse subarray associated with the sensor position set Z_h = [z_1, z_2, ..., z_M]. Then, all the sparse subarrays in H can be expressed as

H = {h_1, h_2, ..., h_Q}.   (2)

Following (2), [11] proposed a convolutional neural network (CNN) model to classify the best sparse subarray configuration, i.e., the one offering the best DOA estimation performance. Since the CNN is a data-driven approach, the core component of the method in [10] is a well-labeled dataset. Per [11], the annotated dataset can be generated from H in two steps. First, a set of target DOAs is sampled and received-signal realizations are generated for all h ∈ H. Second, using a performance metric, the subarray configurations with the best performance values are designated as labels [11], [13], [15]. We review these procedures in the subsequent sections.
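The combinatorial growth of (1) is what makes exhaustive dataset generation impractical. A minimal sketch (the function name is ours) that evaluates (1) directly:

```python
from math import comb


def num_subarrays(N, M):
    """Number of ways to choose an M-element subarray from an
    N-element uniform array, i.e., Q in Eq. (1)."""
    return comb(N, M)
```

For the 42-to-16 selection problem considered later in the paper, this count exceeds 10^10 classes, which illustrates why enumeration of H is avoided.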

B. REALIZATION OF TRAINING DATASET
In this section, we briefly review the procedures for generating and annotating the training dataset. In general, the process involves the following key steps:

1) Realization of subarray configurations
The first step involves the realization of sample subarray configurations according to (1) and (2).

2) Realization of received signals
Next, the received signals for each sample subarray realized in step 1 are calculated. Suppose that K uncorrelated narrowband sources from directions Θ_1, Θ_2, ..., Θ_K impinge on the h-th subarray with M elements, where Θ_k = (θ_k, ϕ_k) such that θ_k and ϕ_k are the k-th elevation and azimuth angles for k = 1, 2, ..., K. Then, the received signal at the h-th subarray can be defined as

x_h(t) = A_h(Θ) s_h(t) + n_h(t),   (3)

where A_h(Θ), s_h(t), and n_h(t) denote the h-th subarray manifold, the source signal vector, and the noise vector, respectively. Assuming that the source and noise vectors are statistically independent and uncorrelated [17]-[19], the corresponding covariance matrix can be expressed as

R_h = E[x_h(t) x_h^H(t)] = A_h(Θ) R_s A_h^H(Θ) + σ_n^2 I_M,   (4)

where R_s = diag(σ_1^2, σ_2^2, ..., σ_K^2), and σ_k^2 and σ_n^2 are the signal and noise powers, respectively.
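A minimal numpy sketch of the covariance computation under this narrowband model. The half-wavelength grid layout and all function names here are our assumptions for illustration, not the paper's implementation:

```python
import numpy as np


def steering(pos, theta, phi):
    """Steering vector for 2D sensor positions `pos` (M x 2), assumed to be
    given in half-wavelength units (a hypothetical layout choice)."""
    u = np.sin(theta) * np.cos(phi)
    v = np.sin(theta) * np.sin(phi)
    return np.exp(1j * np.pi * (pos[:, 0] * u + pos[:, 1] * v))


def subarray_covariance(pos, doas, powers, noise_var):
    """R = A Rs A^H + sigma_n^2 I, mirroring (3)-(4) for one subarray.
    doas: list of (theta, phi) pairs; powers: per-source signal powers."""
    A = np.stack([steering(pos, th, ph) for th, ph in doas], axis=1)  # M x K
    Rs = np.diag(powers).astype(complex)
    return A @ Rs @ A.conj().T + noise_var * np.eye(pos.shape[0])
```

The returned matrix is Hermitian and positive definite by construction, matching the statistical assumptions stated above.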

3) Computation of Performance Metric Values
Here, the performance of each subarray is determined using a specific performance metric. Like [11], this work adopts the CRB as the performance metric. From (4), the partial derivatives of A_h(Θ) with respect to θ and ϕ can be computed, from which the CRBs of the h-th subarray with respect to θ and ϕ follow [10], [17]. Hence, the overall CRB of the h-th subarray can be expressed as in (7), where the signal-to-noise ratio (SNR) is defined as 10 log10(σ_s^2/σ_n^2). Thus, using (7), one can determine the CRB of DOA estimation for any h ∈ H given information about the DOA of a signal [19].

4) Selection of the best subarrays (or labels)
Lastly, we comb the calculated CRB values and select the subarrays with the best performance metric values. Through a simple search method, the subarrays that minimize (7) are singled out as labels. Thus, a set U is constructed consisting of the subarrays that minimize the selection problem in (8) for u = 1, 2, 3, ..., |U|. As observed in [11], [19], the set U is much smaller than H due to similarities in array configurations and responses to various DOAs.
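The label-selection step above is a simple argmin search over precomputed CRB values. A sketch under an assumed table layout (the function name and the `(num_doas, num_subarrays)` shape are ours):

```python
import numpy as np


def select_labels(crb_table):
    """crb_table[i, h]: CRB of subarray h for the i-th sampled DOA scenario.
    Returns the per-scenario best subarray (the label) and the distinct
    label set U, which is typically much smaller than the number of subarrays."""
    best = crb_table.argmin(axis=1)          # simple search for the minimizer
    U = sorted(set(best.tolist()))           # distinct labels only
    return best, U
```

Because many DOA scenarios share the same minimizing subarray, `U` collapses to a small set, which is exactly the |U| << |H| behavior noted in the text.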
The CNN input is then constructed by stacking the phase, real, and imaginary components of a sample covariance matrix R̂ [10]. Hence, the input-output data pairs are computed as (H, u), where u ∈ U is the output label denoting the best subarray sensor indices given R̂ as input [11], [19].
As mentioned in the introductory section, the enumeration of H coupled with the optimization task in (8) is computationally expensive [11], given that |U| ≪ |H|. Therefore, in the subsequent section, we introduce an alternative approach to realizing the sample subarray configurations and labels, one that incurs far lower computation costs than using H.

III. PROPOSED SIMULATED ANNEALING BASED TRAINING DATA GENERATION APPROACH
In this section, we introduce a simulated annealing-based algorithm as an initialization step for the DL model. The primary purpose of the algorithm is to generate the training dataset without enumerating the whole set H as in (1), or a partition of it.

A. PROPOSED SA-BASED APPROACH
Simulated annealing has been used extensively to design optimum arrays [21]. The approach was used in [19] to optimize the sensor positions of the hourglass array while minimizing the mutual coupling between sensors. Thus, given an initial array Z_init, the optimized array Z_SA can be obtained by minimizing the objective in (9), where m_i, m_j ∈ Z_init, M is the number of sensors, ||.||_2 is the l_2-norm of a vector, and B is the mutual coupling coefficient upper bound. Note that, apart from optimizing the distribution of sensor positions for a 2D sparse array, the SA-based optimization method can be tuned to operate under a fixed physical aperture [19], [21].

This work utilizes the SA algorithm as an initialization step for the deep learning model. Mainly, the SA algorithm is used to generate sparse subarrays with large physical apertures and well-distributed sensors instead of using an enumeration approach. Specifically, the SA optimization stage follows the steps below:

(a) First, an initial random M-element 2D sparse subarray Z_init is generated out of the full N-element 2D array.

(b) Then, at each iteration, Z_init is perturbed while the corner sensors are maintained; i.e., only the sensors in Z_init \ Z_ψ are moved, where Z_ψ is the set containing all corner sensors. Also, the number of permitted missing virtual sensors η in the difference coarray (DCA) of Z_init is set to zero [19]. Using the current temperature β, we define the acceptance probability function as ρ(∆κ, β) = exp(−∆κ/β), where ∆κ = κ_n − κ_{n−1}, and κ_n and κ_{n−1} denote the objective function values of the new and previous solutions, respectively. Thus, if the new κ is smaller than the preceding one, the solution is accepted. At the end of each iteration, the temperature is decreased to block poor solutions from being accepted [21]. This is done using a cooling schedule determined by a factor α: starting from β_o, at the i-th iteration the temperature becomes β_i = α^i β_o, and the temperature is reduced at each iteration until the algorithm converges.
Note that the smaller the ∆κ and the higher the temperature, the higher the acceptance probability. Furthermore, cooling the temperature slowly slows down the convergence rate. Therefore, it is essential to select a high value of β_o to escape local minima, and an optimal value of α to increase the chances of obtaining a global optimum [6], [19], [21].

(c) Lastly, steps (a)-(b) are repeated until a reasonable number of subarrays is realized to construct a solution set H_sa.

Figure 1 summarizes the above steps in a generalized flow diagram. Note that this SA-based approach can be extended to any planar array configuration, and the same applies to the whole SA-based initialization method [19], [21]. Following the labeling approach in step 4 of section II-B, we select the best subarrays from H_sa, i.e., those with the lowest CRB values, as labels.
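Steps (a)-(b) can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the objective here is a hypothetical mutual-coupling proxy (sum of inverse pairwise distances) rather than the objective in (9), and the coarray-hole constraint (η = 0) is omitted. The Metropolis acceptance rule and geometric cooling match the description above:

```python
import numpy as np

rng = np.random.default_rng(0)


def coupling_proxy(sel, pos):
    """Surrogate objective: sum of inverse pairwise distances of the
    selected sensors (penalizes closely spaced sensors)."""
    p = pos[sel]
    d = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
    iu = np.triu_indices(len(sel), k=1)
    return float(np.sum(1.0 / d[iu]))


def sa_optimize(pos, M, fixed, beta0=1.0, alpha=0.95, iters=300):
    """SA over M-element subarrays of the N-element array `pos`.
    `fixed` lists corner-sensor indices that are never perturbed."""
    N = len(pos)
    free = [i for i in range(N) if i not in fixed]
    sel = list(fixed) + list(rng.choice(free, size=M - len(fixed), replace=False))
    kappa = coupling_proxy(sel, pos)
    beta = beta0
    for _ in range(iters):
        cand = sel.copy()
        j = rng.integers(len(fixed), M)            # never move a corner sensor
        out = [i for i in range(N) if i not in cand]
        cand[j] = int(rng.choice(out))             # swap one sensor
        k_new = coupling_proxy(cand, pos)
        dk = k_new - kappa
        # Metropolis rule: accept improvements, sometimes accept worse moves
        if dk < 0 or rng.random() < np.exp(-dk / beta):
            sel, kappa = cand, k_new
        beta *= alpha                              # geometric cooling
    return sorted(sel), kappa
```

Repeating `sa_optimize` from fresh random starts, as in step (c), yields the candidate set H_sa.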
The above steps are summarized in Algorithm 1. In Algorithm 1, the inputs are as follows: the total number of given antennas N, the number of antennas to be selected M, the number of snapshots T, the number of different DOA angles K, the number of signal and noise realizations P, and the SNR. Moreover, the elements of V in step 5 are chosen from H_sa, which is calculated using the proposed SA-based optimization method as shown in Fig. 1 rather than by enumerating the entire set of combinations as in H. The SNR used for calculating the covariance matrices in step 4 is denoted SNR_TRAIN.

B. APPLICATION TO 2D DOA ESTIMATION
Following the selection of 2D sparse arrays, the received signal on the selected sparse array can be expressed as

x̄(t) = Ā(Θ) s(t) + n̄(t),   (13)

where Ā(Θ) is the sub-matrix of the full-array manifold A(Θ) whose rows correspond to the sensor locations in the selected sparse array [11], [19]. From (13), a new covariance matrix is computed as

R̄ = E[x̄(t) x̄^H(t)] = Ā(Θ) R_s Ā^H(Θ) + σ_n^2 I_M.   (14)

Vectorizing (14), we obtain the following coarray model:

z = vec(R̄) = A_c p + σ_n^2 r,   (15)

where A_c = Ā* ⊙ Ā, p = [p_1, p_2, ..., p_K]^T, and r = vec(I_M) [20]. Here, A_c is the extended array manifold whose sensor locations are given by the difference coarray (DCA), which can be expressed as Z_co = {m_1 − m_2 | m_1, m_2 ∈ Z}. By carefully deleting the repeated rows, A_c yields a new array manifold representing a virtual uniform rectangular array (URA) structure with enhanced degrees of freedom [21]-[24]. Therefore, if the signal model in (13) is used directly, the proposed method can estimate only K < M sources. However, if the coarray signal model in (15) is exploited, the proposed method can estimate more sources than sensors, i.e., K > M [21], [23]. The corresponding training dataset is formed as input-output pairs {(R̂^(1,1), u_1), (R̂^(1,2), u_2), ..., (R̂^(P,K), u_PK)}, where the size of the training dataset is R = PK.
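The set Z_co defined above is straightforward to compute. A minimal sketch (the function name is ours) for 2D integer sensor positions:

```python
def difference_coarray(sensors):
    """Difference coarray Z_co = {m1 - m2 : m1, m2 in Z} for 2D sensor
    positions given as (x, y) integer tuples."""
    return sorted({(a[0] - b[0], a[1] - b[1]) for a in sensors for b in sensors})
```

For a hole-free array the result is a filled grid of lags; missing lags correspond to coarray holes, which is what the constraint η = 0 in the SA stage rules out.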

IV. NUMERICAL EXAMPLES
This section examines the performance of the proposed SA-assisted DL-based antenna selection approach for 2D sparse array selection. We train and test the CNN model using the data generated by the conventional method [19] and by the proposed method. The problem of selecting M = 16 sensors out of a 42-sensor URA is considered throughout the section. We then evaluate the realized DL-based 2D sparse array in terms of array structure characteristics and the root-mean-square error (RMSE) of 2D DOA estimation.
In the subsequent paragraphs, we employ the subscripts (.)_TRAIN and (.)_TEST to indicate parameters used in the training and testing modes, respectively. Moreover, following [11], the training data are obtained by sampling the DOA space with K_{θ,ϕ} directions, whereas the DOAs in the test data are randomly selected. For performance comparison purposes, we consider (a) the conventional DL-based 2D sparse array [11], (b) a 16-element SA-optimized sparse array [21], (c) a 16-element URA, and (d) the original 42-element URA [1]. Note that all the 2D sparse arrays consist of M = 16 sensors and share the same aperture, except for the parent 42-element URA.

A. CNN ARCHITECTURE
For objective comparison, we adopt a general CNN structure consisting of 8 layers, as in [11]. The first layer accepts the 2D input, and the last (8th) layer is a classification layer with l units, where a softmax function is used to obtain the probability distribution over the classes [19]. The second and fourth layers are max-pooling layers with 2 x 2 kernels that reduce the dimension, whereas the third and fifth layers are convolutional layers with 64 filters of size 2 x 2.
Finally, the sixth and seventh layers are fully connected layers with 1024 units. Rectified linear units (ReLU), with ReLU(x) = max(x, 0), are used after each convolutional and fully connected layer [11]. During the training phase, 90% and 10% of the data are allocated for training and validation, respectively. Stochastic gradient descent with momentum (SGD) is used with a learning rate of 0.03 and a mini-batch size of 500 for 50 epochs [20].
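A framework-agnostic, plain-data summary of the architecture as we read it; the exact layer ordering is our interpretation of the prose (the source's layer indices are slightly inconsistent), so treat this as a sketch rather than the definitive model of [11]:

```python
# Layer-by-layer summary of the assumed 8-layer CNN described above.
CNN_LAYERS = [
    {"id": 1, "type": "input"},                       # phase/real/imag of R-hat
    {"id": 2, "type": "maxpool", "kernel": (2, 2)},   # dimension reduction
    {"id": 3, "type": "conv", "filters": 64, "kernel": (2, 2), "act": "relu"},
    {"id": 4, "type": "maxpool", "kernel": (2, 2)},
    {"id": 5, "type": "conv", "filters": 64, "kernel": (2, 2), "act": "relu"},
    {"id": 6, "type": "dense", "units": 1024, "act": "relu"},
    {"id": 7, "type": "dense", "units": 1024, "act": "relu"},
    {"id": 8, "type": "softmax"},                     # l-class output
]
```

Any framework (e.g., MATLAB's Deep Learning Toolbox, as used in the experiments) can instantiate this stack directly.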

B. TRAINING DATA GENERATION
In this section, using a URA with 42 sensors, we generate two distinct training datasets: one with the conventional method [11] and one with the proposed method. For the former, we randomly sample 10000 subarrays, whereas for the latter we employ the proposed SA-based optimization method. To realize the classes H_sa using the proposed SA-based algorithm, we assume the following parameters: η = 0, β = 1000, β_o = 0.0001, and κ_o as in (10). We sample K_ϕ = 120 DOAs uniformly within the range [0°, 360°), whereas K_θ is fixed at 90°. Furthermore, for each dataset, we assume SNR_TRAIN = 10 dB and 100 snapshots. Table 1 shows the realized data samples and their corresponding labels. As shown in Table 1, the number of labels generated by the proposed method nearly matches the number realized by the conventional method, despite the proposed method using far fewer samples. Thus, the proposed initialization step obtains a large set of labels from a small generated dataset, considerably reducing computation costs.

C. ESTIMATION PERFORMANCE AND ACCURACY
This section evaluates the estimation accuracy of the proposed method compared to the conventional approach using the CNN models trained on the datasets generated in the previous section. In the first example, we test the CNN model with data generated using the parameters in the second column of Table 2 to predict sparse 2D arrays. Figure 2 shows the configurations of the predicted 2D sparse arrays. In particular, Fig. 2(a) illustrates the parent 42-element URA, whereas Figs. 2(b)-(c) show the 16-element DL-based 2D sparse arrays realized using the conventional and proposed methods, respectively. It can be observed that the proposed method yields a sparse array with a larger aperture than the conventional DL-based sparse array.
In the second example, we evaluate the proposed method's DOA estimation performance against the conventional method using the parameters in the third column of Table 2. The realized sparse arrays from the CNN are fed to the MUltiple SIgnal Classification (MUSIC) algorithm [11] for DOA estimation. In this case, SNR_TEST is varied from −20 dB to 10 dB over I = 100 trials. Figure 3(a) shows the RMSE of DOA estimation as a function of SNR_TEST, namely

RMSE = sqrt( (1/(I K)) Σ_{i=1}^{I} Σ_{k=1}^{K} (ϕ̂_k^i − ϕ_k)^2 ),

where ϕ̂_k^i and ϕ_k denote the estimated and true k-th DOAs in the i-th trial, respectively. Note that the best subarray indicated in Fig. 3(a) is the subarray with the lowest CRB value, i.e., the label. We compare the best subarray's DOA estimation performance with that of the predicted arrays, i.e., the CNN-generated sparse arrays of both the conventional and proposed methods. It can be observed that the proposed method follows the best subarray's performance and converges to it more quickly than the conventional method.
In the third example, we evaluate the proposed method's sensor selection accuracy against the conventional method using the parameters in the fourth column of Table 2. The realized sparse arrays from the CNN are compared to the best subarrays, or labels, to evaluate the classification performance [12]. Similarly, during the testing stage, SNR_TEST is varied from −20 dB to 10 dB over 100 trials. Figure 3(b) shows the accuracy of sensor selection as a function of SNR_TEST, i.e.,

Accuracy (%) = (F/D) x 100,

where D is the total number of input data samples and F is the number of times the model identifies the best subarray correctly [16]. We observe that the proposed method is more than 90% accurate for SNR_TEST ≥ −8 dB when the network is trained on the dataset with SNR_TRAIN = 10 dB, whereas the conventional method is less than 90% accurate for SNR_TEST ≥ −8 dB when trained with the same parameters. Hence, the proposed method shows improved performance over the conventional method.

D. 2D DOA ESTIMATION PERFORMANCE
In this section, we evaluate the performance of the 2D sparse array realized in section IV-B using the proposed method in comparison to the parent 42-sensor URA and other 2D sparse arrays. In particular, we explore the behavior of the RMSE as a function of SNR and of the number of snapshots.
Here, the RMSE is defined as

RMSE = sqrt( (1/(2 I K)) Σ_{i=1}^{I} Σ_{k=1}^{K} [ (ϕ̂_k^i − ϕ_k)^2 + (θ̂_k^i − θ_k)^2 ] ),

where (ϕ̂_k^i, θ̂_k^i) and (ϕ_k, θ_k) denote the estimated and true k-th DOAs in the i-th of I trials, respectively. Table 3 lists the parameters used to compute the RMSE with respect to SNR (Example #4) and the number of snapshots (Example #5). The 2D-ESPRIT algorithm is used to estimate the sources [19], [24]. However, if the DCA of a realized 2D array has holes, the resulting virtual 2D array becomes irregular, and spatial-smoothing-based DOA estimation methods such as 2D-ESPRIT cannot be applied, since 2D-ESPRIT requires a URA structure for its spatial-smoothing pre-processing [24]. In that case, a nuclear norm minimization (NNM) approach is applied to fill the holes [21] and restore a standard 2D configuration.

Figure 4 shows the DOA estimation performance of the realized DL-based 2D sparse array compared to the parent URA and other 2D sparse arrays. It can be observed in Fig. 4(a) that the URA with 42 sensors has the best overall performance due to its large physical aperture. The DL-based 2D sparse array realized using the proposed method performs better than the 16-sensor URA and only slightly worse than the parent URA. Besides, the proposed 2D sparse array performs better than the conventional SA-based sparse array, while the conventional DL-based array performs poorly compared to both the proposed array and the 16-sensor URA.
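The 2D RMSE above averages the squared elevation and azimuth errors over all trials and sources. A minimal numpy sketch (the function name and the `(I, K, 2)` array layout are our assumptions):

```python
import numpy as np


def rmse_2d(est, true):
    """RMSE over I trials, K sources, and both angles.
    est/true: shape (I, K, 2) arrays holding (theta, phi) pairs in radians;
    averaging over all 2*I*K entries matches the 1/(2IK) normalization."""
    err = np.asarray(est) - np.asarray(true)
    return float(np.sqrt(np.mean(err ** 2)))
```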
In Fig. 4(b), we observe a similar trend, where the performance of the proposed array is bounded by that of the parent URA. These results confirm that the proposed method reduces the array to a manageable size without considerable loss of DOA estimation resolution. Moreover, compared to other conventional methods such as SA, the realized sparse arrays exhibit enhanced DOA estimation performance.

E. COMPUTATION COMPLEXITY
The enumeration method used to generate H has a computation complexity of O(Q), i.e., it grows with the binomial coefficient in (1). Table 4 summarizes the computation complexity of the two methods [16]. In Fig. 5, we compare the computation time required by the proposed DL-based approach and by the SA-based optimization method to estimate a single optimal 2D sparse array per 100 iterations. We ran the models in MATLAB on a PC with an Intel(R) Core(TM) i5 at 2.60 GHz and 4 GB RAM. As indicated, the proposed method requires less computation time to yield a 2D sparse array from an N-element 2D parent array than the SA-based optimization technique requires to optimize an M-element 2D sparse array [21].

V. CONCLUSION
This paper presented a novel two-stage DL-based sparse array selection approach. The first stage is an initialization step that uses the SA algorithm to generate sparse array configurations with key target features, enabling the data labels to be realized from few data samples and thereby reducing computation costs and time. The resulting dataset is then used to train a CNN model in the second stage. Simulation results indicate that the proposed method yields sparse arrays with enhanced DOA estimation performance. Moreover, the results showed that the proposed approach realizes many labels from a small number of data samples. More importantly, once trained and deployed, the CNN model requires almost 10 times less computation time to converge to an optimum solution than the conventional SA-based optimization approach.