Branch-and-Bound Search and Machine Learning-Based Transmit Antenna Selection in MIMOME Channels

Studies on conventional transmit antenna selection (TAS) in the multiple-input, multiple-output, multiple-antenna eavesdropper (MIMOME) channel have focused on either determining the optimal number of transmit antennas or selecting the optimal transmit antenna indices. To maximize the secrecy capacity, a TAS scheme is needed that simultaneously determines the optimal number of transmit antennas and selects the corresponding optimal transmit antenna indices. In this paper, we propose the iterative branch-and-bound search-based TAS (IB-TAS) scheme to determine the optimal transmit antenna set in the MIMOME channel. To reduce the computational complexity caused by TAS, we also propose machine learning-based TAS (ML-TAS) schemes utilizing a neural network (NN), a support vector machine (SVM), and naive-Bayes (NB). Simulation and numerical results demonstrate that the IB-TAS scheme achieves the optimal secrecy capacity. In addition, a comparative analysis of the proposed ML-TAS schemes shows that the NN-based TAS scheme achieves the lowest computational complexity and the smallest loss of secrecy capacity among the ML-TAS schemes.


I. INTRODUCTION
Wireless communication has become indispensable in personal and military fields [1]. Security in wireless communication is essential to prevent eavesdropping of transmitted data by unauthorized users. In recent years, there has been increasing attention to simple and efficient security techniques such as physical layer security (PLS) instead of traditional encryption techniques, which have inherent difficulties and vulnerabilities related to complicated key encryption and management procedures [2].
A pioneering study on PLS investigated a wiretap channel [3] with a single antenna at each of a legitimate transmitter (Alice), a legitimate receiver (Bob), and an eavesdropper (Eve). It has been extended to multi-antenna environments such as multiple-input, single-output, multiple-antenna eavesdropper (MISOME) and multiple-input, multiple-output, multiple-antenna eavesdropper (MIMOME) channels [4], [5]. In MISOME and MIMOME channels, various PLS techniques have been proposed, such as transmit beamforming [6], [7], artificial noise [8], [9], and antenna selection [10], [11], [12], [13], [14] based on the channel state information (CSI) of Bob and Eve. In particular, the transmit antenna selection (TAS) based PLS schemes effectively maintain communication security at the cost of low feedback overhead.
The associate editor coordinating the review of this manuscript and approving it for publication was Rosalia Maglietta.
The initial works on TAS in the MIMOME channel focused on single antenna selection that maximizes the instantaneous signal-to-noise ratio (SNR) at Bob [10], [11], [12] when only Bob's CSI is available at Alice. These studies additionally considered receive combining schemes such as maximum ratio combining, selection combining, and generalized selection combining. The authors in [13] and [14] investigate multiple transmit antenna selection schemes to maximize the achievable secrecy capacity depending on whether Eve's perfect CSI is available at Alice or not. In [13], the transmit antennas with the largest channel gains are selected one by one until the selected transmit antenna set maximizes the secrecy capacity. The authors in [14] propose the branch-and-bound search-based TAS (B-TAS) scheme to select the optimal transmit antenna indices when the number of transmit antennas to choose is fixed. The B-TAS scheme shows nearly optimal performance and reduces the computational complexity compared to the exhaustive search-based TAS scheme. Therefore, we need to develop a novel scheme determining the optimal transmit antenna set, i.e. the optimal number of transmit antennas and the corresponding antenna indices, based on Eve's CSI available at Alice.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
On the other hand, as the number of transmit antennas increases, the TAS schemes cause high computational complexity. Thus, it is necessary to reduce the complexity while maintaining the secrecy capacity for practicality. Machine learning is one of the candidate techniques to minimize computational complexity effectively. The machine learning model is implemented based on offline training data set. As a result, the trained model operates online, reducing the computational complexity compared with the iterative algorithms [15].
From this point of view, machine learning-based antenna selection schemes have been proposed in multiple-input and multiple-output (MIMO) channel [16], [17], [18], [19]. The authors in [16] propose the receive antenna selection scheme based on the k-nearest neighbor (k-NN) and the support vector machine (SVM). These schemes minimize the bit error ratio (BER) and reduce feedback overhead and computational complexity. In [17], the authors propose the k-NN, the SVM, and the neural network (NN) based TAS schemes to minimize the BER. The NN-based scheme shows the best performance because it utilizes a robust structure with parametric forms. Since the output layer neurons of the NN-based scheme proposed in [17] are mapped to possible combinations of transmit antennas, the number of output layer neurons increases exponentially as the number of transmit antennas increases. It causes high computational complexity.
The authors in [18] and [19] propose the NN-based TAS schemes to maximize the minimum SNR and the capacity. The output layer neurons are mapped to the transmit antenna indices in these schemes. Thus, if the output value of a neuron in the output layer is greater than or equal to 0.5, the corresponding transmit antenna is selected. Otherwise, the transmit antenna is not selected. In this structure, the number of output layer neurons increases linearly with the number of transmit antennas. Thus, the NN-based schemes proposed in [18] and [19] efficiently reduce the computational complexity compared to the structure in which the output layer neurons are mapped to possible combinations of transmit antennas.
The authors in [20] and [21] utilize machine learning and reinforcement learning algorithms to select a single transmit antenna in the MIMOME channel. In [20], the authors use SVM and naive-Bayes (NB) to reduce the feedback overhead and computational complexity while maximizing the secrecy performance depending on whether Eve's CSI is available at Alice or not. In [21], the authors use the deep Q network, i.e. one of the reinforcement learning algorithms, to minimize the secrecy outage probability and the BER when the perfect CSI of Eve is not available at Alice. However, these methods only focus on a single transmit antenna. To the best of our knowledge, machine learning-based TAS (ML-TAS) schemes have not been investigated to determine the optimal transmit antenna set. Therefore, it is necessary to develop novel ML-TAS schemes in the MIMOME channel.
In this paper, we propose the iterative branch-and-bound search-based TAS (IB-TAS) schemes to find the optimal number of transmit antennas and the corresponding optimal transmit antenna indices based on Eve's CSI available at Alice, i.e. perfect and partial CSI cases. We also design supervised learning-based machine learning models to reduce the computational complexity of TAS. The main contributions of this paper are as follows:
1) When the perfect CSI of Eve is available at Alice, we propose the IB-TAS scheme to maximize the achievable secrecy capacity. The IB-TAS simultaneously determines the optimal number of transmit antennas and the corresponding optimal transmit antenna indices.
2) When Eve's partial CSI, i.e. channel statistics, is available at Alice, we derive the lower bound on the expected capacity between Alice and Eve. The upper bound on the estimated secrecy capacity is then derived based on this lower bound. As a result, we propose the IB-TAS scheme to maximize the estimated secrecy capacity.
3) We propose the ML-TAS schemes utilizing NN, SVM, and NB. These ML-TAS schemes are trained in a supervised learning manner based on the proposed IB-TAS schemes. The simulation and numerical results demonstrate that the NN-based TAS scheme minimizes the computational complexity at the cost of a minimal secrecy capacity loss, compared to the other ML-TAS schemes.
For notation, uppercase and lowercase boldface letters represent matrices and vectors, respectively. $\mathbf{A}^T$, $\mathbf{A}^H$, $\mathbf{A}^{-1}$, and $\det(\mathbf{A})$ denote the transpose, the conjugate transpose, the inverse (or pseudo-inverse), and the determinant of a matrix $\mathbf{A}$, respectively. $\mathbf{I}_n$ denotes the $n \times n$ identity matrix. $\mathbf{a}[n]$ denotes the $n$th element of a vector $\mathbf{a}$ and $|\mathcal{I}|$ denotes the cardinality of a set $\mathcal{I}$. Further, $\setminus$ denotes the set difference operator and $\mathbb{E}[\cdot]$ denotes the expectation operator.

II. SYSTEM MODEL
We consider a MIMOME channel in which Alice, Bob, and a passive Eve are equipped with $N_A$, $N_B$, and $N_E$ antennas, respectively. A large number of antennas at Alice is considered, i.e. $N_A > N_B$ and $N_A > N_E$. The channel matrix from Alice to Bob is denoted as $\mathbf{H} \in \mathbb{C}^{N_B \times N_A}$, which consists of the channel vectors $\mathbf{h}_1, \mathbf{h}_2, \cdots, \mathbf{h}_{N_A}$. The channel matrix from Alice to Eve is denoted as $\mathbf{G} \in \mathbb{C}^{N_E \times N_A}$, which consists of the channel vectors $\mathbf{g}_1, \mathbf{g}_2, \cdots, \mathbf{g}_{N_A}$. It is assumed that $\mathbf{H}$ and $\mathbf{G}$ are fast Rayleigh fading channels, which are statistically independent of each other in a rich scattering environment [22]. The entries of $\mathbf{H}$ and $\mathbf{G}$ are independent and identically distributed (i.i.d.) complex-valued Gaussian random variables with zero mean and unit variance. It is presumed that Bob and Eve can estimate their own MIMO channels, i.e. $\mathbf{H}$ and $\mathbf{G}$, respectively. The channel estimation depends on the uplink or downlink training procedure in a given duplexing mode of the system. In particular, it is possible to estimate the downlink channel using uplink training due to channel reciprocity in TDD mode. To focus on the TAS for PLS, we do not consider a specific uplink or downlink training procedure in this paper. In addition, it is assumed that the total transmit power is equally allocated to the selected transmit antennas.
The signal vector received at Bob is then represented as (1), where $\mathbf{x}$ is the $N_A \times 1$ transmitted signal vector with unit power, $\rho_B$ denotes the transmit signal-to-noise ratio (SNR) at Bob, and $\mathbf{n}_B$ is zero-mean and unit-variance complex additive white Gaussian noise (AWGN), i.e. $\mathbf{n}_B \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{N_B})$. The signal vector overheard at Eve is given as (2), where $\rho_E$ denotes the transmit SNR at Eve, and $\mathbf{n}_E$ is zero-mean and unit-variance complex AWGN, i.e. $\mathbf{n}_E \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{N_E})$.
In (1) and (2), the transmit power is divided equally among the ranks of the channel. Based on (1) and (2), the achievable secrecy capacity, $C_S$, is defined as (3). Depending on the availability of Eve's CSI at Alice, we consider two scenarios. Scenario A is that the perfect CSI of $\mathbf{G}$ is available at Alice, and scenario B is that the partial CSI, i.e. the channel statistics of $\mathbf{G}$, is available at Alice. In both scenarios, it is presumed that the perfect CSI of $\mathbf{H}$ is available at Alice.
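As an illustration of the quantity that the TAS schemes optimize, the following minimal sketch evaluates a secrecy capacity for one candidate antenna subset. It assumes the standard log-determinant form $[C_B - C_E]^+$ with the per-rank power $\rho/m$, $m = \min(k, N_B)$, which is consistent with the definitions used later in the paper but is not copied from the original equations. The helper names (`log2det`, `rayleigh`, `secrecy_capacity`) and all numerical values are hypothetical.

```python
import numpy as np

def log2det(M):
    # slogdet returns (sign, ln|det|); the matrices used here are Hermitian positive definite.
    return np.linalg.slogdet(M)[1] / np.log(2)

def rayleigh(rows, cols, rng):
    """Channel matrix with i.i.d. CN(0, 1) entries (zero mean, unit variance)."""
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)

def secrecy_capacity(H, G, selected, rho_b, rho_e):
    """Assumed form [C_B - C_E]^+ for the antenna columns in `selected` (0-based indices)."""
    Hs, Gs = H[:, selected], G[:, selected]
    m = min(len(selected), H.shape[0])   # rank of the selected Bob channel
    c_b = log2det(np.eye(H.shape[0]) + (rho_b / m) * Hs @ Hs.conj().T)
    c_e = log2det(np.eye(G.shape[0]) + (rho_e / m) * Gs @ Gs.conj().T)
    return max(c_b - c_e, 0.0)

rng = np.random.default_rng(0)
H = rayleigh(2, 6, rng)   # N_B = 2, N_A = 6 (illustrative sizes)
G = rayleigh(2, 6, rng)   # N_E = 2
print(secrecy_capacity(H, G, [1, 3], rho_b=10.0, rho_e=1.0))
```

Evaluating this function for every subset is the exhaustive search that the branch-and-bound procedure in Section III is designed to avoid.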

III. ITERATIVE BRANCH-AND-BOUND SEARCH-BASED TAS
The conventional B-TAS focuses on selecting the optimal $k$ transmit antennas when the number of transmit antennas to select, i.e. $k$, is given and fixed in the system [14]. Thus, the conventional B-TAS determines the optimal transmit antenna set, $\mathcal{S}_k$, i.e. the optimal $k$ transmit antenna indices, to maximize the secrecy capacity. Because $k$ is fixed in advance, there is an opportunity to improve the secrecy capacity by determining the optimal transmit antenna set, i.e. the optimal number of transmit antennas and the corresponding antenna indices. Meanwhile, most existing studies focus on whether the perfect CSI of Eve is available at Alice or not. However, from a practical point of view, the partial CSI of Eve, such as the channel statistics, should be considered. By borrowing the branch-and-bound search concept in the conventional B-TAS, we propose the IB-TAS scheme to find the optimal number of transmit antennas and the corresponding optimal transmit antenna indices based on Eve's CSI available at Alice, i.e. perfect and partial CSI cases.

A. SCENARIO A: PERFECT CSI OF G IS AVAILABLE AT ALICE
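The IB-TAS finds a total of $N_A$ transmit antenna sets, i.e. $\mathcal{S}_1, \mathcal{S}_2, \cdots, \mathcal{S}_{N_A}$, and calculates the corresponding achievable secrecy capacities. It then selects the optimal transmit antenna set, $\mathcal{S}_{opt}$, with the largest secrecy capacity.
Initially, $\mathcal{S}_k$ is set to an empty set and the antenna index set, $\mathcal{J}$, includes all transmit antenna indices. Let $\mathbf{H}_n \in \mathbb{C}^{N_B \times n}$ and $\mathbf{G}_n \in \mathbb{C}^{N_E \times n}$ denote the submatrices of $\mathbf{H}$ and $\mathbf{G}$ for the selected $n$ (i.e. $n < k$) transmit antennas. Based on $\mathbf{H}_n$ and $\mathbf{G}_n$, the secrecy capacity for the selected $n$ transmit antennas, $C_{S,n}$, is defined as (4), where $m$ is the rank of $\mathbf{H}_k$, i.e. $m = \mathrm{rank}(\mathbf{H}_k) = \min(k, N_B)$, and $\mathbf{H}_k$ is the selected channel matrix corresponding to $\mathcal{S}_k$. Since the IB-TAS selects $k$ transmit antennas for determining $\mathcal{S}_k$, the power allocated to each rank is $\bar{\rho}_B = \rho_B / m$ and $\bar{\rho}_E = \rho_E / m$.
Suppose that $J_{n+1}$ is the $(n+1)$th selected transmit antenna index, and $\mathbf{h}_{n+1}$ and $\mathbf{g}_{n+1}$ are the $J_{n+1}$th column vectors of $\mathbf{H}$ and $\mathbf{G}$, respectively. Then, $\mathcal{S}_k$ and $\mathcal{J}$ are updated to $\mathcal{S}_k = \mathcal{S}_k \cup \{J_{n+1}\}$ and $\mathcal{J} = \mathcal{J} \setminus \{J_{n+1}\}$, respectively. After $n+1$ transmit antennas are selected, the updated channel matrices, i.e. $\mathbf{H}_{n+1} \in \mathbb{C}^{N_B \times (n+1)}$ and $\mathbf{G}_{n+1} \in \mathbb{C}^{N_E \times (n+1)}$, are given as $\mathbf{H}_{n+1} = [\mathbf{H}_n, \mathbf{h}_{n+1}]$ and $\mathbf{G}_{n+1} = [\mathbf{G}_n, \mathbf{g}_{n+1}]$, respectively. Based on the updated channel matrix $\mathbf{H}_{n+1}$, $C_{B,n+1}$ is derived as (5) and (6). Step (a) in (5) follows from Sylvester's determinant identity, i.e. $\det(\mathbf{X} + \mathbf{A}\mathbf{B}) = \det(\mathbf{X}) \det(\mathbf{I} + \mathbf{B}\mathbf{X}^{-1}\mathbf{A})$ [23]. Based on the updated channel matrix $\mathbf{G}_{n+1}$, $C_{E,n+1}$ is derived as (7) and (8), which express $C_{E,n+1}$ as $C_{E,n}$ plus a logarithmic increment. Based on (6) and (8), $C_{S,n+1}$ is derived as (9). Thus, the most recently selected transmit antenna index only affects the incremental term in (9).
Suppose that $J_{n+2}$ is the $(n+2)$th selected transmit antenna index, and $\mathbf{h}_{n+2}$ and $\mathbf{g}_{n+2}$ are the $J_{n+2}$th column vectors of $\mathbf{H}$ and $\mathbf{G}$, respectively. Then, $\mathcal{S}_k$ and $\mathcal{J}$ are updated to $\mathcal{S}_k = \mathcal{S}_k \cup \{J_{n+2}\}$ and $\mathcal{J} = \mathcal{J} \setminus \{J_{n+2}\}$, respectively. Similar to (5) and (6), $C_{B,n+2}$ corresponding to $\mathbf{H}_{n+2}$ is derived as (10) and (11). Similar to (7) and (8), $C_{E,n+2}$ corresponding to $\mathbf{G}_{n+2}$ is derived as (12) and (13). Using the Sherman-Morrison formula, i.e. $(\mathbf{X} + \mathbf{u}\mathbf{v}^H)^{-1} = \mathbf{X}^{-1} - \frac{\mathbf{X}^{-1}\mathbf{u}\mathbf{v}^H\mathbf{X}^{-1}}{1 + \mathbf{v}^H\mathbf{X}^{-1}\mathbf{u}}$ [23], the matrices $\mathbf{T}_{B,n+1}$ in (10) and $\mathbf{T}_{E,n+1}$ in (12) are given as (14) and (15), which are expressed in terms of the vectors $\mathbf{t}_{B,n+1}$ and $\mathbf{t}_{E,n+1}$.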
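The two identities used above can be checked numerically. The sketch below is an illustration under stated assumptions rather than the paper's implementation: it takes $\mathbf{T}_n = (\mathbf{I} + \bar{\rho}\,\mathbf{H}_n\mathbf{H}_n^H)^{-1}$ and $\nu = \mathbf{h}^H\mathbf{T}_n\mathbf{h}$, and the specific form of the vector `t` is one choice that satisfies the Sherman-Morrison update. The matrix sizes and the SNR value are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_b, n, rho = 3, 2, 2.5   # hypothetical N_B, number of selected antennas, per-rank SNR

def cn(rows, cols):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)

def cap(M):
    """log2 det(I + rho * M M^H)."""
    return np.linalg.slogdet(np.eye(n_b) + rho * M @ M.conj().T)[1] / np.log(2)

H_n = cn(n_b, n)
h1, h2 = cn(n_b, 1), cn(n_b, 1)   # the next two candidate columns

# Assumed definition: T_n = (I + rho H_n H_n^H)^{-1}
T_n = np.linalg.inv(np.eye(n_b) + rho * H_n @ H_n.conj().T)

# Sylvester's identity: adding one column adds log2(1 + rho * nu) to the capacity.
nu1 = (h1.conj().T @ T_n @ h1).real.item()
cap_incremental = cap(H_n) + np.log2(1 + rho * nu1)
cap_direct = cap(np.hstack([H_n, h1]))

# Sherman-Morrison: T_{n+1} = T_n - t t^H with t = sqrt(rho) T_n h1 / sqrt(1 + rho nu1)
t = np.sqrt(rho) * (T_n @ h1) / np.sqrt(1 + rho * nu1)
T_next = T_n - t @ t.conj().T
H_next = np.hstack([H_n, h1])
T_next_direct = np.linalg.inv(np.eye(n_b) + rho * H_next @ H_next.conj().T)

# Cheap update of nu for the following antenna: no new matrix inverse is needed.
nu2_fast = (h2.conj().T @ T_n @ h2).real.item() - abs((t.conj().T @ h2).item()) ** 2
nu2_direct = (h2.conj().T @ T_next_direct @ h2).real.item()

print(cap_incremental, cap_direct)
print(nu2_fast, nu2_direct)
```

The same update applies to Eve's channel with $\mathbf{G}$ in place of $\mathbf{H}$, which is why each additional antenna costs only matrix-vector products.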
To reduce the computational complexity of calculating $\nu_{B,n+2}$ in (11) and $\nu_{E,n+2}$ in (13), both terms can be rewritten as (16) and (17). In (16) and (17), $\tilde{\nu}_{B,n+1}$ is given as $\mathbf{h}_{n+2}^H \mathbf{T}_{B,n} \mathbf{h}_{n+2}$ and $\tilde{\nu}_{E,n+1}$ is given as $\mathbf{g}_{n+2}^H \mathbf{T}_{E,n} \mathbf{g}_{n+2}$. Based on (11) and (13), $C_{S,n+2}$ is derived as (18).
In the conventional B-TAS, the branch-and-bound search algorithm in [24] is efficient for a monotonic objective function. By borrowing the branch-and-bound search concept in the conventional B-TAS, the IB-TAS uses the new objective function, $\hat{C}_{S,n}$, instead of $C_{S,n}$ in (4). $\hat{C}_{S,n}$ is defined as (19). In (19), the index set, $\mathcal{I}_a$, consists of all of the candidate transmit antenna indices in the step of selecting the $a$th optimal transmit antenna. Based on (9), (18), and (19), the recursive form of the monotonic objective function is derived as (20). Based on (20), the optimal $\mathcal{S}_k$ is determined by the IB-TAS.
Fig. 1 shows an example of the search tree for the IB-TAS when $k$ is 2 and $N_A$ is 6. This example shows the search procedure to determine the optimal $\mathcal{S}_2$. In Fig. 1, the number on each node, i.e. the node index, denotes the transmit antenna index. In level 1, each $\hat{C}_{S,1}$ corresponding to each element of $\mathcal{I}_1$ is first calculated by (20), where $\mathcal{I}_1$ is the index set for level 1, i.e. $\mathcal{I}_1 = \{1, 2, 3, 4, 5\}$. Let us assume that node 2 in level 1 has the largest $\hat{C}_{S,1}$ among the nodes in level 1. If $\hat{C}_{S,1}$ of node 2 in level 1 is smaller than the global lower bound, $B$, all child nodes of node 2 in level 1 are pruned. In general, the initial value of $B$ is very small, e.g. $-\infty$. If $\hat{C}_{S,1}$ of node 2 in level 1 is larger than $B$, each $\hat{C}_{S,2}$ corresponding to each child node of node 2 in level 1 is calculated by (20). Let us assume that $\hat{C}_{S,1}$ of node 2 in level 1 is larger than $B$ and that node 4 in level 2 has the largest $\hat{C}_{S,2}$ among the child nodes of node 2 in level 1. If its $\hat{C}_{S,2}$ is larger than $B$, $B$ is updated to this largest $\hat{C}_{S,2}$ and the optimal transmit antenna set, $\mathcal{S}_2$, is updated to $\{2, 4\}$. If its $\hat{C}_{S,2}$ is smaller than $B$, neither $B$ nor $\mathcal{S}_2$ is updated. In both cases, the search then returns to the top level, i.e. level 1, and repeats the search procedure. This is repeated until all the nodes in the search tree are either pruned or visited for calculating the monotonic objective function, i.e. (20). Then, the optimal $\mathcal{S}_2$ can be determined.
The achievable secrecy capacity corresponding to $\mathcal{S}_k$ can be calculated as (21), where $\mathbf{H}_k$ and $\mathbf{G}_k$ are the channel matrices corresponding to $\mathcal{S}_k$. The IB-TAS determines all transmit antenna sets, i.e. $\mathcal{S}_1, \mathcal{S}_2, \cdots, \mathcal{S}_{N_A}$, and calculates the corresponding achievable secrecy capacities based on (21). The optimal transmit antenna set with the largest secrecy capacity, $\mathcal{S}_{opt}$, is then determined among these sets as in (22), i.e. $\mathcal{S}_{opt} = \arg\max_{\mathcal{S}_k,\, 1 \le k \le N_A} C_{\mathcal{S}_k}$.
Therefore, the IB-TAS can determine the optimal number of transmit antennas and the corresponding optimal transmit antenna indices to maximize the achievable secrecy capacity.
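The overall procedure (a branch-and-bound search for each $k$, followed by a maximum over $k$) can be sketched as follows. This is a simplified illustration, not Algorithm 1: the optimistic bound used for pruning adds, for each antenna still to be chosen, the largest remaining values of $\log_2(1 + \bar{\rho}_B \|\mathbf{h}_j\|^2)$, which is an assumed stand-in for the paper's term $Z_a$. It is a valid upper bound because $\mathbf{h}^H\mathbf{T}\mathbf{h} \le \|\mathbf{h}\|^2$ and Eve's increment is nonnegative. All function names and numerical values are hypothetical.

```python
import itertools
import numpy as np

def gap(H, G, idx, pb, pe):
    """C_B - C_E in bits for the columns idx, with per-rank powers pb and pe."""
    Hs, Gs = H[:, idx], G[:, idx]
    cb = np.linalg.slogdet(np.eye(H.shape[0]) + pb * Hs @ Hs.conj().T)[1]
    ce = np.linalg.slogdet(np.eye(G.shape[0]) + pe * Gs @ Gs.conj().T)[1]
    return (cb - ce) / np.log(2)

def bab_for_k(H, G, k, rho_b, rho_e):
    """Depth-first branch-and-bound over index-ordered subsets of size k."""
    n_b, n_a = H.shape
    m = min(k, n_b)
    pb, pe = rho_b / m, rho_e / m
    gain = np.log2(1 + pb * np.sum(np.abs(H) ** 2, axis=0))  # optimistic gain per antenna
    best = {"value": -np.inf, "set": None, "visited": 0}

    def search(chosen, start):
        left_after = k - len(chosen) - 1           # antennas still missing after this pick
        for j in range(start, n_a - left_after):
            cand = chosen + [j]
            best["visited"] += 1
            value = gap(H, G, cand, pb, pe)
            bound = value + np.sort(gain[j + 1:])[::-1][:left_after].sum()
            if bound <= best["value"]:
                continue                           # prune: no completion can beat the incumbent
            if left_after == 0:
                best["value"], best["set"] = value, cand
            else:
                search(cand, j + 1)

    search([], 0)
    return best

def ib_tas(H, G, rho_b, rho_e):
    """Run the search for every k and keep the set with the largest capacity."""
    results = [bab_for_k(H, G, k, rho_b, rho_e) for k in range(1, H.shape[1] + 1)]
    top = max(results, key=lambda r: r["value"])
    return top["set"], max(top["value"], 0.0)

def exhaustive(H, G, rho_b, rho_e):
    """Reference search over all non-empty subsets."""
    n_b, n_a = H.shape
    scored = [(gap(H, G, list(c), rho_b / min(k, n_b), rho_e / min(k, n_b)), list(c))
              for k in range(1, n_a + 1)
              for c in itertools.combinations(range(n_a), k)]
    value, subset = max(scored, key=lambda s: s[0])
    return subset, max(value, 0.0)

rng = np.random.default_rng(2)
H = (rng.standard_normal((2, 6)) + 1j * rng.standard_normal((2, 6))) / np.sqrt(2)
G = (rng.standard_normal((2, 6)) + 1j * rng.standard_normal((2, 6))) / np.sqrt(2)
print(ib_tas(H, G, 10.0, 1.0))
```

Indices in this sketch are 0-based, whereas the text numbers antennas from 1. Tightening the bound prunes more nodes but does not change the result, as long as the bound never underestimates the best completion.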

B. SCENARIO B: CHANNEL STATISTICS OF G ARE AVAILABLE AT ALICE
When only the channel statistics of $\mathbf{G}$ are available at Alice, we derive the lower bound on the expected capacity between Alice and Eve, i.e. $\mathbb{E}[C_E]$. The upper bound on the estimated secrecy capacity is then derived based on this lower bound. As a result, we propose the IB-TAS scheme to maximize the estimated secrecy capacity. Similar to the operation in scenario A, the IB-TAS determines a total of $N_A$ transmit antenna sets, i.e. $\mathcal{S}_1, \mathcal{S}_2, \cdots, \mathcal{S}_{N_A}$, and calculates the corresponding estimated secrecy capacities.
Initially, $\mathcal{S}_k$ is set to an empty set and the antenna index set, $\mathcal{J}$, includes all transmit antenna indices. Let $\mathbf{H}_n$ and $\mathbf{G}_n$ denote the submatrices of $\mathbf{H}$ and $\mathbf{G}$ for the selected $n$ transmit antennas. Based on $\mathbf{H}_n$ and $\mathbf{G}_n$, the estimated secrecy capacity for the selected $n$ transmit antennas, $C_{S,n}$, is defined as (23), where $m$ is the rank of $\mathbf{H}_k$, i.e. $m = \mathrm{rank}(\mathbf{H}_k) = \min(k, N_B)$, and $\mathbf{H}_k$ is the selected channel matrix corresponding to $\mathcal{S}_k$. Since the IB-TAS selects $k$ transmit antennas for determining $\mathcal{S}_k$, the allocated power for each rank is given as in (23). Starting from (24), the expected $C_{E,n}$, $\mathbb{E}[C_{E,n}]$, is derived through (25) and (26), where $\bar{\rho}_E = \rho_E / m$, $u_n = \min(n, N_E)$, and $v_n = \max(n, N_E)$. The inequalities (a) in (24) and (c) in (26) are due to the Minkowski determinant theorem [25] and Jensen's inequality [27], respectively. In addition, the equality (b) in (25) follows from Jacobi's formula [26]. Since the term $\log_2(c + \exp(\cdot))$ in (25) is a convex function, Jensen's inequality yields the lower bound in (27). The term $\alpha_n$ in (27) is given as (28) and (29), where $\psi(\cdot)$ is the digamma function and $\gamma$ is the Euler-Mascheroni constant [28, Th. 2.11]. Based on (23) and (27), the upper bound of $C_{S,n}$ is given as (30).
Suppose that $J_{n+1}$ is the $(n+1)$th selected transmit antenna index, and $\mathbf{h}_{n+1}$ and $\mathbf{g}_{n+1}$ are the $J_{n+1}$th column vectors of $\mathbf{H}$ and $\mathbf{G}$, respectively. Then, $\mathcal{S}_k$ and $\mathcal{J}$ are updated to $\mathcal{S}_k = \mathcal{S}_k \cup \{J_{n+1}\}$ and $\mathcal{J} = \mathcal{J} \setminus \{J_{n+1}\}$, respectively. After $n+1$ transmit antennas are selected, the selected channel matrix, i.e. $\mathbf{G}_{n+1}$, is given as $[\mathbf{G}_n, \mathbf{g}_{n+1}]$. Based on the updated channel matrix $\mathbf{H}_{n+1}$, $C_{B,n+1}$ is given as (6). Based on the updated channel matrix $\mathbf{G}_{n+1}$, the expected $C_{E,n+1}$, $\mathbb{E}[C_{E,n+1}]$, is derived as (31), where $u_{n+1} = \min(n+1, N_E)$ and $v_{n+1} = \max(n+1, N_E)$. The lower bound in (31) is derived using the Minkowski determinant theorem, Jacobi's formula, and Jensen's inequality.
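The chain of inequalities above leads to a closed-form lower bound on Eve's expected capacity that depends only on $n$, $N_E$, and $\bar{\rho}_E$. The sketch below assumes the commonly used form $u \log_2\!\big(1 + \bar{\rho}_E \exp\big(\tfrac{1}{u}\sum_{i=0}^{u-1}\psi(v-i)\big)\big)$ with $u = \min(n, N_E)$ and $v = \max(n, N_E)$, where the sum is the expected log-determinant of a complex Wishart matrix. The paper's exact expressions (27) to (29) are not reproduced here, and the function names and parameter values are hypothetical. A Monte Carlo estimate is included only to show that the bound sits below the true expectation.

```python
import numpy as np

def digamma_int(n):
    """psi(n) for a positive integer n: -gamma + sum_{p=1}^{n-1} 1/p."""
    return -np.euler_gamma + sum(1.0 / p for p in range(1, n))

def eve_capacity_lower_bound(n, n_e, rho_bar):
    """Assumed Minkowski-plus-Jensen lower bound on E[log2 det(I + rho_bar G_n G_n^H)]."""
    u, v = min(n, n_e), max(n, n_e)
    mean_ln_det = sum(digamma_int(v - i) for i in range(u))   # E[ln det W], W complex Wishart
    return u * np.log2(1 + rho_bar * np.exp(mean_ln_det / u))

def eve_capacity_monte_carlo(n, n_e, rho_bar, trials=4000, seed=0):
    """Empirical mean of log2 det(I + rho_bar G G^H) over i.i.d. CN(0, 1) channels."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        G = (rng.standard_normal((n_e, n)) + 1j * rng.standard_normal((n_e, n))) / np.sqrt(2)
        total += np.linalg.slogdet(np.eye(n_e) + rho_bar * G @ G.conj().T)[1] / np.log(2)
    return total / trials

bound = eve_capacity_lower_bound(n=3, n_e=2, rho_bar=1.0)
estimate = eve_capacity_monte_carlo(n=3, n_e=2, rho_bar=1.0)
print(bound, estimate)
```

Because the bound needs no channel realization of $\mathbf{G}$, Alice can evaluate it once per value of $n$ and reuse it across the whole search tree.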
Similar to (28) and (29), the term $\alpha_{n+1}$ in (32) is given as (33). Based on (6) and (33), the estimated secrecy capacity for the selected $n+1$ transmit antennas, $C_{S,n+1}$, is derived as (34) and (35). Thus, the most recently selected transmit antenna index only affects the incremental term. Suppose that $J_{n+2}$ is the $(n+2)$th selected transmit antenna index, and $\mathbf{h}_{n+2}$ and $\mathbf{g}_{n+2}$ are the $J_{n+2}$th column vectors of $\mathbf{H}$ and $\mathbf{G}$, respectively. Then, $\mathcal{S}_k$ and $\mathcal{J}$ are updated to $\mathcal{S}_k = \mathcal{S}_k \cup \{J_{n+2}\}$ and $\mathcal{J} = \mathcal{J} \setminus \{J_{n+2}\}$, respectively. The updated channel matrices are thus $\mathbf{H}_{n+2} = [\mathbf{H}_{n+1}, \mathbf{h}_{n+2}]$ and $\mathbf{G}_{n+2} = [\mathbf{G}_{n+1}, \mathbf{g}_{n+2}]$, respectively. $C_{B,n+2}$ corresponding to $\mathbf{H}_{n+2}$ is given as (11). To reduce the computational complexity of calculating $\nu_{B,n+2}$ in (11), the matrix $\mathbf{T}_{B,n+1}$ and the term $\nu_{B,n+2}$ can be updated using (14) and (16), respectively. Similar to (30), $\mathbb{E}[C_{E,n+2}]$ corresponding to $\mathbf{G}_{n+2}$ is derived as (36), where $u_{n+2} = \min(n+2, N_E)$ and $v_{n+2} = \max(n+2, N_E)$.
In addition, the term $\alpha_{n+2}$ in (36) is given as (37). Based on (11) and (36), the estimated secrecy capacity for the selected $n+2$ transmit antennas, $C_{S,n+2}$, is derived as (38). Similar to scenario A, the monotonic objective function, $\hat{C}_{S,n}$, is used instead of $C_{S,n}$, and it is defined as (40), where $Z_a$ is given in (39). Based on (40), the optimal $\mathcal{S}_k$ is determined by the IB-TAS. $\mathcal{S}_k$ can be determined using the IB-TAS algorithm in Section III-A, to which the monotonic objective function, i.e. (40), is applied. Then, the estimated secrecy capacity corresponding to $\mathcal{S}_k$ can be calculated as (41), where $\mathbf{H}_k$ and $\mathbf{G}_k$ are the channel matrices corresponding to $\mathcal{S}_k$. After all transmit antenna sets, i.e. $\mathcal{S}_1, \mathcal{S}_2, \cdots, \mathcal{S}_{N_A}$, and the corresponding estimated secrecy capacities are calculated, the optimal transmit antenna set with the largest secrecy capacity, $\mathcal{S}_{opt}$, is then determined using (22).
After initializing the parameters and variables for the given $k$, the IB-TAS constructs the search tree for selecting the optimal $k$ transmit antennas. It then determines the optimal transmit antenna set, $\mathcal{S}_k$, which includes the $k$ optimal transmit antenna indices, based on branch-and-bound search (BAB). This process, from the first to the third block in Fig. 2, is repeated for each $k$ until the $N_A$ optimal transmit antenna sets, i.e. $\mathcal{S}_1, \mathcal{S}_2, \cdots, \mathcal{S}_{N_A}$, are determined. Finally, in the fourth block of Fig. 2, $\mathcal{S}_{opt}$, which maximizes the secrecy capacity, is determined among the $N_A$ optimal transmit antenna sets. Algorithm 1 summarizes the pseudo-code for the IB-TAS of scenarios A and B. In Algorithm 1, the sub-algorithms for scenario A are Algorithms 2, 4, and 6, and those for scenario B are Algorithms 3, 5, and 6. Algorithms 2 and 3 construct the search tree for the given $k$. Algorithms 4 and 5 update the parameters for further exploration of the visited node in the current search level. Algorithm 6 prunes the child nodes below the visited node in the current search level.
Note that Algorithms 4, 5, and 6 are used to determine $\mathcal{S}_k$ through BAB. In Algorithms 1-6, $\mathcal{I}_a$ is the index set for the nodes in level $a$, $\mathcal{I}_{a,b}$ is the index set for the child nodes of node $b$ in level $a$, and $\mathbf{s}$ is the selected transmit antenna index vector.
In Algorithm 1, the process from step 1 to step 35 determines $\mathcal{S}_k$ and calculates $C_{\mathcal{S}_k}$ for $1 \le k \le N_A$. This process corresponds to the first through third blocks in Fig. 2. Step 2 initializes the parameters and variables for the given $k$, as shown in the first block of Fig. 2. Since the search procedure starts from the root node in level 0, the currently visited antenna index, $J$, is set to 0, i.e. $J = 0$.
$\mathcal{S}_k$ is determined through the search procedure from steps 6 to 32 when $1 \le k \le N_A - 1$. When $k$ is $N_A$, all transmit antennas are selected in steps 3 to 4. The search tree for the given $k$ is constructed through Algorithm 2 or Algorithm 3 in step 6. Algorithms 2 and 3 correspond to the second block of Fig. 2. In Algorithms 2 and 3, $Z_a$ is calculated as given in (19) and (39), respectively. Based on the calculated $Z_a$, the monotonic objective function $c_j$ corresponding to $\mathcal{I}_{n,J}$ is calculated in the following steps of Algorithm 1.
The process from step 7 to step 31 determines $\mathcal{S}_k$, which corresponds to the third block in Fig. 2. When searching the nodes of level $k-1$, i.e. $n = k-1$, in steps 8 to 16, each $c_j$ corresponding to $\mathcal{I}_{n,J}$ is calculated based on (20) and (40) for scenarios A and B, respectively. If $c_{max}$ in step 10 is larger than $B$, then $\mathbf{s}$, $B$, and $\mathcal{S}_k$ are updated in step 12. If $c_{max}$ is smaller than $B$, then $\mathbf{s}$, $B$, and $\mathcal{S}_k$ are not updated, and the procedure returns to step 7. After those procedures, the index $J$ is eliminated from $\mathcal{I}_{n-1,\mathbf{s}[n-1]}$. The search procedures from steps 8 to 16 are repeated until all nodes are visited.
When searching the nodes in the other levels, i.e. $n < k-1$, in steps 18 to 30, each $c_j$ corresponding to $\mathcal{I}_{n,J}$ is given as in step 9. In these steps, $c_J$ is the largest value among the $c_j$ in the current search level $n$, and $\tilde{n}$ is the level to which the child nodes of node $J$ in level $n$ belong. Note that $\tilde{n}$ is a lower level than $n$. If $c_J$ is larger than $B$, $\mathbf{s}$ and $\hat{C}_S$ are updated through Algorithms 4 and 5 in step 24. Then, the child nodes of node $J$ in level $n$ are visited to calculate $c_j$ corresponding to $\mathcal{I}_{\tilde{n},J}$, and $n$ is updated to $n = \tilde{n}$. If $c_J$ is smaller than $B$, the child nodes of node $J$ in level $n$ are pruned in step 27 through Algorithm 6, and the procedure returns to step 7. If $\mathcal{I}_{0,0}$ becomes the empty set in step 7, the optimal $\mathcal{S}_k$ is set to $\mathbf{s}$, which was updated in step 12. In steps 33 and 34, the achievable secrecy capacity corresponding to the optimal $\mathcal{S}_k$ is calculated using (21) in scenario A and (41) in scenario B. After determining the optimal transmit antenna sets for all values of $k$, the optimal transmit antenna set with the largest secrecy capacity, $\mathcal{S}_{opt}$, is determined in step 36, which corresponds to the fourth block in Fig. 2.

D. COMPUTATIONAL COMPLEXITY
We use big O notation to measure the computational complexity of the IB-TAS schemes proposed in Algorithm 1. Since all nodes in the search tree, except for the pruned nodes, are visited for calculating the monotonic objective function, the iterative calculation of the objective function causes high computational complexity. When calculating the objective function, the computational complexity of the incremental term for node $j$ is dominant compared to the others. Since this term is iteratively calculated $N_T$ times, the total computational complexities for scenarios A and B of Algorithm 1 are given as $O(N_A \max(N_B, N_E) N_T)$ and $O(N_A N_B N_T)$, respectively. Note that $N_T$ is the total number of visited nodes to find the optimal transmit antenna set, which increases as the number of transmit antennas increases.

Algorithm 1 IB-TAS
...
calculate $C_{\mathcal{S}_k}$ using (21)
35: end for
36: determine $\mathcal{S}_{opt}$ using (22)

Algorithm 2 Construct the Search Tree for Scenario A
1: $\mathcal{I}_a = \{a, a+1, \cdots, N_A + a - k\}, \forall a \in \mathcal{K}$
2: $\mathcal{I}_{0,0} = \mathcal{I}_1$, ...

Algorithm 3 Construct the Search Tree for Scenario B
1: $\mathcal{I}_a = \{a, a+1, \cdots, N_A + a - k\}, \forall a \in \mathcal{K}$
2: $\mathcal{I}_{0,0} = \mathcal{I}_1$, ...
..., $\forall j \in \mathcal{I}_{\tilde{n},J}$
5: ... $1 + \bar{\rho}_E \nu_{E,j}$, $\forall j \in \mathcal{I}_{\tilde{n},J}$
6: $n = \tilde{n}$

IV. MACHINE LEARNING-BASED TAS SCHEMES FOR PLS
Although the IB-TAS maximizes the secrecy capacity, the iterative search process causes high computational complexity. Machine learning is one of the candidate techniques to minimize computational complexity effectively, and the TAS can be interpreted as a multi-class classification problem. In this section, we propose the machine learning-based TAS (ML-TAS) schemes to solve the multi-class classification problem using neural network (NN), support vector machine (SVM), and naive-Bayes (NB).

A. TRAINING DATA SET GENERATION
We generate a training data set, which contains the $M$ training channel samples for scenarios A and B. The $m$th training channel sample consists of the channel matrices $\mathbf{H}^m$ and $\mathbf{G}^m$.

1) FEATURE VECTOR GENERATION
Because machine learning models take real-valued feature vectors as input, the complex-valued training channel samples should be converted to real-valued feature vectors. The $m$th feature vector, $\mathbf{f}_m \in \mathbb{R}^{1 \times N}$, is generated for each scenario. For scenario A, $\mathbf{f}_m$ is given as (42), and for scenario B, $\mathbf{f}_m$ is given as (43), where $h^m_{p,q}$ and $g^m_{p,q}$ are the $(p, q)$th elements of $\mathbf{H}^m$ and $\mathbf{G}^m$, respectively. The total number of features, $N$, is then $N_A (N_B + N_E)$ for scenario A and $N_A N_B$ for scenario B. These feature vectors should be normalized to avoid local minima when training the machine learning models. Thus, the $n$th element of the $m$th normalized feature vector, $\tilde{\mathbf{f}}_m$, is given as (44).

2) LABELING OF THE FEATURE VECTORS
The optimal transmit antenna set, $\mathcal{S}_{opt}$, for a given $m$th training channel sample is determined by the IB-TAS. Thus, we can determine the class label for each feature vector. Table 1 shows an example of a mapping table between the class label and the indicated $\mathcal{S}_{opt}$ when $N_A$ is 3. If $\mathcal{S}_{opt}$ is $\{2, 3\}$, the class label is 6. Therefore, each feature vector can be mapped to one of the class labels based on the $\mathcal{S}_{opt}$ determined by the IB-TAS.
In particular, we utilize the binary representation of $\mathcal{S}_{opt}$ for the supervised learning of the NN model, as shown in Table 1. The binary representation of $\mathcal{S}_{opt}$ corresponding to the $m$th feature vector, $\mathbf{y}_m \in \mathbb{R}^{N_A \times 1}$, is defined as in (45), i.e. $\mathbf{y}_m[n] = 1$ if $n \in \mathcal{S}_{opt}$ and $\mathbf{y}_m[n] = 0$ otherwise, where $1 \le n \le N_A$.
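The two labeling schemes can be illustrated with a short sketch. The ordering of the class labels below (by set size, then lexicographically) is an assumption chosen because it reproduces the example in the text, where $\{2, 3\}$ maps to class 6 for $N_A = 3$. The full contents of Table 1 are not available here, so other orderings are possible. The function names are hypothetical.

```python
from itertools import combinations

def class_table(n_a):
    """Map class label (1-based) to an antenna set, ordered by size, then lexicographically."""
    sets = [c for k in range(1, n_a + 1)
            for c in combinations(range(1, n_a + 1), k)]
    return {label: set(s) for label, s in enumerate(sets, start=1)}

def label_of(s_opt, n_a):
    """Class label used by the SVM- and NB-based schemes."""
    table = class_table(n_a)
    return next(label for label, s in table.items() if s == set(s_opt))

def binary_target(s_opt, n_a):
    """Binary representation y_m used by the NN: y[n] = 1 if antenna n is selected."""
    return [1 if n in s_opt else 0 for n in range(1, n_a + 1)]

print(class_table(3))
print(label_of({2, 3}, 3), binary_target({2, 3}, 3))
```

The class table has $2^{N_A} - 1$ entries while the binary target has only $N_A$ entries, which is the reason the NN output layer grows linearly rather than exponentially with the number of transmit antennas.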

B. NN-BASED TAS
Inspired by biological neural networks, the NN model has been widely utilized as a machine learning model [29]. The NN model consists of the input layer, one or more hidden layers, and the output layer. Each neuron in the current layer is connected to neurons in the previous and subsequent layers. The proposed NN model consists of the input layer, one hidden layer, and the output layer. The number of neurons is $N$ in the input layer, $N_h$ in the hidden layer, and $N_A$ in the output layer. When the number of neurons in the hidden layer is too small or too large, the NN suffers from underfitting or overfitting, respectively. Generally, $N_h$ is set to approximately $\frac{2}{3} N$ [30]. The output of the hidden layer, $\tilde{\mathbf{h}}_m \in \mathbb{R}^{N_h \times 1}$, is calculated as in (46), i.e. $\tilde{\mathbf{h}}_m = \phi_h(\mathbf{W}_h \tilde{\mathbf{f}}_m^T + \mathbf{b}_h)$, where $\phi_h(\cdot)$ is the rectified linear unit (ReLU) function, $\mathbf{W}_h \in \mathbb{R}^{N_h \times N}$ is the weight matrix associated with $\tilde{\mathbf{f}}_m^T$, and $\mathbf{b}_h \in \mathbb{R}^{N_h \times 1}$ is the bias vector in the hidden layer. Based on (46), the output of the output layer, $\tilde{\mathbf{y}}_m \in \mathbb{R}^{N_A \times 1}$, is calculated as in (47), i.e. $\tilde{\mathbf{y}}_m = \phi_y(\mathbf{W}_y \tilde{\mathbf{h}}_m + \mathbf{b}_y)$, where $\phi_y(\cdot)$ is the sigmoid function, $\mathbf{W}_y \in \mathbb{R}^{N_A \times N_h}$ is the weight matrix associated with $\tilde{\mathbf{h}}_m$, and $\mathbf{b}_y \in \mathbb{R}^{N_A \times 1}$ is the bias vector in the output layer.
The errors between $\mathbf{y}_m$ and $\tilde{\mathbf{y}}_m$ can be measured by the binary cross entropy loss function based on (45) and (47) [31]. The loss function, $L(\mathbf{y}_m, \tilde{\mathbf{y}}_m)$, is defined as (48). The proposed NN model is trained to minimize $L(\mathbf{y}_m, \tilde{\mathbf{y}}_m)$ through forward and backward propagation processes based on the $M$ normalized feature vectors [15]. In the backpropagation process, the parameters of each layer, i.e. $\mathbf{W}_h$, $\mathbf{W}_y$, $\mathbf{b}_h$, and $\mathbf{b}_y$, are updated using the adaptive moment estimation algorithm [32], [33], [34].
After completing the training process, the proposed NN-based TAS can select the optimal transmit antenna set for a given channel sample. The given channel sample should be transformed into the normalized feature vector as an input to the NN-based TAS. Since the activation function of the output layer is the sigmoid function, the output value of each neuron in the output layer lies between 0 and 1, i.e. $0 \le \tilde{\mathbf{y}}[n] \le 1$ for $1 \le n \le N_A$. When $\tilde{\mathbf{y}}[n]$ is greater than 0.5, it indicates that the $n$th transmit antenna is selected to maximize the secrecy capacity. Conversely, if $\tilde{\mathbf{y}}[n]$ is less than 0.5, the $n$th transmit antenna is not selected.
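A minimal sketch of the online (prediction) path of such a network is given below. It implements one ReLU hidden layer, a sigmoid output layer, and the 0.5 threshold described above. The weights are random placeholders rather than trained parameters, the layer sizes are illustrative, and averaging the binary cross entropy over the $N_A$ outputs is an assumption, since the exact normalization in (48) is not shown here.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(f_norm, W_h, b_h, W_y, b_y):
    """Hidden layer with ReLU followed by an output layer with sigmoid."""
    h = relu(W_h @ f_norm + b_h)
    return sigmoid(W_y @ h + b_y)

def bce_loss(y, y_hat, eps=1e-12):
    """Binary cross entropy, averaged over the N_A outputs (assumed normalization)."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def select_antennas(y_hat):
    """Antenna n (1-based) is selected when its output exceeds 0.5."""
    return [n + 1 for n, v in enumerate(y_hat) if v > 0.5]

# Illustrative sizes: N_A = 4, N_B = N_E = 2, so N = N_A (N_B + N_E) = 16 in scenario A.
n_a, n_feat = 4, 16
n_hidden = round(2 * n_feat / 3)
rng = np.random.default_rng(3)
W_h, b_h = rng.standard_normal((n_hidden, n_feat)), np.zeros(n_hidden)
W_y, b_y = rng.standard_normal((n_a, n_hidden)), np.zeros(n_a)

f_norm = rng.standard_normal(n_feat)       # stands in for a normalized feature vector
y_hat = forward(f_norm, W_h, b_h, W_y, b_y)
print(y_hat, select_antennas(y_hat))
```

With untrained weights the selected set is arbitrary. The point of the sketch is that prediction costs only two matrix-vector products, which is the basis of the complexity comparison in Section IV-E.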

C. SVM-BASED TAS
SVM is used for the classification, which constructs a hyperplane or a set of hyperplanes to separate classes. In SVM, the hyperplane is designed to have the largest distance to the nearest training data point of any class. The proposed SVM model applies the one-vs-rest strategy to classify the M classes using M hyperplanes [14], [20]. The objective function for constructing the hyperplane, which is used to separate between class l and the others, is given as where w l and b l are learning parameters, C l is a non-negative scalar, cost n (z) = max ((−1) n z + 1, 0), and y m l = 1, if the class label corresponding tof m is l, 0, if the class label corresponding tof m is not l.
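In (49), $\phi(\cdot)$ is a Gaussian radial basis kernel function, and the $n$-th element of $\phi(\tilde{f}_m)$ is given as $\exp\left(-\frac{\|\tilde{f}_m - \tilde{f}_n\|^2}{2\sigma^2}\right)$ with variance $\sigma^2$. To minimize $f(w_l, b_l)$, $w_l$ and $b_l$ are updated iteratively using the gradient descent method. That is,

$$w_l \leftarrow w_l - \eta_0 \frac{\partial f(w_l, b_l)}{\partial w_l}, \tag{50}$$

$$b_l \leftarrow b_l - \eta_0 \frac{\partial f(w_l, b_l)}{\partial b_l}, \tag{51}$$

where $\eta_0$ is the learning rate. After the training process is completed, the proposed SVM-based TAS can select the optimal transmit antenna set, i.e. the optimal class label $l^*$ among $M$ classes, for any given channel sample with the normalized feature vector $\tilde{f}$. The optimal class label $l^*$ is given as

$$l^* = \arg\max_{l \in \{1, \ldots, M\}} \left(w_l^T \phi(\tilde{f}) + b_l\right). \tag{52}$$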
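The prediction step of a one-vs-rest SVM with a Gaussian kernel can be sketched in Python as below. This is an illustration under stated assumptions rather than the authors' implementation: the names `rbf_features`, `hinge_cost`, and `svm_predict`, the toy training vectors, and the hand-picked weights are all hypothetical, and class labels are 0-based in the code.

```python
import numpy as np

def rbf_features(f, F_train, sigma):
    """Kernel feature vector: element n is exp(-||f - f_n||^2 / (2 sigma^2))."""
    d2 = np.sum((F_train - f) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hinge_cost(n, z):
    """cost_n(z) = max((-1)^n z + 1, 0), the per-sample cost used in training."""
    return np.maximum((-1) ** n * z + 1.0, 0.0)

def svm_predict(f, F_train, W, b, sigma):
    """Pick the class whose hyperplane score w_l^T phi(f) + b_l is largest."""
    phi = rbf_features(f, F_train, sigma)
    return int(np.argmax(W @ phi + b))

# Toy setup: three training vectors, one per class; row l of W holds w_l.
F_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
W = np.eye(3)     # hand-picked: class l responds to training vector l
b = np.zeros(3)
label = svm_predict(np.array([0.9, 0.1]), F_train, W, b, sigma=0.5)
```

Here the query vector lies closest to the second training vector, so the second hyperplane produces the largest score. Note that the kernel features must be evaluated against the stored training vectors for every prediction, which is the costly part of the SVM-based TAS.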

D. NB-BASED TAS
NB is a simple probabilistic classifier that applies Bayes' theorem under the assumption of independence between the elements of the feature vector [35]. In the training process, the NB-based TAS groups the $M$ normalized feature vectors by their class labels. From the normalized feature vectors whose class label is $l$, we calculate the mean, $\mu_{l,n}$, and the variance, $\sigma_{l,n}^2$, of their $n$-th element. The posterior probability that any channel sample with the normalized feature vector, $\tilde{f} \in \mathbb{R}^{1 \times N}$, belongs to class $l$ is given as

$$P(l \mid \tilde{f}) = \frac{P(\tilde{f} \mid l)\,P(l)}{P(\tilde{f})}, \tag{53}$$

where $P(\tilde{f} \mid l)$ is the probability of the occurrence of $\tilde{f}$ given the class label $l$, $P(l)$ is the prior probability of the class label $l$, and $P(\tilde{f})$ is the probability of the occurrence of $\tilde{f}$.
Thus, selecting the optimal transmit antenna set in the NB-based TAS is equivalent to choosing the class label with the maximal posterior probability for a given $\tilde{f}$ [20]. Since $P(\tilde{f})$ is independent of the class label and $P(l)$ is constant for all classes, the optimal class label $l^*$ is determined by

$$l^* = \arg\max_{l} P(\tilde{f} \mid l) = \arg\max_{l} \prod_{n=1}^{N} P\big(\tilde{f}[n] \mid l\big). \tag{54}$$
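A minimal Python sketch of this decision rule is given below. It assumes that each feature element follows a Gaussian distribution with the per-class mean and variance computed in training, and that class priors are equal, so only the likelihood is compared. The function names `nb_fit` and `nb_predict`, the variance floor, and the toy data are hypothetical. The product of per-element likelihoods is evaluated as a sum of logarithms to avoid numerical underflow.

```python
import numpy as np

def nb_fit(F, labels, num_classes, var_floor=1e-9):
    """Per-class mean and variance of every feature element.

    Assumes every class has at least one training vector; var_floor is a
    small constant added by this sketch to avoid division by zero.
    """
    F = np.asarray(F, dtype=float)
    labels = np.asarray(labels)
    mu = np.stack([F[labels == l].mean(axis=0) for l in range(num_classes)])
    var = np.stack([F[labels == l].var(axis=0) for l in range(num_classes)])
    return mu, var + var_floor

def nb_predict(f, mu, var):
    """Return argmax over l of sum_n log P(f[n] | l), Gaussian likelihoods."""
    f = np.asarray(f, dtype=float)
    log_lik = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (f - mu) ** 2 / var, axis=1)
    return int(np.argmax(log_lik))

# Toy training set with two classes (0-based labels).
F = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.2]]
mu, var = nb_fit(F, [0, 0, 1, 1], num_classes=2)
label_a = nb_predict([0.1, 0.0], mu, var)
label_b = nb_predict([1.0, 1.0], mu, var)
```

The prediction loops over all class labels and all feature elements, which matches the per-class, per-element likelihood evaluation in the decision rule.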

E. COMPUTATIONAL COMPLEXITY ANALYSIS
Since the machine learning models are implemented based on offline training using the data set, the computational complexity is generally measured only for online processing.
In this subsection, we evaluate the computational complexity of ML-TAS schemes, which is required for the prediction, i.e. TAS.
In the NN-based TAS, the computational complexity from the input layer to the hidden layer is associated with (46) and is given as $O(N N_h)$. Moreover, the computational complexity from the hidden layer to the output layer is associated with (47) and is given as $O(N_h N_A)$. Therefore, the total computational complexity of the proposed NN-based TAS scheme is given as $O(N N_h + N_h N_A)$.

When selecting the class label $l^*$ for a given channel sample in the SVM-based TAS, the calculation of $\phi(\tilde{f})$ causes high computational complexity. Furthermore, it needs to be calculated $M$ times. Thus, its computational complexity is dominant compared to the others. If the number of features, $N$, is larger than $M$, the total computational complexity of the SVM-based TAS is given as $O(N^2)$ [20]. As the number of transmit antennas increases, $M$ increases exponentially. When $M$ is larger than $N$, the total computational complexity of the SVM-based TAS becomes $O(MN)$.
When selecting the class label $l^*$ for a given channel sample in the NB-based TAS, the calculation of $\prod_{n=1}^{N} P\big(\tilde{f}[n] \mid l\big)$ causes high computational complexity. Furthermore, it needs to be calculated $M$ times. Since its computational complexity is dominant compared to the others, the total computational complexity of the NB-based TAS is given as $O(MN)$.

V. SIMULATION RESULTS
In this section, we evaluate the secrecy capacity and the computational complexity of the proposed IB-TAS and the ML-TAS schemes for scenarios A and B. The IB-TAS simultaneously determines the optimal number of transmit antennas and the corresponding transmit antenna indices in MIMOME channel. To the best of our knowledge, there have been no studies on TAS to simultaneously determine the optimal number of transmit antennas and the corresponding antenna indices. To evaluate the performance of the IB-TAS, the exhaustive search-based TAS (ES-TAS) that guarantees optimal performance is used as a reference scheme. Fig. 3 shows the secrecy capacity comparison between the ES-TAS and the IB-TAS. The ES-TAS uses (21) and (41) to determine the optimal transmit antenna set for scenarios A and B. Both schemes show nearly identical secrecy performance in each scenario. It can be interpreted that the IB-TAS guarantees optimal performance. As shown in Fig. 3, the secrecy capacity shown in scenario A is higher than that shown in scenario B due to the accurate CSI of Eve available at Alice.
However, the performance loss between scenarios A and B is relatively small. Table 2 shows the number of transmit antennas selected by the ES-TAS and the IB-TAS in scenarios A and B. As shown in Table 2, the ES-TAS and the IB-TAS in scenario A select 0.3∼1.2 more transmit antennas than those in scenario B. Table 3 shows the accuracy of the selected transmit antenna indices determined by the IB-TAS compared to the ES-TAS in each scenario. For example, 100% means that the transmit antenna set determined by the IB-TAS includes the optimal transmit antenna set determined by the ES-TAS. As shown in Table 3, the transmit antenna indices selected by the IB-TAS are nearly identical to the optimal transmit antenna set determined by the ES-TAS in each scenario. Based on Tables 2 and 3, the performance loss between scenarios A and B is caused by the number of selected transmit antennas rather than by the accuracy of the selected transmit antenna indices. However, this difference in the number of selected antennas is only in the range of 0.3 to 1.2. Therefore, the performance loss between scenarios A and B is relatively small.

Fig. 4 shows the secrecy capacity of the IB-TAS and the ML-TAS when SNR at Bob is fixed at 10 dB and SNR at Eve varies from 0 to 20 dB. In Fig. 4, the solid and dotted lines show the secrecy capacities of TAS schemes in scenario A, and the markers without lines show the secrecy capacities of TAS schemes in scenario B. To train the machine learning models in the ML-TAS, the number of training samples for the cases of $N_A = 32$ and $N_A = 64$ is $1.2 \times 10^{8}$ and $2.6 \times 10^{10}$, respectively. In each case, enough samples were generated to provide more than five samples for each class label [15].

A. SECRECY CAPACITY
The IB-TAS shows superior achievable secrecy capacity compared to the ML-TAS schemes because the IB-TAS achieves near-optimal performance, as shown in Fig. 4. Table 4 shows the classification accuracy when comparing the optimal transmit antenna set determined by each ML-TAS scheme with the optimal transmit antenna set determined by the IB-TAS in scenario A of Fig. 4. In Table 4, the percentage value on the left is the accuracy for the optimal number of transmit antennas, and the percentage value on the right is the accuracy for the optimal transmit antenna indices. For example, if both percentage values are 100%, the optimal transmit antenna set determined by the corresponding ML-TAS is identical to the optimal transmit antenna set determined by the IB-TAS. As shown in Table 4, the classification accuracy² gradually decreases in the order of the NN-based TAS, the SVM-based TAS, and the NB-based TAS. This is because NB is a probabilistic method based on conditional probability, whereas SVM is a feature vector-based deterministic classification method that maximizes the margin between different classes. In addition, unlike SVM and NB, NN solves nonlinear problems through its parametric forms. It is also more fault-tolerant and can handle incomplete data and noise. Therefore, as shown in Fig. 4, the NN-based TAS shows better secrecy capacity than the other ML-TAS schemes.

² As the number of epochs increases, the performance of the ML-TAS increases and eventually converges. The classification accuracy of the NN-based TAS and the SVM-based TAS increases with the number of epochs because the parameters of each method gradually approach their optimal values. On the other hand, since the NB-based TAS determines the optimal transmit antenna set through the posterior probability for the training channels, its classification accuracy is constant even when the number of epochs increases.

Table 5 shows the classification accuracy when comparing the optimal transmit antenna set determined by each ML-TAS scheme with the optimal transmit antenna set determined by the IB-TAS in scenario B of Fig. 4. The performance trend of the proposed TAS schemes is identical to that in scenario A. As observed in Fig. 3, the secrecy capacities of the proposed TAS schemes in scenario A are higher than those in scenario B due to the accurate CSI of Eve available at Alice.

Fig. 5 shows the secrecy capacity of the IB-TAS and the ML-TAS when SNR at Eve is fixed at 10 dB and SNR at Bob varies from 0 to 20 dB. In Fig. 5, the solid and dotted lines show the secrecy capacities of TAS schemes in scenario A, and the markers without lines show the secrecy capacities of TAS schemes in scenario B. As shown in Figs. 4 and 5, the secrecy capacity is maximized when SNR at Bob is much greater than SNR at Eve, i.e. the point when SNR at Eve = 0 dB in Fig. 4 and the point when SNR at Bob = 20 dB in Fig. 5. This is because the increased capacity between Alice and Bob leads to an increase in the secrecy capacity. Conversely, if SNR at Eve is much greater than SNR at Bob, the capacity between Alice and Eve, i.e. the eavesdropping ability of Eve, increases. This leads to a decrease in the secrecy capacity.

Fig. 6 shows the secrecy capacity according to the number of antennas for Alice, Bob, and Eve in scenario A.³ The number of selected transmit antennas for each case is summarized in Table 6. The increase in the number of antennas at Alice provides the opportunity to increase the capacity of the Alice-Bob channel or to decrease the capacity of the Alice-Eve channel due to the increase in the antenna selection diversity. As a result, as shown in Table 6, the number of selected transmit antennas increases.

³ From the perspective of the convergence time for training the ML-TAS schemes, the increase in the number of antennas at Alice leads to an increase in the required convergence time. This is because the increase in the number of antennas at Alice increases the number of classes, $M$, and it also increases the number of training samples and epochs required for learning.
The increase in the number of antennas at Bob increases the rank of the Alice-Bob channel. This increases the capacity of the Alice-Bob channel, so the secrecy capacity also increases. In addition, the capacity of the Alice-Bob channel increases more than the Alice-Eve capacity as the number of selected transmit antennas increases because the number of antennas at Eve is less than that at Bob. Thus, as shown in Table 6, the number of selected transmit antennas increases significantly as the number of antennas at Bob increases.
The increase in the number of antennas at Eve increases the rank of the Alice-Eve channel. This increases the capacity of the Alice-Eve channel, so the secrecy capacity decreases. In addition, the Alice-Eve capacity increases more than the Alice-Bob capacity as the number of selected transmit antennas increases because the number of antennas at Bob is less than that at Eve. Thus, as shown in Table 6, the number of selected transmit antennas decreases significantly as the number of antennas at Eve increases.

B. COMPUTATIONAL COMPLEXITY ANALYSIS
Table 7 summarizes the computational complexity of the IB-TAS and the ML-TAS schemes when Alice selects the optimal transmit antenna set. In the IB-TAS, $N_T$ is the total number of visited nodes to find the optimal transmit antenna set, and it can be determined by Algorithm 1. In the ML-TAS, $M$ is the number of possible transmit antenna combinations in TAS, i.e. $M = 2^{N_A} - 1$, and $N$ is given as $N_A(N_B + N_E)$ and $N_A N_B$ for scenarios A and B, respectively. Moreover, $N_h$ is set to approximately $\frac{2}{3}N$ in the NN-based TAS. Fig. 7 shows the asymptotic computational complexity required for the IB-TAS and the ML-TAS schemes, calculated based on Table 7. As shown in Table 7, $N_T$ and $M$ significantly affect the computational complexity of the IB-TAS and of two ML-TAS schemes (i.e. the SVM-based TAS and the NB-based TAS), respectively. Since the increase in the number of transmit antennas causes an exponential increase in $N_T$ and $M$, the asymptotic computational complexity of the IB-TAS, the SVM-based TAS, and the NB-based TAS increases exponentially, as shown in Fig. 7.

In the NN-based TAS, the values of both $N$ and $N_h$ increase linearly as the number of transmit antennas increases. Therefore, the computational complexity of the NN-based TAS is significantly reduced compared to the other schemes, as shown in Fig. 7. The computational complexity of the IB-TAS and the ML-TAS schemes in scenario B shows the same trend as in scenario A for the same reason.
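To make the scaling argument concrete, the following Python sketch evaluates the dominant operation counts of the three ML-TAS schemes from the expressions stated in the text: $N N_h + N_h N_A$ for the NN-based TAS, $N^2$ or $MN$ for the SVM-based TAS, and $MN$ for the NB-based TAS. The function names and the antenna configuration ($N_B = N_E = 2$) are hypothetical, the counts ignore constant factors, and the IB-TAS is omitted because $N_T$ depends on the search itself.

```python
def feature_count(n_a, n_b, n_e, scenario="A"):
    """N = N_A (N_B + N_E) in scenario A and N_A N_B in scenario B."""
    return n_a * (n_b + n_e) if scenario == "A" else n_a * n_b

def dominant_ops(n_a, n_b, n_e, scenario="A"):
    """Dominant-term operation counts for the NN-, SVM-, and NB-based TAS."""
    n = feature_count(n_a, n_b, n_e, scenario)
    n_h = max(1, round(2 * n / 3))  # hidden layer size, about 2N/3
    m = 2 ** n_a - 1                # number of non-empty antenna subsets
    return {
        "NN": n * n_h + n_h * n_a,
        "SVM": n * n if n > m else m * n,
        "NB": m * n,
    }

# Hypothetical configuration: N_B = N_E = 2, growing N_A.
for n_a in (4, 8, 16):
    print(n_a, dominant_ops(n_a, 2, 2))
```

Under these assumptions the NN count grows polynomially with $N_A$, while the SVM and NB counts follow $M = 2^{N_A} - 1$ and therefore grow exponentially.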

VI. CONCLUSION
In this paper, we propose the IB-TAS scheme to maximize the secrecy capacity in the MIMOME channel according to the CSI of Eve available at Alice, i.e. the perfect and partial cases. The IB-TAS simultaneously determines the optimal number of transmit antennas and selects the corresponding optimal transmit antenna indices. However, the computational complexity of most TAS schemes generally increases with the number of transmit antennas. To reduce the computational complexity caused by TAS, we also propose ML-TAS schemes, namely the NN-based TAS, the SVM-based TAS, and the NB-based TAS. Simulation results show that the IB-TAS guarantees near-optimal secrecy capacity, and the NN-based TAS minimizes the computational complexity at the cost of a minimal secrecy capacity loss. Through the secrecy capacity and computational complexity comparisons among the proposed schemes, we provide insight into the potential benefits of integrating machine learning with TAS.