Risk-Aware Antenna Selection for Multiuser Massive MIMO Under Incomplete CSI

This paper investigates the antenna selection problem in massive multiple-input multiple-output (MIMO) systems under incomplete channel state information (CSI), with a particular interest in risk-aware planning subject to practical constraints such as transmit power budgets and quality of service (QoS). Due to the very large number of antennas, obtaining complete channel measurements becomes a cost-prohibitive, energy-inefficient and spectrally inefficient task. To reduce pilot overhead, incomplete CSI and antenna selection (AS) are expected in practical massive MIMO systems. However, most existing AS algorithms rely heavily on complete CSI, which leads to a high probability of violating the practical constraints in the scenarios of interest. Motivated by this, we propose a joint channel prediction and antenna selection (JCPAS) framework which efficiently performs AS and is robust against incomplete CSI and practical constraints. The proposed framework comprises i) a channel tracker which estimates the channel dynamics based on historical incomplete observations, and ii) a risk-aware Monte Carlo tree search (RA-MCTS) algorithm which utilizes the estimated channel dynamics to select antennas in a risk-aware manner. Simulation results show that the proposed RA-MCTS not only achieves much lower energy consumption than existing typical algorithms, but also significantly reduces the probability of violating the practical constraints.

Dinh Thai Hoang and Diep N. Nguyen are with the School of Electrical and Data Engineering, Faculty of Electrical and Engineering, University of Technology Sydney, Sydney, NSW 2007, Australia (e-mail: hoang.dinh@uts.edu.au; diep.nguyen@uts.edu.au).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TWC.2024.3377733.
Digital Object Identifier 10.1109/TWC.2024.3377733

I. INTRODUCTION
[...] base station (BS), massive MIMO is capable of significantly improving the spectral efficiency via spatial multiplexing gain [4]. However, the full potential of massive MIMO requires a dedicated radio frequency (RF) chain for every antenna, which results in not only increased capital expenditure (CAPEX) but also higher system energy consumption [3], [4], [5]. In practical massive MIMO systems, it is more cost-effective and energy-efficient to employ fewer RF chains than antennas, while the full spatial multiplexing gain can be preserved via antenna selection (AS) [4], [5]. Being a key component of hybrid signal processing techniques, AS aims to select the best subset of antennas for data transmission to reduce hardware cost and power consumption without losing the full potential of the antenna array [3]. During the last decades, the AS problem has been extensively studied in the presence of complete channel state information (CSI) (sometimes referred to as full CSI), and it has been shown that AS can provide similar spectral efficiency with lower energy consumption compared to the case without AS [4], [5].

A. Antenna Selection in Massive MIMO
In principle, the AS problem can be formulated as an integer programming problem under the assumption of complete CSI acquisition [5], [6]. The problem is NP-hard, so solving it optimally incurs a prohibitive computational complexity which grows exponentially with the number of antennas in the worst case [6]. AS under complete CSI has been extensively investigated with the objective of designing near-optimal algorithms with low computational complexity for massive MIMO [6], [7], [8], [9], [10], [11], [12], [13], [14]. However, it should be noted that most existing works assume complete CSI, which implies that the channel coefficients of all antennas must be fully observed [15], [16], [17].
Unfortunately, this assumption does not strictly hold in practical scenarios, especially with very large-scale antenna arrays [18]. This is mainly because complete CSI acquisition is a time-consuming task in such scenarios. In practice, CSI acquisition is accomplished by pilot sequences which consume radio resources proportional to the numbers of active antennas and users. Since the number of RF chains is smaller than the number of antennas, only incomplete CSI can be observed and estimated from each pilot transmission. Acquiring complete CSI therefore requires extra pilot transmissions and reduces the effective transmission rate [1], [15], [17], [18]. For instance, acquiring complete CSI could occupy more than half of the frame duration for the downlink of a typical time-division duplexing (TDD) massive MIMO system with 64 antennas, 16 RF chains and 4 single-antenna users [15], [19]. Therefore, it is important to take incomplete CSI acquisition into account in practical AS design, especially for very large antenna array configurations.
1) Antenna Selection With Channel Prediction: In order to address this issue, one promising approach is the joint use of channel prediction and AS. Clearly, the extra pilot overhead can be reduced if the complete channel states can be predicted. This approach works mainly because the real propagation environment is often time-varying and temporally correlated due to Doppler effects [20], [21], [22]. Based on this, researchers have developed different channel prediction algorithms in recent years [23], [24], [25]. In [23], the authors proposed a spatio-temporal autoregressive method for the prediction of high-mobility channels, where the prediction was performed utilizing the temporal correlation in the angle-delay domain. In [24], the authors predicted channel states by exploiting the channel correlations. The proposed method employed a convolutional neural network (CNN) and a long short-term memory (LSTM) network, which allows a multistep prediction of the channels. Similarly, in [25], the authors predicted channel states by utilizing the spatio-temporal characteristics of CSI and a combination of CNN and convolutional LSTM. Although they show improved AS performance, these channel prediction methods are based on historical complete CSI measurements [23], [24], [25]. As noted above, obtaining complete CSI is spectrally inefficient in the considered massive MIMO systems, especially when the number of antennas significantly exceeds the number of RF chains. In addition, the aforementioned channel predictors are deterministic, which limits the use of channel statistics for performance enhancement.
2) Antenna Selection With Incomplete CSI: Recently, researchers have focused on developing AS algorithms which can directly operate under incomplete CSI [1], [15], [17]. In [15], the authors formulated the AS problem as a combinatorial multi-armed bandit problem when only incomplete CSI is available, and proposed an online AS algorithm using Thompson sampling. However, it is assumed therein that each antenna contributes equally to the sum capacity. Since this assumption is not valid in most real propagation environments [18], the solution therein is not robust in practical scenarios. The authors of [17] considered AS as a partially observable Markov decision process (POMDP) and proposed a myopic policy for selecting antennas under incomplete CSI. The myopic policy maintains a belief vector for the underlying channel states of each time slot, and updates this belief vector along with the system dynamics. The myopic policy therein, however, was only designed for single-user MIMO systems under general fading channels with a two-state coarse channel quantization. Since practical quality-of-service (QoS) constraints were not considered in [15] and [17], their applicability is limited in practical massive MIMO systems.
Although several approaches have been proposed to address the antenna selection problem under incomplete CSI, further work is still needed to improve system performance and robustness. For practical AS algorithm design, one crucial concern is how to maximize the system performance while reducing the chance of violating the system's practical constraints when only incomplete CSI is available. Risk-aware solutions using conditional value at risk (CVaR) have recently been proposed for resource management in ultra-reliable and low latency communications (URLLC) and for the coexistence of eMBB and URLLC services [26], [27], [28]. However, these methods rely on complete information about the system states and are therefore not applicable in the considered scenario.

B. Motivations and Contributions
As mentioned above, existing approaches cannot efficiently solve the antenna selection problem in multi-user massive MIMO under incomplete CSI. This motivates us to design a general antenna selection framework that can operate robustly under the incomplete CSI condition. Additionally, existing AS algorithms lack the capability to recognize the risk of violating the system constraints under incomplete CSI, which is essential to meeting the required QoS. In general, risk awareness should be present throughout the decision process, which implies that antenna selection should be performed by jointly considering three important factors: practical system constraints, optimization objectives, and the uncertainties introduced by the incomplete CSI. Therefore, risk-aware planning remains a challenging issue for practical AS solutions, which is also the motivation of this work.
In this paper, we propose a joint channel prediction and antenna selection (JCPAS) framework for the antenna selection problem in multi-user massive MIMO under incomplete CSI and practical system constraints. The proposed JCPAS comprises a deep unsupervised learning-based conditional channel estimator and a risk-aware Monte Carlo tree search (RA-MCTS) algorithm. A risky event is identified when one of the system constraints cannot be satisfied. At each frame, the channel estimator maintains a belief distribution by estimating conditional channel statistics from the sequence of past incomplete CSI measurements, and estimates the posterior channel distribution. Based on the estimated posterior channel distribution, the RA-MCTS algorithm evaluates the uncertain outcomes of each possible antenna combination through Monte Carlo simulations. In such a risk-aware manner, the chance of violating the system constraints can be reduced, and the corresponding negative consequences can also be mitigated when they arise. Simulation results show that the proposed RA-MCTS algorithm not only cuts the average power consumption by 50%, but also significantly reduces the probability of violating the system constraints by 90%.
To summarize, our main contributions are as follows:
• We introduce the JCPAS framework in massive MIMO systems under incomplete CSI. The proposed JCPAS framework does not require complete CSI measurements and can be combined with conventional antenna selection algorithms. In addition, the proposed channel prediction method is a probabilistic model that can be used to enhance the performance of the integrated selection algorithm.
• We propose the RA-MCTS algorithm, which enables efficient and robust antenna selection in massive MIMO under incomplete CSI and practical system constraints. In contrast to existing antenna selection algorithms, our proposed RA-MCTS is applicable to diverse optimization objectives and system constraints, and it is able to reduce the chance of violating the practical system constraints by leveraging channel statistics.
• We provide a new insight for risk-aware decision making with limited resources and insufficient information. In particular, a risk-aware system can be built by leveraging the historical incomplete observations to estimate a belief distribution over the underlying system dynamics, and then planning based on the statistics of the random outcomes introduced by the incomplete observations.
The remainder of this paper is organized as follows. Section II describes the mathematical model of the considered massive MU-MIMO system, as well as the associated antenna selection problem. Section III presents the proposed deep unsupervised learning-based conditional channel estimator. Section IV presents the RA-MCTS algorithm, which is a risk-aware planning algorithm for selecting antennas with incomplete observations. Section V discusses the related simulation results. Finally, Section VI concludes the paper.

II. PRELIMINARIES
In this section, we introduce the system model of the considered massive MIMO system as well as the associated AS problem. After that, we review the greedy search AS algorithm under the complete CSI assumption.

A. System Model
As shown in Fig. 1, we consider the downlink of a massive MU-MIMO system where a BS serves $N_u$ single-antenna users. The BS is equipped with $N_t$ transmit antennas and $N_f$ ($0 < N_f \ll N_t$) RF chains. In addition, switches are available at the BS such that an RF chain can connect to any antenna of interest. The channel between the BS and the users is time-varying and temporally correlated, which is common in real propagation environments with Doppler effects [20], [21], [22]. Moreover, we assume that the CSI remains unchanged within each frame duration of $T$ channel uses (c.u.), and that the considered system operates in TDD mode, meaning that the uplink and downlink channels are identical due to channel reciprocity [6], [9]. For downlink transmission, CSI acquisition is accomplished via uplink pilot-assisted channel measurement, and multi-user precoding is then adopted to mitigate inter-user interference. Under these settings, we further denote $\tau_{\mathrm{csi}}$ as the number of c.u. consumed to acquire CSI, leaving $T - \tau_{\mathrm{csi}}$ c.u. for data transmission.
Since only $N_f$ out of $N_t$ transmit antennas can be activated at the same time, the BS needs to select the best $N_f$ antennas in terms of performance metric maximization. Let $\mathbf{a} = \{a_1, a_2, \ldots, a_j, \ldots, a_{N_f}\}$ be the set of the indices of the $N_f$ selected antennas, and we denote $\mathcal{A}$ as the set of all possible antenna combinations with a cardinality of $|\mathcal{A}| = \binom{N_t}{N_f}$. In addition, let $\mathbf{H} \in \mathbb{C}^{N_u \times N_t}$ be the complete CSI matrix, and we denote $\mathbf{H}(\mathbf{a})$ as the incomplete (or partial) CSI from the chosen combination $\mathbf{a} \in \mathcal{A}$, meaning that the columns of $\mathbf{H}(\mathbf{a})$ are selected from $\mathbf{H}$ with respect to the indices in $\mathbf{a}$. Besides, we denote $x_k$ as the data symbol to be transmitted to user $k$, with $E\{|x_k|^2\} = 1$. Then, the received signal $y_k(\mathbf{a})$ at user $k$ is given by

$$y_k(\mathbf{a}) = \mathbf{h}_k(\mathbf{a})\,\mathbf{w}_k\,x_k + \sum_{j \neq k} \mathbf{h}_k(\mathbf{a})\,\mathbf{w}_j\,x_j + n_k, \tag{1}$$

where $\mathbf{h}_k(\mathbf{a}) \in \mathbb{C}^{1 \times N_f}$ is the channel vector for user $k$ from the antenna combination $\mathbf{a}$, $\mathbf{w}_k$ denotes the $N_f \times 1$ precoding vector for user $k$, and $n_k \sim \mathcal{CN}(0, \sigma_k^2)$ is the additive white Gaussian noise (AWGN) at user $k$. The second term of (1) is the inter-user interference at user $k$.
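Assuming negligible processing time, the effective spectral efficiency for downlink transmission to user $k$ with the selected antennas $\mathbf{a}$ can be written as

$$R_k(\mathbf{a}) = \left(1 - \frac{\tau_{\mathrm{csi}}}{T}\right)\log_2\bigl(1 + \mathrm{SINR}_k(\mathbf{a})\bigr), \tag{2}$$

where

$$\mathrm{SINR}_k(\mathbf{a}) = \frac{\|\mathbf{h}_k(\mathbf{a})\,\mathbf{w}_k(\mathbf{a})\|^2}{\sigma_k^2 + \sum_{j \neq k}\|\mathbf{h}_k(\mathbf{a})\,\mathbf{w}_j(\mathbf{a})\|^2}$$

represents the signal-to-interference-plus-noise ratio (SINR) of user $k$. Accordingly, the effective sum spectral efficiency with antenna combination $\mathbf{a}$ can be expressed as $C(\mathbf{a}) = \sum_{k=1}^{N_u} R_k(\mathbf{a})$, and the total power consumption for transmitting data at each frame is given by $P(\mathbf{a}) = \sum_{k=1}^{N_u} \|\mathbf{w}_k(\mathbf{a})\|^2$.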
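The quantities defined above can be evaluated numerically for a given antenna subset. The following is a minimal sketch, assuming a zero-forcing precoder with unit-norm columns and equal per-user power (the text only says that multi-user precoding is adopted), and assuming the pre-log factor $1 - \tau_{\mathrm{csi}}/T$ as the reading of "effective" spectral efficiency. The function names `zf_precoder`, `effective_rates` and `total_power` are illustrative and do not come from the paper.

```python
import numpy as np

def zf_precoder(H_sel, p_per_user=1.0):
    """Zero-forcing precoder for an (N_u x N_f) channel; column k is w_k.
    Equal per-user power is an assumption made for this sketch."""
    W = H_sel.conj().T @ np.linalg.inv(H_sel @ H_sel.conj().T)
    W = W / np.linalg.norm(W, axis=0, keepdims=True)  # unit-norm columns
    return np.sqrt(p_per_user) * W

def effective_rates(H, a, W, noise_var, frame_len, tau_csi):
    """Per-user effective spectral efficiency R_k(a) for antenna subset a."""
    H_sel = H[:, a]                      # incomplete CSI H(a): selected columns
    G = np.abs(H_sel @ W) ** 2           # G[k, j] = |h_k(a) w_j|^2
    signal = np.diag(G)
    interference = G.sum(axis=1) - signal
    sinr = signal / (noise_var + interference)
    return (1.0 - tau_csi / frame_len) * np.log2(1.0 + sinr)

def total_power(W):
    """Total transmit power P(a) = sum_k ||w_k(a)||^2."""
    return float(np.sum(np.abs(W) ** 2))

# Hypothetical toy setup: 3 users, 8 antennas, 4 RF chains.
rng = np.random.default_rng(0)
H = (rng.standard_normal((3, 8)) + 1j * rng.standard_normal((3, 8))) / np.sqrt(2)
a = [0, 2, 5, 7]
W = zf_precoder(H[:, a])
rates = effective_rates(H, a, W, noise_var=0.1, frame_len=100, tau_csi=6)
sum_rate = rates.sum()                   # C(a)
```

With zero forcing the interference term is numerically zero, so the sketch mainly illustrates how the subset $\mathbf{a}$, the precoder and the pilot overhead enter $R_k(\mathbf{a})$, $C(\mathbf{a})$ and $P(\mathbf{a})$.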

B. Antenna Selection With Objective Maximization
For practical scenarios, a typical objective for selecting the best subset of antennas is to optimize a generic objective function $F(\mathbf{a})$ under the constraints of total transmit power and minimum QoS requirements. Mathematically, the optimization problem can be formulated as

$$\max_{\mathbf{a} \in \mathcal{A}} \; F(\mathbf{a}) \quad \text{s.t.} \quad \text{the QoS of user } k \text{ is at least } \eta_k,\ \forall k, \qquad \sum_{k=1}^{N_u} \|\mathbf{w}_k(\mathbf{a})\|^2 \le P_{\mathrm{tot}}, \tag{3}$$

where $\eta_k$ is the QoS requirement for user $k$, $P_{\mathrm{tot}}$ denotes the total transmit power, and $F(\mathbf{a})$ is the objective function of interest. Depending on the specific problem, the objective function can be $F(\mathbf{a}) = -\sum_{k=1}^{N_u} \|\mathbf{w}_k(\mathbf{a})\|^2$ when we want to minimize the energy consumption, and $F(\mathbf{a}) = \sum_{k=1}^{N_u} R_k(\mathbf{a})$ if we want to maximize the sum-throughput of the system.
1) Antenna Selection With Complete CSI: Solutions to problem (3) have been well studied in the literature under the complete CSI assumption [10], [11], [12], [13], [14], i.e., that the channel states can be fully observed with an affordable overhead. Among these solutions, greedy search is widely adopted due to its good performance and low complexity [6]. Let $\mathbf{a}_p$ be a set of $p$ selected antenna indices, and let $\mathbf{a}_q \supset \mathbf{a}_p$ be a superset of $\mathbf{a}_p$ with $q > p$. In addition, we denote $\mathbf{a}_{q-p} = \mathbf{a}_q \setminus \mathbf{a}_p$ as the difference between the sets $\mathbf{a}_q$ and $\mathbf{a}_p$. According to Proposition 1 of [9], the spectral efficiency loss of removing $q - p$ antennas from $\mathbf{a}_q$ can be bounded by [...] with the following notations [...] where $\mathrm{tr}(\cdot)$ denotes the matrix trace. Now, the sum-throughput maximization problem can be converted into removing $N_t - N_f$ antennas with the minimum capacity loss, which is given by $\mathbf{a}^* = \arg\min$ [...]. Unfortunately, the above equation still needs an exhaustive search to find the best antenna combination to be removed. Nevertheless, it is possible to reduce the computational complexity by utilizing the concept of greedy search. At each iteration, the greedy search removes the worst antenna, i.e., the one that contributes least to the capacity, so that the search space is reduced significantly while good performance is preserved. According to Proposition 2 of [9], the antenna to be removed at each iteration is given by [...] where $\mathbf{a}_p$ is the currently selected antenna set. By repeating this strategy, the antenna combination which maximizes the sum-throughput will eventually be determined [9]; the pseudocode of the resulting greedy search algorithm is shown in Algorithm 1. It is worth noting that Algorithm 1 can be directly applied to generalized ZF by using $\mathbf{A} = (\alpha\mathbf{I}$ [...]

Algorithm 1 Greedy Search Algorithm
Input: Full channel matrix $\mathbf{H}$
Output: Selected antenna combination $\mathbf{a}$
1: Let the set of all antennas be $\mathcal{N} = \{1, \ldots, N_t\}$;
2: Initialize the set of removed antennas as $\mathbf{r} = \emptyset$;
[...]

2) Antenna Selection With Incomplete CSI: Due to the limited number of RF chains at the BS, only the partial CSI corresponding to the $N_f$ selected antennas can be measured in each pilot transmission. In this case, we need $\tau_{\mathrm{csi}} = N_u \lceil N_t / N_f \rceil$ c.u. to acquire the complete CSI [15]. Obviously, $\tau_{\mathrm{csi}}$ quickly becomes cost-prohibitive for large $N_t$: the extra pilot overhead grows rapidly in massive MIMO and results in a reduced effective transmission rate, as shown in (2). For this reason, acquiring complete CSI is a very inefficient strategy for massive MIMO systems, which motivates us to study the antenna selection problem in the presence of incomplete (or partial) CSI.
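To make the pilot overhead concrete, the short sketch below evaluates $\tau_{\mathrm{csi}} = N_u \lceil N_t / N_f \rceil$ for the configuration quoted earlier (64 antennas, 16 RF chains, 4 users). The frame length of 30 c.u. is a hypothetical value chosen only for illustration, and the single-sweep cost of $N_u$ c.u. for measuring just the selected antennas is an assumption read off the same formula.

```python
import math

def pilot_overhead(n_users, n_antennas, n_rf_chains):
    """Channel uses needed to sound n_antennas with n_rf_chains RF chains."""
    return n_users * math.ceil(n_antennas / n_rf_chains)

def data_fraction(tau_csi, frame_len):
    """Fraction of the frame left for data, i.e. the pre-log factor 1 - tau/T."""
    return max(0.0, 1.0 - tau_csi / frame_len)

N_U, N_T, N_F = 4, 64, 16
FRAME_LEN = 30  # hypothetical frame length in channel uses

tau_complete = pilot_overhead(N_U, N_T, N_F)    # sound all 64 antennas
tau_incomplete = pilot_overhead(N_U, N_F, N_F)  # sound only the 16 selected ones

print(tau_complete, data_fraction(tau_complete, FRAME_LEN))
print(tau_incomplete, data_fraction(tau_incomplete, FRAME_LEN))
```

Under these assumed numbers, complete CSI acquisition takes four pilot sweeps (16 c.u.), whereas sounding only the selected antennas takes one sweep (4 c.u.), which is the overhead gap that incomplete-CSI antenna selection aims to exploit.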

III. PROPOSED ANTENNA SELECTION FRAMEWORK WITH INCOMPLETE CSI
In this section, we propose a joint channel prediction and antenna selection (JCPAS) framework that operates without relying on a complete CSI assumption. JCPAS consists of two main blocks: channel tracking and antenna selection. The first block learns the belief of the complete channel matrix from historical incomplete channel estimates (of selected antennas) and predicts the complete channel matrix. Based on the predicted complete channel matrix, the second block selects the best antenna subset. The workflow and time horizon structure of the proposed JCPAS framework are presented in Fig. 2.

A. Channel Tracking With Historical Incomplete Observations
In practical scenarios with large-scale antenna configurations, incomplete measurements of the current channel state are expected due to the limited number of RF chains and the limited channel estimation duration. This limitation indicates that antenna selection will be performed in the presence of incomplete channel information, in which case existing complete-CSI-based antenna selection algorithms cannot be efficiently applied. Therefore, it is of vital importance to track the transition of channel states by exploiting the temporal correlation of real propagation environments, thereby maintaining an accurate belief on the current channel state to help AS. We note that Kalman filtering based prediction is not applicable since it requires complete historical CSI measurements. From this perspective, we propose a deep conditional generative model (DCGM) to estimate the distribution of the belief state based on the sequence of past incomplete channel measurements. Mathematically, the channel tracking model can be considered as a probabilistic generative model $P(\mathbf{H}_t|\boldsymbol{\Phi}_t)$, which is conditioned on the history of incomplete CSI measurements, denoted by $\boldsymbol{\Phi}_t = \{\ldots, \mathbf{H}_t(\mathbf{a}_t)\}$ with a finite horizon $L$. In general, the value of $L$ is determined by the storage and computing resources. Note that the missing entries are replaced by zero entries in the sequence if the current time slot $t < L$. Although it is hard to estimate the exact posterior distribution $P(\mathbf{H}_t|\boldsymbol{\Phi}_t)$, we can still obtain an accurate approximation by maximum likelihood.
To this end, we begin by denoting the approximate distribution as $Q_{\boldsymbol{\theta}}(\mathbf{H}_t|\boldsymbol{\Phi}_t)$ with parameters $\boldsymbol{\theta}$. Then, we train the model with the objective of maximizing the likelihood of the training samples under the chosen distribution, which is given by [...] where [...] achieves its minimum. On the other hand, $Q_{\boldsymbol{\theta}}(\mathbf{H}_t|\boldsymbol{\Phi}_t)$ deviates from the real distribution as (11) grows. However, prior knowledge of the real distribution is needed to select an appropriate model for the approximation, which is impractical in the circumstances of interest.
In order to solve this issue, we employ a deep normalizing flow (DNF) to construct $Q_{\boldsymbol{\theta}}(\mathbf{H}_t|\boldsymbol{\Phi}_t)$. Compared with other generative models, e.g., variational auto-encoders (VAEs) and generative adversarial networks (GANs), DNF is a fully probabilistic model with tractable exact density inference, which can improve the search efficiency of Monte Carlo tree search [29]. Although the exact density of the predicted channel states is not utilized in this paper, it can help reduce the number of simulations in our future work by evaluating the certainty of current solutions. As a kind of generative model, DNF approximates the distribution by transforming a latent distribution. This strategy allows us to sample complete observations from the latent space, while still being able to compute the corresponding log-likelihood by the change-of-variables formula [30], [31]. In principle, DNF assumes that the complete observation $\mathbf{H}_t$ depends on a latent random variable $\mathbf{Z}_t$ following a tractable distribution $P_{\boldsymbol{\omega}}(\mathbf{Z}_t)$, where $\boldsymbol{\omega}$ denotes the parameters of the latent distribution. It is also assumed that $\boldsymbol{\omega}$ follows a tractable distribution, denoted by $\boldsymbol{\omega} \sim P_{\boldsymbol{\psi}}(\boldsymbol{\omega})$ with parameters $\boldsymbol{\psi}$. Besides, the parameters $\boldsymbol{\psi}$ can be determined based on the history, i.e., $\boldsymbol{\psi} = \gamma_{\boldsymbol{\theta}_1}(\boldsymbol{\Phi}_t)$, in which $\gamma_{\boldsymbol{\theta}_1}(\cdot)$ is a function represented by a deep neural network (DNN) with parameters $\boldsymbol{\theta}_1$.
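Intuitively, this approach incorporates the uncertainty of incomplete observations into channel tracking by treating $\boldsymbol{\omega}$ as a random variable conditioned on the given history $\boldsymbol{\Phi}_t$. Therefore, the latent space is also conditioned on incomplete observations, denoted as $\mathbf{Z}_t \sim P_Z(\mathbf{Z}_t|\boldsymbol{\Phi}_t)$. In conclusion, the generative process can be described as

$$\boldsymbol{\psi} = \gamma_{\boldsymbol{\theta}_1}(\boldsymbol{\Phi}_t), \quad \boldsymbol{\omega} \sim P_{\boldsymbol{\psi}}(\boldsymbol{\omega}), \quad \mathbf{Z}_t \sim P_{\boldsymbol{\omega}}(\mathbf{Z}_t), \quad \mathbf{H}_t = g_{\boldsymbol{\theta}_2}(\mathbf{Z}_t),$$

where $g_{\boldsymbol{\theta}_2}(\cdot)$ represents an invertible (or bijective) function with parameters $\boldsymbol{\theta}_2$. Since $g_{\boldsymbol{\theta}_2}(\cdot)$ is an invertible function, the associated latent variable can be effectively inferred by $\mathbf{Z}_t = f_{\boldsymbol{\theta}_2}(\mathbf{H}_t) = g_{\boldsymbol{\theta}_2}^{-1}(\mathbf{H}_t)$. By setting $\boldsymbol{\theta} = \{\boldsymbol{\theta}_1, \boldsymbol{\theta}_2\}$, the log-likelihood of the complete observation $\mathbf{H}_t$ can be approximately computed from

$$\log Q_{\boldsymbol{\theta}}(\mathbf{H}_t|\boldsymbol{\Phi}_t) = \log P_Z(\mathbf{Z}_t|\boldsymbol{\Phi}_t) + \log\left|\det\frac{d f}{d \mathbf{H}}\right|, \tag{13}$$

where $\det\frac{d f}{d \mathbf{H}}$ is the determinant of the Jacobian. In order to construct a flexible model $Q_{\boldsymbol{\theta}}(\mathbf{H}_t|\boldsymbol{\Phi}_t)$, we assume that the invertible function $f(\cdot)$ is composed of $N_I$ invertible sub-functions, given by

$$f = f_{N_I} \circ \cdots \circ f_2 \circ f_1.$$

Based on the above factorization, we can infer the corresponding latent variable $\mathbf{Z}$ accordingly,

$$\mathbf{H} \xrightarrow{f_1} \mathbf{V}_1 \xrightarrow{f_2} \mathbf{V}_2 \cdots \xrightarrow{f_{N_I}} \mathbf{Z}.$$

Now, using the notations $\mathbf{V}_0 \triangleq \mathbf{H}_t$ and $\mathbf{V}_{N_I} \triangleq \mathbf{Z}_t$, we can rewrite (13) as

$$\log Q_{\boldsymbol{\theta}}(\mathbf{H}_t|\boldsymbol{\Phi}_t) = \log P_Z(\mathbf{Z}_t|\boldsymbol{\Phi}_t) + \sum_{i=1}^{N_I} \log\left|\det\frac{d \mathbf{V}_i}{d \mathbf{V}_{i-1}}\right|.$$

Thus, we construct the approximate model $Q_{\boldsymbol{\theta}}(\mathbf{H}_t|\boldsymbol{\Phi}_t)$ with $N_I$ sub-functions, each being a small flow step of the complete flow. In this case, it is straightforward to train the model $Q_{\boldsymbol{\theta}}$ by recalling the principle of maximum likelihood approximation in (13).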
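The change-of-variables computation behind the flow can be illustrated on a toy example. The sketch below is not the DCGM: it assumes real-valued data, a standard normal latent distribution, and element-wise affine maps as the flow steps (the class `AffineStep` and the helper names are hypothetical). It only shows how the log-likelihood accumulates one log-determinant per flow step and how sampling runs the steps in reverse.

```python
import numpy as np

class AffineStep:
    """Element-wise invertible map f_i(v) = scale * v + shift (toy flow step)."""
    def __init__(self, scale, shift):
        self.scale = np.asarray(scale, dtype=float)
        self.shift = np.asarray(shift, dtype=float)

    def forward(self, v):
        # Returns f_i(v) and log|det Jacobian|; the Jacobian is diag(scale).
        return self.scale * v + self.shift, float(np.sum(np.log(np.abs(self.scale))))

    def inverse(self, v):
        return (v - self.shift) / self.scale

def to_latent(h, steps):
    """Map an observation to the latent space, summing the log-determinants."""
    v, logdet = np.asarray(h, dtype=float), 0.0
    for step in steps:                    # V_0 = h, V_i = f_i(V_{i-1})
        v, step_logdet = step.forward(v)
        logdet += step_logdet
    return v, logdet

def from_latent(z, steps):
    """Generate an observation by inverting the steps in reverse order."""
    v = np.asarray(z, dtype=float)
    for step in reversed(steps):
        v = step.inverse(v)
    return v

def log_likelihood(h, steps):
    """log q(h) = log N(z; 0, I) + sum_i log|det dV_i/dV_{i-1}|."""
    z, logdet = to_latent(h, steps)
    log_pz = -0.5 * np.sum(z ** 2) - 0.5 * z.size * np.log(2.0 * np.pi)
    return float(log_pz + logdet)

steps = [AffineStep([2.0, 0.5], [0.0, 0.0]), AffineStep([1.0, 1.0], [-1.0, 3.0])]
rng = np.random.default_rng(0)
sample = from_latent(rng.standard_normal(2), steps)
print(log_likelihood(sample, steps))
```

Because every step is affine, the density implied by this toy flow is Gaussian, which makes it easy to check the computed log-likelihood against the closed form.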

B. Network Implementation
The network structure of the proposed DCGM is illustrated in Fig. 3. As mentioned in Sec. III-A, it is important to ensure that each flow step is fully invertible in the implementation of the proposed DCGM. We implement the flow steps via normalization layers, invertible convolutional layers and affine coupling layers; the details of these layers can be found in [1], [30], [32], and [33]. Using these invertible layers, an invertible network can be constructed to track channel transitions from the incomplete history. In particular, the constructed invertible network is composed of $N_I$ flow steps, each containing only three layers: an activation normalization layer, an invertible convolutional layer and an affine coupling layer. Because the latent variable $\mathbf{Z}_t$ relies on the incomplete history $\boldsymbol{\Phi}_t$, a conditional distribution [...] as the parameters of the distribution. Then, we sample $\boldsymbol{\omega}_t$ by the reparameterization steps [...] where $\boldsymbol{\nu}_1$ and $\boldsymbol{\nu}_2$ are two standard complex Gaussian samples, $\odot$ denotes the element-wise multiplication, and the associated parameters $\boldsymbol{\psi}_t = \{\boldsymbol{\mu}_1(t), \boldsymbol{\mu}_2(t), \boldsymbol{\Sigma}_1(t), \boldsymbol{\Sigma}_2(t)\}$ are determined by the incomplete history, i.e., $\boldsymbol{\psi}_t = \gamma_{\boldsymbol{\theta}_1}(\boldsymbol{\Phi}_t)$. Specifically, $\gamma_{\boldsymbol{\theta}_1}(\boldsymbol{\Phi}_t)$ is represented by two independent convolutional networks, which can be expressed as $\{\boldsymbol{\mu}_1(t), \boldsymbol{\Sigma}_1(t)\} = \mathrm{CNN}_1(\boldsymbol{\Phi}_t)$ and $\{\boldsymbol{\mu}_2(t), \boldsymbol{\Sigma}_2(t)\} = \mathrm{CNN}_2(\boldsymbol{\Phi}_t)$. $\mathrm{CNN}_1(\cdot)$ and $\mathrm{CNN}_2(\cdot)$ may share exactly the same network structure, composed of multiple convolutional layers and rectified linear units (ReLU) [30]. In order to retain the spatial information, zero-padding is used so that each incomplete observation $\mathbf{H}_t(\mathbf{a}_t)$ within $\boldsymbol{\Phi}_t$ has the same shape of $N_u \times N_t$.

C. The Proposed Antenna Selection Framework
Given the prediction of complete CSI from the incomplete history, antenna selection becomes a straightforward task. The proposed JCPAS framework utilizes the proposed channel tracking model $Q_{\boldsymbol{\theta}}(\mathbf{H}_t|\boldsymbol{\Phi}_t)$ to select the antennas in practical environments. The workflow of the proposed JCPAS framework is presented in Fig. 2.
At the beginning of each frame, we estimate the belief of the current channel state via the well-trained DCGM, i.e., $\mathbf{H}_t \sim Q_{\boldsymbol{\theta}}(\mathbf{H}_t|\boldsymbol{\Phi}_t)$. It should be noted that we fill the history with zero entries for initialization. After estimating the belief state, we employ an antenna selection algorithm (e.g., Algorithm 1) to select the antenna subset used for acquiring the incomplete CSI as well as for data transmission. When the data transmission of the current frame is completed, we update the history for the next frame by pushing the last incomplete observation $\mathbf{H}_t(\mathbf{a}_t)$ into $\boldsymbol{\Phi}_t$.
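The history handling described above (zero initialization, zero-padding each incomplete observation to $N_u \times N_t$, and pushing the latest observation) can be sketched with a fixed-length buffer. The class name `IncompleteHistory` and its methods are hypothetical; how the stacked history is fed to the DCGM is not specified here and is left out.

```python
from collections import deque
import numpy as np

class IncompleteHistory:
    """Sliding window of the last `horizon` zero-padded incomplete CSI matrices."""
    def __init__(self, n_users, n_antennas, horizon):
        self.shape = (n_users, n_antennas)
        # The history starts filled with all-zero N_u x N_t matrices.
        self.buffer = deque(
            (np.zeros(self.shape, dtype=complex) for _ in range(horizon)),
            maxlen=horizon,
        )

    def push(self, H_partial, selected):
        """Store H_t(a_t): columns of unselected antennas stay zero."""
        padded = np.zeros(self.shape, dtype=complex)
        padded[:, selected] = H_partial
        self.buffer.append(padded)        # the oldest entry is dropped automatically

    def as_array(self):
        """Return the history as an array of shape (horizon, N_u, N_t)."""
        return np.stack(list(self.buffer))

# Hypothetical toy usage: 2 users, 6 antennas, horizon L = 3.
history = IncompleteHistory(n_users=2, n_antennas=6, horizon=3)
history.push(np.ones((2, 2), dtype=complex), selected=[1, 4])
phi = history.as_array()
```

Keeping every entry at the full $N_u \times N_t$ shape preserves the antenna positions of the measured columns, which is the spatial information the convolutional layers rely on.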
To summarize the process of our proposed JCPAS framework, the associated pseudo-code is detailed in Algorithm 2.

Algorithm 2 Proposed Joint Channel Prediction and Antenna Selection (JCPAS) Framework
1: Set the current frame index $t = 0$;
2: Initialize history as $\boldsymbol{\Phi}_t = \{\mathbf{0}_{N_u \times N_t}, \ldots, \mathbf{0}_{N_u \times N_t}\}$;
[...]
for $i = 0$ to $\min(t, L)$ do
[...]

It is worth noting that the proposed framework is a general framework with the purpose of reducing the channel estimation overhead for massive MIMO systems. Thus, the choice of antenna selection algorithm is not limited, and here we use the greedy search algorithm for illustration. In principle, the choice of antenna selection algorithm should be determined based on the available resources and the needs of the environment.

IV. PROPOSED RISK-AWARE PLANNING ANTENNA SELECTION ALGORITHM
This section introduces the proposed risk-aware planning algorithm, which can be integrated into the JCPAS framework to further improve system robustness while reducing the chance of violating practical constraints. After that, we discuss the performance-complexity tradeoff of the proposed algorithm.

A. Risk-Aware Monte Carlo Tree Search
Due to incomplete CSI and the imperfect belief estimate $Q_{\boldsymbol{\theta}}(\mathbf{H}_t|\boldsymbol{\Phi}_t)$, antenna selection often carries the risk of violating the practical constraints of problem (3), as the information available for making decisions is not perfect. In principle, the outcome $F(\mathbf{a}_t)$ of the selected antennas $\mathbf{a}_t$ can be regarded as a random variable conditioned on the history $\boldsymbol{\Phi}_t$. This indicates that making decisions based on a single sample from the estimated belief distribution is insufficient and risky.
To be specific, such potential risk results from two parts: the probability that the sampled belief used for selecting antennas deviates from the truth, and the negative consequences if it does. Unfortunately, the existing algorithms depend on known channel coefficients rather than channel statistics. To solve this issue, planning based on the expected outcomes should play a crucial role in risk-aware AS algorithm design. This observation is key to heuristic risk-aware planning in the presence of incomplete observations and to guaranteeing the constraints.
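The point that a single belief sample is not enough can be illustrated by estimating the outcome statistics of one fixed antenna subset over many belief samples. The sketch below makes several assumptions that are not taken from the paper: the belief sampler is a stand-in Gaussian perturbation around a mean channel (in place of the trained DCGM), the QoS requirement is read as a per-user SINR target, the precoder is zero forcing, and the risky event is the power budget being exceeded. All function names are hypothetical.

```python
import numpy as np

def zf_power(H_sel, sinr_targets, noise_var):
    """Total ZF transmit power needed so that every user meets its SINR target:
    sum_k eta_k * sigma^2 * [(H H^H)^{-1}]_{kk}."""
    gram_inv = np.linalg.inv(H_sel @ H_sel.conj().T)
    return float(np.real(np.sum(sinr_targets * noise_var * np.diag(gram_inv))))

def make_gaussian_belief(H_mean, std):
    """Stand-in for drawing from the belief distribution (assumption)."""
    def sample(rng):
        noise = rng.standard_normal(H_mean.shape) + 1j * rng.standard_normal(H_mean.shape)
        return H_mean + std * noise / np.sqrt(2)
    return sample

def violation_probability(sample_belief, a, sinr_targets, noise_var, p_tot, n_samples, rng):
    """Monte Carlo estimate of P(power budget violated) and of the mean power."""
    powers = np.array([
        zf_power(sample_belief(rng)[:, a], sinr_targets, noise_var)
        for _ in range(n_samples)
    ])
    return float(np.mean(powers > p_tot)), float(np.mean(powers))

# Hypothetical toy setup: 3 users, 8 antennas, subset of 4 antennas.
rng = np.random.default_rng(1)
H_mean = (rng.standard_normal((3, 8)) + 1j * rng.standard_normal((3, 8))) / np.sqrt(2)
belief = make_gaussian_belief(H_mean, std=0.3)
targets = np.array([4.0, 4.0, 4.0])
risk, mean_power = violation_probability(belief, [0, 2, 5, 7], targets, 0.1, 5.0, 200, rng)
print(risk, mean_power)
```

Comparing `risk` across candidate subsets, rather than the power computed from one sampled channel, is the kind of statistic a risk-aware planner can act on.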
From this perspective, we propose to select antennas and manage risks (of violating the system constraints) by learning a posterior distribution over the expected outcomes of each selected antenna combination. This approach can be accomplished based on the concepts of MCTS [34] and bootstrap Thompson sampling (BTS) [35], [36], [37], [38]. As a best-first search strategy, MCTS employs a heuristic exploration to iteratively explore the combinatorial search space. In general, the search space can be regarded as a decision tree consisting of decision nodes, each of which has a number of child nodes, where each child node corresponds to an available decision of removing the associated antenna index. As illustrated in Fig. 4, the typical routine of MCTS includes the following four steps [34]:
• Selection: We traverse the search tree in accordance with the estimated statistics of each node until encountering a node that has not been fully expanded, which is also called a leaf node.
• Expansion: Whenever a leaf node is selected, it must be expanded. The expansion is done by randomly generating a child node and then initializing the prior information for the newly generated node.
• Simulation: We execute a random rollout through Monte Carlo simulations until a complete selection is reached. We then simulate the outcome of the explored complete selection by Monte Carlo simulation.
• Backpropagation: After receiving the simulated outcome of a complete selection, the results are backpropagated to all ancestor nodes, in which a set of predefined algorithm statistics is updated accordingly.
During the search, the above routine is repeated a number of times such that the combinatorial search space is incrementally explored and the search tree is simultaneously expanded. After a number of simulations, the statistics of each node will be sufficient for making decisions. As described in Algorithm 3 and Algorithm 4, we jointly use $Q_{\boldsymbol{\theta}}(\mathbf{H}_t|\boldsymbol{\Phi}_t)$ and MCTS to build the proposed RA-MCTS algorithm. The proposed RA-MCTS works in a similar manner to the greedy search, removing antennas one by one iteratively. Each rollout is simulated based on a sample drawn from the estimated belief distribution $Q_{\boldsymbol{\theta}}(\mathbf{H}_t|\boldsymbol{\Phi}_t)$. Besides, we maintain an approximate posterior distribution over the expected outcome of simulations, and utilize BTS to compute policies in risk-aware and multi-constraint settings.
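To give a feel for how bootstrap Thompson sampling chooses among child nodes, the sketch below implements a generic BTS rule for rewards in $[0, 1]$. It is not the paper's Algorithm 3 or Algorithm 4: the uniform pseudo-count prior, the success/failure bookkeeping and the class name `BootstrapThompsonSampler` are assumptions. Each arm keeps $J$ bootstrap replicates, a replicate is drawn at random for every arm when selecting, and each replicate of the chosen arm absorbs a new reward with probability 1/2.

```python
import numpy as np

class BootstrapThompsonSampler:
    """Generic bootstrap Thompson sampling over a fixed set of arms."""
    def __init__(self, n_arms, n_replicates, rng, prior_success=1.0, prior_failure=1.0):
        self.rng = rng
        # One (success, failure) pseudo-count pair per arm and per replicate.
        self.alpha = np.full((n_arms, n_replicates), prior_success, dtype=float)
        self.beta = np.full((n_arms, n_replicates), prior_failure, dtype=float)

    def select(self):
        """Draw one replicate per arm and pick the arm with the best estimate."""
        n_arms, n_rep = self.alpha.shape
        picked = self.rng.integers(n_rep, size=n_arms)
        rows = np.arange(n_arms)
        a, b = self.alpha[rows, picked], self.beta[rows, picked]
        return int(np.argmax(a / (a + b)))

    def update(self, arm, reward):
        """Each replicate of the arm sees the reward with probability 1/2."""
        mask = self.rng.random(self.alpha.shape[1]) < 0.5
        self.alpha[arm, mask] += reward
        self.beta[arm, mask] += 1.0 - reward

# Hypothetical toy run: arm 2 always pays 1, the other arms always pay 0.
rng = np.random.default_rng(0)
sampler = BootstrapThompsonSampler(n_arms=3, n_replicates=10, rng=rng)
pulls = np.zeros(3, dtype=int)
for _ in range(500):
    arm = sampler.select()
    sampler.update(arm, 1.0 if arm == 2 else 0.0)
    pulls[arm] += 1
print(pulls)
```

The random half-update keeps the replicates different from one another, so drawing one of them at random plays the role of a posterior sample without requiring a closed-form posterior.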
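For each simulation, the proposed RA-MCTS always starts at a root node with a belief sample $\mathbf{H}_t \sim Q_{\boldsymbol{\theta}}(\mathbf{H}_t \mid \boldsymbol{\Phi}_t)$ drawn from the estimated belief distribution. It is worth noting that the belief sample $\mathbf{H}_t$ is utilized to compute the prior probability of each node. Specifically, we start with the antenna set $\mathbf{a} = \{1, 2, \ldots, N_t\}$ and define $\mathbf{A} = (\mathbf{H}_t \mathbf{H}_t^H)^{-1}$. Note that we will sometimes omit the time index $t$ since the selection is done within the current frame. Then, we update the antenna set as $\mathbf{a} = \mathbf{a} \setminus \{m\}$ whenever an antenna index $m$ is removed. Similar to greedy search, $\mathbf{A}$ can also be successively updated as

$$\mathbf{A} = \mathbf{A} + \frac{\mathbf{A}\mathbf{h}_m\mathbf{h}_m^H\mathbf{A}}{1 - \mathbf{h}_m^H\mathbf{A}\mathbf{h}_m},$$

where $\mathbf{h}_m$ is the column of $\mathbf{H}$ associated with antenna $m$. When expanding a new child node $i$, we compute the corresponding potential spectral loss $\lambda_i$. In addition, we have the prior weight $\alpha_i$, whose expression involves the sum $\sum_{j \neq i} e^{-\lambda_j}$ and which denotes the prior probability of exploring child node $i$.

Algorithm 3 (continued)
5: $\mathbf{r} = \mathbf{r} \cup \{m\}$;
6: $\mathbf{a}_t \leftarrow \{1, 2, \ldots, N_t\} - \mathbf{r}$;
7: return $\mathbf{a}$;
   while within computational budget do
10: $n \leftarrow$ RETRIEVENODE($\mathbf{r}$, $\emptyset$, $\mathbf{A}$, $\mathbf{H}_t$);
   if the node corresponding to $\mathbf{r}$ has not been generated then
   for $m \in \mathbf{r} - \mathbf{r}'$ do
20: $\mathbf{A} \leftarrow \mathbf{A} + \mathbf{A}\mathbf{h}_m\mathbf{h}_m^H\mathbf{A} / (1 - \mathbf{h}_m^H\mathbf{A}\mathbf{h}_m)$;
23: ... $\sum_{j \neq i} e^{-\lambda_j}$, $\forall i \in \mathcal{N} - \mathbf{r}$;
24: for each child node $i \in \mathcal{N} - \mathbf{r}$ do
25: $\alpha_{ij} \leftarrow \alpha_i$, $\forall j \in \{1, \ldots, J\}$;
26: return the already generated node;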
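The successive update of $\mathbf{A}$ after removing one antenna is an instance of the Sherman-Morrison identity, and it can be checked numerically. The sketch below is a pure-Python illustration for two users; the channel entries, the removed column index and all function names are assumptions chosen for the example, not values from the paper.

```python
def gram(H):
    """Return the 2x2 matrix H H^H for a 2 x N complex matrix (list of rows)."""
    return [[sum(a * b.conjugate() for a, b in zip(H[i], H[k]))
             for k in range(2)] for i in range(2)]


def inv2(G):
    """Closed-form inverse of a 2x2 complex matrix."""
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    return [[G[1][1] / det, -G[0][1] / det],
            [-G[1][0] / det, G[0][0] / det]]


def remove_antenna(A, h):
    """Rank-one update: inverse of (H H^H - h h^H) given A = (H H^H)^{-1}."""
    Ah = [A[i][0] * h[0] + A[i][1] * h[1] for i in range(2)]       # A h
    hA = [h[0].conjugate() * A[0][k] + h[1].conjugate() * A[1][k]
          for k in range(2)]                                        # h^H A
    denom = 1 - (h[0].conjugate() * Ah[0] + h[1].conjugate() * Ah[1])
    return [[A[i][k] + Ah[i] * hA[k] / denom for k in range(2)]
            for i in range(2)]


# Illustrative 2-user, 4-antenna channel (arbitrary values).
H = [[1 + 1j, 0.5j, 2 + 0j, -1 + 0.5j],
     [0.3 - 1j, 1 + 0j, -0.5j, 1.5 + 1j]]
m = 2                                   # zero-based index of the removed antenna
h = [H[0][m], H[1][m]]

A_updated = remove_antenna(inv2(gram(H)), h)
H_reduced = [row[:m] + row[m + 1:] for row in H]
A_direct = inv2(gram(H_reduced))        # recomputed from scratch for comparison

max_err = max(abs(A_updated[i][k] - A_direct[i][k])
              for i in range(2) for k in range(2))
```

The update reuses the stored inverse and avoids a fresh matrix inversion at every removal, which is why it is attractive inside a search that removes antennas one by one.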
By considering each selection as a multi-armed bandit problem, we use the concept of BTS to select child nodes while balancing exploration and exploitation. The intuition behind BTS is simple. The algorithm randomly selects a child node at each step according to its probability of being optimal under the current beliefs and, in the meantime, continues to sample all possible child nodes that could plausibly be optimal [35], [37]. As more information is collected, beliefs about the expected utility of each node are carefully tracked to balance exploration and exploitation [38]. In contrast to the existing Thompson sampling based AS scheme introduced in [15], our proposed approach relies on neither the assumption of equal antenna contribution nor the Beta-Bernoulli posterior specification. These advantages help improve the robustness under model misspecification.

Specifically, we adopt a bootstrap distribution to approximate the posterior distribution over the expected outcome of each node. The bootstrap distribution is parameterized by a number of bootstrap replicates, $j \in \{1, \ldots, J\}$. In the initialization of a new node $i$, for each bootstrap replicate $j$, we store a set of parameters with $\alpha_{ij} = \alpha_i$ and $\beta_{ij} = 1$ by default, and these parameters are updated during the backpropagation of each simulation. Specifically, at node $i$, for each bootstrap replicate $j$, we update $\alpha_{ij}$ and $\beta_{ij}$ depending on the result of a coin flip $\epsilon_j \sim \mathrm{Bernoulli}(1/2)$. If the coin flip is equal to 1, we update the parameters using $U(\mathbf{a})$, where $U(\mathbf{a})$ denotes a utility function which evaluates the normalized outcome of a complete antenna selection $\mathbf{a}$ (i.e., $|\mathbf{a}| = N_f$) in the simulation phase. For instance, the utility function (20) for the energy minimization problem normalizes the risk of violating the QoS requirements in the outcome evaluation. This implies that if the BS cannot satisfy all the constraints, it will try its best by allocating the entire available power budget to improve the QoS. To decide which antenna should be removed, the previously computed statistics are utilized. For each child node $i$, we first uniformly sample $j$ from the $J$ bootstrap replicates, and then the node with the largest point estimate is selected by $i^* = \arg\max_i \alpha_{ij}/\beta_{ij}$, with ties broken randomly.

Algorithm 4 (continued)
34: Sample $\epsilon_j \sim \mathrm{Bernoulli}(1/2)$;
35: if $\epsilon_j = 1$ then
36-37: update $\alpha_{ij}$ and $\beta_{ij}$;
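The bootstrap bookkeeping described above can be sketched in a few lines of Python. The class and function names are hypothetical, and the specific increments (adding the utility to $\alpha_{ij}$ and adding 1 to $\beta_{ij}$) are an assumption in the style of standard bootstrap Thompson sampling rather than a transcription of the paper's update rule; utilities are assumed to be normalized to $[0, 1]$.

```python
import random


class BootstrapNode:
    """Per-node statistics: J replicates of (alpha, beta)."""

    def __init__(self, prior: float, num_replicates: int) -> None:
        self.alpha = [prior] * num_replicates   # alpha_ij initialised to alpha_i
        self.beta = [1.0] * num_replicates      # beta_ij initialised to 1

    def update(self, utility: float, rng: random.Random) -> None:
        """Each replicate absorbs the outcome with probability 1/2."""
        for j in range(len(self.alpha)):
            if rng.random() < 0.5:              # coin flip epsilon_j = 1
                self.alpha[j] += utility        # assumed increment
                self.beta[j] += 1.0             # assumed increment

    def sample_estimate(self, rng: random.Random) -> float:
        """Draw one replicate uniformly and return its point estimate."""
        j = rng.randrange(len(self.alpha))
        return self.alpha[j] / self.beta[j]


def select_child(children: dict, rng: random.Random) -> int:
    """Pick the child with the largest sampled estimate, ties broken randomly."""
    scores = {i: node.sample_estimate(rng) for i, node in children.items()}
    best = max(scores.values())
    return rng.choice([i for i, s in scores.items() if s == best])


# Toy usage: child 3 always yields utility 1, child 7 always yields 0.
rng = random.Random(0)
children = {3: BootstrapNode(0.5, 10), 7: BootstrapNode(0.5, 10)}
for _ in range(200):
    children[3].update(1.0, rng)
    children[7].update(0.0, rng)
chosen = select_child(children, rng)
```

Because each replicate sees a different random half of the outcomes, the spread of the $J$ point estimates acts as a cheap stand-in for posterior uncertainty: early on the sampled estimates disagree and the search explores, and later they concentrate.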

B. Complexity Analysis
Intuitively, our proposed RA-MCTS reduces to greedy search (see Algorithm 1) if the computational budget of each iteration is restricted to one single simulation. On the other hand, with a sufficient computational budget, RA-MCTS is capable of intelligently managing risks and allocating exploration efforts, while greedy search ignores risks and does not actively explore. This implies that our proposed RA-MCTS achieves a trade-off between performance and computational complexity.
In general, the computational complexity of our proposed RA-MCTS depends on two aspects: the number of simulations and the computational cost of each simulation. The latter can be controlled by the number of bootstrap replicates, as the choice of $J$ limits the number of samples we have from the bootstrap distribution. For a smaller $J$, RA-MCTS is expected to become greedy, because the probability of choosing a node that does not have the largest point estimate tends to zero. On the other hand, a larger $J$ involves more exploration, at the expense of extra computational complexity.
To characterize the computational complexity of our proposed algorithms, we use the total number of floating-point operations (FLOPs) to quantify the complexity order. Note that we consider a FLOP to be either a complex-valued multiplication or a complex-valued summation. In fact, a complex-valued multiplication requires 4 real-valued multiplications and 2 real-valued summations, whereas a complex-valued summation requires only 2 real-valued summations. However, each operation is counted as one FLOP for simplicity. Given $\mathbf{A} \in \mathbb{C}^{q \times p}$ and $\mathbf{X} \in \mathbb{C}^{p \times r}$, the arithmetic order of FLOPs for the matrix multiplication $\mathbf{A}\mathbf{X}$ is $O(qpr)$.
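The counting convention above can be made concrete with a small sketch. The function names are hypothetical; the counts follow the textbook schoolbook product, in which each of the $qr$ output entries needs $p$ multiplications and $p - 1$ summations.

```python
def matmul_flops(q: int, p: int, r: int) -> int:
    """Complex FLOPs for a (q x p) times (p x r) product, one FLOP per operation."""
    multiplications = q * r * p
    summations = q * r * (p - 1)
    return multiplications + summations


def real_operations(complex_mults: int, complex_sums: int) -> int:
    """Real-valued operations hidden behind the complex-valued FLOP count."""
    per_mult = 4 + 2   # 4 real multiplications and 2 real summations
    per_sum = 2        # 2 real summations
    return complex_mults * per_mult + complex_sums * per_sum


# Example: forming H H^H with H of size 6 x 32 (N_u = 6, N_t = 32).
gram_flops = matmul_flops(6, 32, 6)
```

For this example the count is dominated by the $qpr = 1152$ multiplications, which is why only the order $O(qpr)$ is retained in the analysis.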

TABLE I COMPUTATIONAL COMPLEXITY COMPARISONS OF ANTENNA SELECTION ALGORITHMS WITH ZERO-FORCING PRECODING
The RA-MCTS algorithm computes and stores the result of $(\mathbf{H}\mathbf{H}^H)^{-1}$ only once, which involves one matrix multiplication and one matrix inversion. For each loop of RA-MCTS, although it works in a similar manner to greedy search, it should be noted that RA-MCTS may explore a path which starts from an intermediate node and has been partially explored before, while greedy search always explores a brand new path starting from the root. This difference implies that, for each loop of RA-MCTS, the complexity of greedy search can be regarded as an upper bound in the worst case, which is given by $O(N_u^2(N_t^2 - N_f^2))$ [9]. Since RA-MCTS has to backpropagate the simulation results through up to $N_t - N_f$ ancestor nodes, the backpropagation cost also enters the bound on the computational complexity of the tree policy and rollout procedures. Besides, RA-MCTS also draws one belief state from the belief distribution at each loop. As for the neural network, the normalizing flow is constructed from three kinds of invertible layers, whose computational complexities are determined by element-wise operations and log-determinants [30], [32], [33]. Hence, for a normalizing flow with $N_I$ layers, the computational complexity depends on the input size and is given by $O(N_I L N_t N_u)$. For a CNN with $L_{\mathrm{Conv}}$ layers, we denote the kernel size and the number of kernels at the $i$-th layer as $S_{\mathrm{ker}}(i)$ and $N_{\mathrm{ker}}(i)$, respectively, which determine the computational complexity of the CNN [39]. The total computational complexity of drawing belief samples combines the normalizing-flow and CNN costs. Considering RA-MCTS($N$, $J$) with $N$ rollouts and $J$ replicates in total, its computational complexity can be bounded in terms of the per-loop costs above. For comparison, we summarize the computational complexities of some antenna selection algorithms in Table I. It is clear that JCPAS-Basic is a special case of RA-MCTS($N$, $J$) with $N = 1$ and $J = 1$. Moreover, JCPAS-Basic works in the same way as greedy search, except that the complete CSI is predicted from the belief distribution.

A. Environment Setup
We perform simulations considering the energy minimization problem in (3), and the utility function (20) is adopted to evaluate the system energy efficiency subject to limited transmit power and QoS constraints. The users are assumed to be randomly located around the BS, and the channels between the BS and users are time-varying. In order to simulate a realistic propagation environment, we consider temporally correlated fading channels. By the maximum entropy principle, it is common to characterize the channel evolution by the Gauss-Markov process with a one-step correlation coefficient given by Jakes' model [20], [21], [22]. In the first-order Gauss-Markov channel model, $\zeta_k$ denotes the temporal correlation coefficient for user $k$, and $\boldsymbol{\Delta}_t$ is the innovation process, which is unit-variance complex Gaussian and i.i.d. in time. The value of $\zeta_k$ is determined by the maximum Doppler frequency and decreases as the terminal speed grows [40], where $\zeta_k = 1$ represents a static channel and $\zeta_k = 0$ implies that the channel is i.i.d. over time. The fading correlation coefficient can be obtained from Jakes' model as $\zeta_k = J_0\left(2\pi \frac{v_k f_c}{C} T\right)$, where $J_0(\cdot)$ denotes the zeroth-order Bessel function of the first kind, $v_k$ is the speed of user $k$, $f_c$ is the carrier frequency, $C$ is the speed of light, and $T$ is the frame duration [40]. For example, we present the correlation coefficients of some typical scenarios within a wide range of speeds from 3.6 km/h to 290 km/h in Table II. Note that we consider the following two scenarios in our simulations:
• Scenario I, low-mobility users: Each user has a uniformly distributed random fading correlation coefficient $\zeta_k \sim \mathrm{Uni}(0.997, 0.999)$, which represents pedestrians and runners.
• Scenario II, high-mobility users: Each user has a uniformly distributed random fading correlation coefficient $\zeta_k \sim \mathrm{Uni}(0.932, 0.982)$, which represents vehicular terminals.
In terms of the structure of the neural network, we employ a normalizing flow with $N_I = 16$ flow steps, and each CNN contains 6 layers where the number of convolutional kernels and the kernel size of each layer are {64, 32, 32, 16, 64, 128} and {3, 9, 3, 3, 3, 9}, respectively. Note that the parameters of the neural network are chosen on the basis of empirical experience. "The DNF model is trained based on the principle of maximum likelihood approximation given in (14) using Pytorch machine learning framework and Adam optimizer." [17] Other common parameters are as follows: $N_f = 8$, $N_u = 6$, $P_{\mathrm{tot}} = 20$ dBW, $L = 128$.
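To make the channel model above more tangible, the sketch below evaluates the Jakes correlation coefficient and performs one step of a first-order Gauss-Markov recursion. Several things here are assumptions made for illustration only: the recursion is written in the common textbook form $h_t = \zeta h_{t-1} + \sqrt{1 - \zeta^2}\,\Delta_t$ for a scalar channel coefficient, the carrier frequency (2 GHz) and frame duration (1 ms) are placeholder values not taken from the paper, and $J_0$ is computed by numerically integrating its integral representation so that only the standard library is needed.

```python
import math
import random

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def bessel_j0(x: float, steps: int = 2000) -> float:
    """J0(x) = (1/pi) * integral over [0, pi] of cos(x sin t) dt (midpoint rule)."""
    width = math.pi / steps
    total = sum(math.cos(x * math.sin((k + 0.5) * width)) for k in range(steps))
    return total * width / math.pi


def jakes_correlation(speed_mps: float, carrier_hz: float, frame_s: float) -> float:
    """Correlation coefficient zeta = J0(2 pi f_d T) with f_d = v f_c / C."""
    doppler_hz = speed_mps * carrier_hz / SPEED_OF_LIGHT
    return bessel_j0(2.0 * math.pi * doppler_hz * frame_s)


def gauss_markov_step(h_prev: complex, zeta: float, rng: random.Random) -> complex:
    """One step of the assumed recursion with a unit-variance complex innovation."""
    sigma = math.sqrt(0.5)  # per-dimension std so that E|innovation|^2 = 1
    innovation = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    return zeta * h_prev + math.sqrt(1.0 - zeta ** 2) * innovation


# Placeholder carrier frequency and frame duration (assumptions).
zeta_walk = jakes_correlation(1.0, 2e9, 1e-3)    # 3.6 km/h
zeta_car = jakes_correlation(20.0, 2e9, 1e-3)    # 72 km/h
h_next = gauss_markov_step(1 + 1j, zeta_walk, random.Random(0))
```

The scaling factor $\sqrt{1 - \zeta^2}$ keeps the channel variance constant over time in this form, so $\zeta$ only controls how quickly past observations become stale.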

B. Competing Algorithms
In order to verify the effectiveness of the proposed framework, we compare the proposed algorithms with various competing algorithms. The proposed algorithms are summarized below:
• JCPAS-Basic: The proposed JCPAS framework introduced in Section III-A, where the greedy search algorithm is used to select antennas.
• RA-MCTS($N$, $J$): The proposed RA-MCTS algorithm introduced in Section IV-A, where $N$ is the maximum number of simulations and $J$ is the number of bootstrap replicates. Compared to JCPAS-Basic, the only difference is that the greedy search algorithm is replaced by RA-MCTS in the JCPAS framework.
We compare our proposed algorithms with the following three benchmark schemes:
• Random: This scheme randomly selects antennas for data transmission, which is the most naive solution.
• Online [15]: The online antenna selection algorithm uses Thompson sampling to select antennas with incomplete CSI.
• Full CSI: This scheme uses the maximum channel estimation overhead to obtain the complete CSI at each frame and then employs the greedy search [9] for selecting antennas.
For a fair comparison, zero-forcing based precoding is employed in all schemes. We do not compare with the hybrid beamforming (HB) technique since it also requires complete CSI, like the Full CSI reference, and the two schemes achieve comparable performance [41]. It is worth noting that the Full CSI scheme spends $\tau_{\mathrm{csi}} = N_u \lceil N_t / N_f \rceil$ c.u. for CSI estimation, while the other schemes use $\tau_{\mathrm{csi}} = N_u$ c.u.
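The pilot overhead stated above is easy to tabulate for the simulated configurations. In the sketch below the function names are hypothetical, and `data_fraction` rests on the simplifying assumption that every channel use not spent on CSI estimation is available for data transmission.

```python
def csi_overhead(num_users: int, num_antennas: int, num_rf_chains: int,
                 full_csi: bool) -> int:
    """Channel uses spent on CSI estimation per frame."""
    if full_csi:
        rounds = -(-num_antennas // num_rf_chains)  # integer ceil(N_t / N_f)
        return num_users * rounds
    return num_users


def data_fraction(frame_length: int, overhead: int) -> float:
    """Share of the frame left for data (assumes no other overhead)."""
    return (frame_length - overhead) / frame_length


# N_t = 32, N_f = 8, N_u = 6, T = 256 c.u.
tau_full = csi_overhead(6, 32, 8, full_csi=True)
tau_partial = csi_overhead(6, 32, 8, full_csi=False)

# N_t = 64, N_f = 8, N_u = 4, T = 128 c.u.
tau_full_large = csi_overhead(4, 64, 8, full_csi=True)
share_full_large = data_fraction(128, tau_full_large)
```

With the larger array and the shorter frame, a quarter of each frame goes to pilots under full CSI acquisition, which illustrates why the overhead grows with $N_t / N_f$ and with shorter frames.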

C. Performance Comparison and Discussions
Fig. 5 plots the average power consumption of different schemes versus the QoS requirements with $N_t = 32$, $N_f = 8$, $N_u = 6$ and $T = 256$ c.u. It is observed from Fig. 5 that both the proposed JCPAS-Basic and RA-MCTS algorithms outperform the competing solutions over a wide range of QoS requirements from 5 to 6.6 bps/Hz. Specifically, when $\eta_k = 6.2$ bps/Hz, $\forall k$, the JCPAS-Basic algorithm reduces the power consumption by about 77% and 79% compared with the "Full CSI" and "Online" schemes, respectively, while the RA-MCTS algorithm increases these reductions to approximately 80% and 83%. This is because RA-MCTS selects the antennas based on the estimated posterior over the expected outcome of each antenna combination, while JCPAS-Basic selects antennas based only on a single sample from the estimated belief distribution. This difference makes the proposed RA-MCTS more robust under incomplete CSI measurements.
To demonstrate the robustness of the proposed framework, we compare the proposed algorithms with the reference schemes using a new performance metric, the percentage of risky frames, in Fig. 6, where the simulation settings are the same as in Fig. 5. A frame is considered risky if any user's QoS requirement is not satisfied. It is clearly shown that the proposed algorithms, JCPAS-Basic and RA-MCTS, significantly reduce the number of risky frames compared with the references. At the spectral efficiency of 6.1 bps/Hz, all three reference schemes have more than 40% risky frames, while the proposed JCPAS-Basic and RA-MCTS algorithms have only 4% and 2% risky frames, respectively. One interesting observation is that the "Full CSI" scheme performs well at medium QoS values $\eta_k$, but its performance quickly drops for higher $\eta_k$, becoming even worse than the "Online" scheme. This is because the "Full CSI" solution spends a large number of c.u. on channel estimation and hence has the least time for data transmission. As the QoS requirement increases, the "Full CSI" scheme cannot satisfy the high QoS requirements in the limited data transmission time even when using the maximum transmit power, which results in a high number of risky frames. In contrast, the "Online" scheme does not need to estimate the complete channel states. Even when the transmit power of RA-MCTS approaches the power budget (as shown in Fig. 5 for high $\eta_k$), the resulting percentage of risky frames is still much lower than that of the other schemes, which demonstrates the robustness of the proposed risk-aware planning framework.
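The risky-frame metric defined above can be computed as in the following sketch. The function names and the toy numbers are hypothetical; a frame counts as risky as soon as one user's achieved spectral efficiency falls below that user's requirement.

```python
from typing import Sequence


def is_risky(rates: Sequence[float], requirements: Sequence[float]) -> bool:
    """True if any user's achieved rate is below its QoS requirement."""
    return any(rate < eta for rate, eta in zip(rates, requirements))


def risky_percentage(frames: Sequence[Sequence[float]],
                     requirements: Sequence[float]) -> float:
    """Percentage of frames in which at least one QoS requirement is violated."""
    if not frames:
        raise ValueError("at least one frame is required")
    risky = sum(is_risky(rates, requirements) for rates in frames)
    return 100.0 * risky / len(frames)


# Toy usage: two users with a 6.1 bps/Hz requirement over four frames.
eta = [6.1, 6.1]
achieved = [[6.2, 6.3], [6.0, 6.5], [6.1, 6.1], [5.9, 5.8]]
percentage = risky_percentage(achieved, eta)
```

Note that a frame violating one requirement and a frame violating all of them count equally, so the metric measures how often the constraints fail rather than by how much.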
In order to further verify the effectiveness of our proposed algorithms, we present simulation results for Scenario II in Figs. 7 and 8 with $N_t = 64$, $N_f = 8$, $N_u = 4$ and $T = 128$ c.u. Note that we use the same neural network as in Fig. 5. It can be observed from Fig. 8 that in a higher mobility scenario (with a reduced frame duration) and with a larger antenna array, the average energy consumption of the "Full CSI" scheme is even higher than that of "Random", which indicates that estimating complete CSI in this case is cost-prohibitive. In general, the superior performance of the proposed algorithms is preserved compared with the references. Specifically, at $\eta_k = 6.4$ bps/Hz, the RA-MCTS scheme reduces the energy consumption by about 60%, 45% and 6% compared to the "Full CSI", "Online" and JCPAS-Basic counterparts, respectively. In addition, with a very high QoS requirement of $\eta_k = 7.6$ bps/Hz, RA-MCTS reduces the risky frames by 85%, 80% and 8% compared to "Full CSI", "Online" and JCPAS-Basic, respectively. These results further verify the robustness and effectiveness of our proposed RA-MCTS.
In Figs. 9 and 10, we respectively evaluate the power consumption and the percentage of risky frames as a function of the number of users $N_u$, where $N_t = 128$, $N_f = 16$, $\eta_k = 6.5$ bps/Hz and the coherence time $T = 512$ c.u. In general, serving more users requires more transmit power in all schemes. However, the proposed JCPAS-Basic and RA-MCTS algorithms consume only about 50% of the transmit power of the other schemes in most cases, as shown in Fig. 9. The robustness of the proposed framework is clearly shown in Fig. 10, in which the proposed RA-MCTS does not have any risky frame for $N_u \leq 10$, while the percentage of risky frames of the three reference schemes grows quickly up to about 80% when $N_u$ varies from 6 to 10. When $N_u = 12$, all frames of the three reference schemes are risky, while the proposed JCPAS-Basic and RA-MCTS algorithms have 45% and 25% risky frames, respectively. This result confirms the robustness of the proposed RA-MCTS in highly loaded systems with limited resources.
In Figs. 11 and 12, we present simulation results for the sum-throughput maximization problem, where $N_t = 64$, $N_f = 8$, $N_u = 4$ and the users' velocity varies from 3.6 km/h to 72 km/h. In addition, the coherence time is $T = 128$ c.u. and each user has a QoS requirement of 5.5 bps/Hz. Note that if there is no feasible solution that satisfies all the constraints when optimizing the system's sum-throughput, the frame is marked as a "risky frame" and the QoS constraint is neglected. In other words, we maximize the sum-throughput by neglecting the QoS constraints for risky frames in the simulations. From the two figures, it can be observed that the proposed algorithms outperform the three reference schemes in terms of not only the sum-throughput but also the chance of violating the users' QoS requirements. Specifically, when the power budget grows to 30 W, JCPAS achieves 3.73 bps/Hz more than the "Online" scheme proposed in [15], while the corresponding gain obtained by RA-MCTS is about 4.29 bps/Hz. Meanwhile, the "Online" scheme has 77.10% risky frames, while JCPAS and RA-MCTS have only about 18.40% and 5.38% risky frames, respectively. The proposed JCPAS and RA-MCTS are thus much more robust in the scenarios of interest, as the two schemes achieve a much higher sum-throughput and a much lower chance of violating the users' QoS constraints. These results show that the proposed algorithms are robust for both the power minimization problem and the sum-throughput maximization problem.

VI. CONCLUSION
In this paper, we have proposed a robust joint channel prediction and antenna selection framework, JCPAS, for massive MIMO systems under practical incomplete CSI, which results from the limited number of RF chains, and under limited transmit power and QoS requirements. In order to address this problem, we first proposed a deep neural network to estimate the posterior belief distribution of the current channel states. After that, we developed a joint channel tracking and antenna selection algorithm to select the best antennas based on the estimated belief. To improve system robustness, we proposed a risk-aware planning framework, namely RA-MCTS, which employs Monte Carlo tree search and bootstrap Thompson sampling to approximate a posterior distribution over the random objectives. Simulation results showed that the proposed RA-MCTS not only achieves a lower energy consumption but also significantly reduces the risk, quantified as the probability of violating the system constraints.
For future work, one interesting topic is to further improve the computational efficiency of the proposed RA-MCTS. A feasible solution is to directly estimate the rollout results through a DNN such that the rollout overhead can be significantly reduced. In this case, the posterior distribution over the rollout results is also estimated by the DNN, rather than by the bootstrap distribution parameterized by the $J$ replicates. The developed framework can also be applied to cell-free MIMO systems to optimize the number of active access points under limited system resources. Another promising topic is to consider the confidence of the predicted channel coefficients in the selection process. In this case, the best antenna subset should be selected based on not only the channel gain but also the prediction confidence.

ACKNOWLEDGMENT
For the purpose of open access, and in fulfilment of the obligations arising from the grant agreement, the authors have applied a Creative Commons Attribution 4.0 (CC BY 4.0) license to any Author Accepted Manuscript version arising from this submission.

Manuscript received 25 August 2023; revised 30 December 2023; accepted 5 March 2024. Date of publication 22 March 2024; date of current version 12 September 2024. This work was funded in whole, or in part, by the Luxembourg National Research Fund (FNR), grant references FNR/C19/IS/13718904/ASWELL and FNR/C22/IS/17220888/RUTINE, and in part by the Australian Research Council under the DECRA project DE210100651. An earlier version of this paper was presented in part at the IEEE Globecom 2022 Workshops [DOI: 10.1109/GCWkshps56602.2022.10008768]. The associate editor coordinating the review of this article and approving it for publication was N. Lee. (Corresponding author: Thang X. Vu.)

Fig. 3. Diagram of the structure of the proposed DCGM.

Fig. 4. Workflow of MCTS. Nodes to be visited are highlighted in red, and the search path is highlighted as blue arrows. The simulation and backpropagation paths are illustrated by blue dashed curves.

Algorithm 3 Risk-Aware Monte Carlo Tree Search (RA-MCTS)
Input: History of the past incomplete measurements $\boldsymbol{\Phi}_t$
Output: Selected antenna combination $\mathbf{a}_t$
1: Let the set of all antennas be $\mathcal{N} = \{1, \ldots, N_t\}$;
2: Initialize the set of removed antennas as $\mathbf{r} = \emptyset$;
3: while $|\mathbf{r}| < N_t - N_f$ do
4:   $m$ = RISKAWAREPLANNING($\mathbf{r}$, $\boldsymbol{\Phi}_t$)

Algorithm 4 Utility Functions of RA-MCTS
1: function TREEPOLICY($n$)
2:   while node $n$ is not a terminal node do
3:     if node $n$ is not fully expanded then
11:   for each untried child node $i \in \mathcal{N} - \mathbf{r}$ do
12:     $C = C \cup \{i\}$
13:   $m \leftarrow \arg\max_{i \in C} \alpha_i$;
14:   return RETRIEVENODE($\mathbf{r} \cup \{m\}$, $\mathbf{r}$, $\mathbf{A}$, $\mathbf{H}$)
15: function BESTCHILD($n$)
16:   $\mathbf{r}, \mathbf{A}, \mathbf{H} \leftarrow n$
17:   for each child node $i \in \mathcal{N} - \mathbf{r}$ do
18:     Sample uniform replicate $j \in \{1, 2, \ldots, J\}$;
19:     Retrieve $\alpha_{ij}$, $\beta_{ij}$ according to $j$;
20:   $m \leftarrow \arg\max_i \alpha_{ij}/\beta_{ij}$;
21:   return RETRIEVENODE($\mathbf{r} \cup \{m\}$, $\mathbf{r}$, $\mathbf{A}$, $\mathbf{H}$)
22: function ROLLOUT($n$)
23:   while node $n$ is not a terminal node do
24:     $n \leftarrow$ BESTCHILD($n$);
25:   $\mathbf{r}, \mathbf{A}, \mathbf{H} \leftarrow n$
26:   $\mathbf{a} \leftarrow \mathcal{N} - \mathbf{r}$
27:   return $U(\mathbf{a})$
28: function BACKPROPAGATION($\delta$, $n$)
29:   while node $n$ is not null do
30:     UPDATEDISTRIBUTION($\delta$, $n$);
31:     $n \leftarrow$ parent of $n$;
32: function UPDATEDISTRIBUTION($\delta$, $i$)
33:   for $j \in \{1, 2, \ldots, J\}$ do

TABLE II TEMPORAL CORRELATION COEFFICIENTS OF TYPICAL SCENARIOS