Intelligent Massive MIMO Systems for Beyond 5G Networks: An Overview and Future Trends

Machine learning (ML), a subset of artificial intelligence (AI), is expected to unlock the potential for solving challenging large-scale problems in conventional massive multiple-input multiple-output (CM-MIMO) systems. This introduces the concept of intelligent massive MIMO (I-mMIMO) systems. Due to the surge in the application of different ML techniques to enhance mMIMO systems for existing and emerging use cases in beyond fifth-generation (B5G) networks, this article aims to provide an overview of the different aspects of I-mMIMO systems. First, the characteristics and challenges of CM-MIMO are identified. Second, the most recent efforts to apply ML to different aspects of CM-MIMO systems are presented. Third, the deployment of I-mMIMO and efforts towards standardization are discussed. Lastly, future trends of I-mMIMO-enabled application systems are presented. The aim of this paper is to help readers understand different ML approaches in CM-MIMO systems, explore some of their advantages and disadvantages, identify some of the open issues, and motivate readers toward future trends.

… developed based on the deployment of very large numbers of antennas at the base station to serve many terminals simultaneously [2]. Research on mMIMO has spanned over a decade, and it is still an active research area as more efforts are being made to meet the stringent requirements of beyond 5G (B5G) networks. The concepts of B5G networks are currently being discussed under different topics such as sixth-generation (6G), 2030 networks, and next-generation [9], [10], [11]. These include network capabilities for new markets and industry verticals such as Industry 4.0 autonomous applications, media and entertainment, healthcare systems, virtual reality, augmented reality, extended reality, and the education sector [6], [9]. Such new verticals will create massive-scale connectivity with disparate performance objectives such as ultra-high reliability, …

2) Lower complexity in dealing with non-linear characteristics that are often associated with the PHY layer due to hardware impairments from low-cost components or low-precision analog-to-digital converters (ADCs).
3) Overcoming the limitations faced in the use of mathematical models and optimization problems in signal processing, especially in mathematically intractable problems.
4) Providing intelligence in network decision-making for user-centric services by taking into consideration both signal processing and network environmental factors, such as channel dynamics, traffic patterns, quality of experience, and network composition.

In order to provide an overview of the application of ML in I-mMIMO systems, this article aims to address the following research questions (RQs):
RQ1: What are the deployment methods for I-mMIMO systems?
RQ2: What is the current research trend in the application of ML in I-mMIMO?
RQ3: What are the challenges and open issues in the adoption of ML in I-mMIMO systems?
RQ4: What are the future directions for ML in I-mMIMO systems?

The associate editor coordinating the review of this manuscript and approving it for publication was Francisco Rafael Marques Lima.

Recent surveys in the literature have discussed the use of AI in communication systems [1], [14], [22], [23], [24], [25], [26], [27], [28], [29], [30], …, [36]. In [1], Bjornson et al. highlighted five new research directions for mMIMO, one of which is I-mMIMO systems. The authors illustrated how the use of ML can transform a conventional mMIMO system into an I-mMIMO system. In [14], the authors discussed the advantages that AI would provide for B5G communication networks. The need for ML in mMIMO and some examples of ML applications in mMIMO were also discussed in the literature. A survey on the applications of deep reinforcement learning (DRL) in communications and networking was presented in [24]. Similarly, a survey on the application of DL in physical channel models, with emphasis on automatic modulation recognition, channel decoding, and detection, was presented in [25]. In [22], Zappone et al. provided a detailed discussion on the application of DL in wireless communication systems by establishing the link between ML and DL, and the application of DL models and mathematical models in wireless networks. A review of DL-based detectors for uplink communication in mMIMO systems was presented in [26], with a detailed discussion of various deep neural networks (DNN). In [27], a tutorial on DRL as multi-agent learning in cooperative AI-enabled wireless networks was presented. Bhatia et al. [28] presented a short review of DL approaches for mMIMO systems, including DL for modulation recognition, beam selection, CE, and antenna selection (AS). A survey by Zhang and Zhu [29] identified six major areas for AI-enabled 6G networks, one of which is advanced radio interfaces such as CE and detection, channel coding, modulation recognition, and end-to-end radio optimization.
… the non-orthogonal multiple access (NOMA), mMIMO, and millimeter wave (mmWave) communication were discussed. The advantages of the DL framework over other conventional schemes for direction-of-arrival (DoA) estimation and CE in mMIMO were presented. The authors in [34], [35], and [36] presented surveys on the applications of DL in the different layers of wireless networks, …

… comprising aspects of the I-mMIMO systems. The open issues are identified in Section VII, and future directions are discussed in Section VIII. Finally, Section IX concludes the paper.

mMIMO is one of the key enabling technologies for 5G and B5G networks, where large antenna arrays are deployed at the BS for spatial multiplexing and high beamforming gain. The advantages of this technology are high spectral and energy efficiency. We discuss conventional mMIMO under two categories: C-mMIMO and CF-mMIMO. The uplink and downlink transmission for the two types of mMIMO systems are presented in this section.

C-mMIMO systems are discussed under two types: co-located mMIMO systems and distributed antenna system (DAS) mMIMO systems.

1) CO-LOCATED mMIMO
In the co-located mMIMO system, the BS is equipped with a large number of antennas that collectively serve a number of user terminals (UTs) using the same time-frequency resource. The number of BS antennas $M$ is greater than the number of UTs $K$ in the service area. This yields the benefits of averaging out small-scale fading, reduced transmit power, and increased degrees of freedom. Co-located mMIMO exploits the phenomenon known as channel hardening¹, which results in favorable channel propagation between the BS and the UTs [38]. An illustration of co-located mMIMO systems is shown in Fig. 1.

Two transmission modes have been widely explored in mMIMO systems: TDD and frequency division duplex (FDD). In TDD, channel reciprocity is assumed between the BS and the UT, so the BS obtains the downlink channel directly from the uplink pilots transmitted by the UTs in the same frequency band. The pilot overhead scales with the number of users and is constrained by the channel coherence time. In FDD, different frequency bands are used for the uplink and downlink. In the downlink, the BS transmits pilots to the UTs; the UTs then estimate the channel using these pilots and feed back the estimated CSI to the BS. In the uplink, the reverse is performed. This introduces overheads that scale with the number of BS antennas and are constrained by the limited feedback bandwidth. The …

¹ Channel hardening is an effect where the channel variation decreases and the channel becomes nearly deterministic [37].
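To make the overhead scaling above concrete, the sketch below counts pilot symbols per coherence block under a deliberately simplified model (an assumption, not the paper's analysis): TDD needs at least $K$ orthogonal uplink pilot symbols, while FDD needs at least $M$ orthogonal downlink pilot symbols plus $M$ fed-back coefficients per UT. The function names and the parameter values ($M = 64$, $K = 8$, a 200-symbol coherence block) are hypothetical.

```python
def tdd_pilot_symbols(num_users: int) -> int:
    """Uplink pilot symbols per coherence block in TDD (orthogonal pilots: tau_p >= K)."""
    return num_users


def fdd_overhead(num_bs_antennas: int, num_users: int) -> dict:
    """Per-coherence-block overhead in FDD under a simplified (assumed) model.

    Downlink pilots must be orthogonal across the M BS antennas (>= M symbols),
    and each of the K UTs feeds back one complex coefficient per BS antenna.
    """
    return {
        "downlink_pilot_symbols": num_bs_antennas,
        "uplink_pilot_symbols": num_users,
        "feedback_coefficients": num_bs_antennas * num_users,
    }


def overhead_fraction(pilot_symbols: int, coherence_block: int) -> float:
    """Fraction of a coherence block (in symbols) consumed by pilots."""
    if pilot_symbols > coherence_block:
        raise ValueError("pilots do not fit in the coherence block")
    return pilot_symbols / coherence_block


if __name__ == "__main__":
    M, K, tau_c = 64, 8, 200  # hypothetical system parameters
    print("TDD pilot fraction:", overhead_fraction(tdd_pilot_symbols(K), tau_c))
    fdd = fdd_overhead(M, K)
    print("FDD downlink pilot fraction:",
          overhead_fraction(fdd["downlink_pilot_symbols"], tau_c))
    print("FDD feedback coefficients:", fdd["feedback_coefficients"])
```

With these numbers, TDD spends 4% of the block on pilots, while FDD spends 32% on downlink pilots alone; with $M = 256$, the FDD downlink pilots no longer fit in the block, which illustrates why the FDD overhead grows with the number of BS antennas.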

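The received signal $y_{jk} \in \mathbb{C}$ from the downlink transmission to UT $k$ in cell $j$ is expressed as [38]:

$$y_{jk} = \sum_{l=1}^{L} \left(\mathbf{h}^{l}_{jk}\right)^{H} \mathbf{x}_{l} + n_{jk},$$

where $\mathbf{h}^{l}_{jk} \in \mathbb{C}^{M_l}$ is the channel between BS $l$ and UT $k$ in cell $j$, $L$ is the number of cells, $n_{jk} \sim \mathcal{CN}(0, \sigma^2_{\mathrm{DL}})$ is independent additive receiver noise with variance $\sigma^2_{\mathrm{DL}}$, and $\mathbf{x}_l = \sum_{i=1}^{K_l} \mathbf{w}_{li} \varsigma_{li}$ is the downlink signal transmitted by the BS in cell $l$. The received signal $y_{jk}$ can be further expressed as:

$$y_{jk} = \underbrace{\left(\mathbf{h}^{j}_{jk}\right)^{H} \mathbf{w}_{jk} \varsigma_{jk}}_{\text{desired signal}} + \underbrace{\sum_{i \neq k} \left(\mathbf{h}^{j}_{jk}\right)^{H} \mathbf{w}_{ji} \varsigma_{ji}}_{\text{intra-cell interference}} + \underbrace{\sum_{l \neq j} \sum_{i=1}^{K_l} \left(\mathbf{h}^{l}_{jk}\right)^{H} \mathbf{w}_{li} \varsigma_{li}}_{\text{inter-cell interference}} + n_{jk},$$

where $\mathbf{w}_{li} \in \mathbb{C}^{M_l}$ is the transmit precoding vector and $\varsigma_{li} \sim \mathcal{CN}(0, p_{li})$ is the data signal sent from BS $l$ to UT $i$ in cell $l$.

2) DAS mMIMO
The DAS mMIMO is another approach to achieving the gains of mMIMO. In a DAS mMIMO system, remote antenna …

VOLUME 10, 2022

In CF-mMIMO, it is assumed that the geographical area is not partitioned into cells; instead, $M$ APs are randomly distributed over the coverage area and serve $K$ single-antenna UTs, where $M \gg K$.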

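The channel coefficient $g_{mk}$ from UT $k$ to AP $m$ is expressed as [56]:

$$g_{mk} = \beta_{mk}^{1/2} h_{mk},$$

where $\beta_{mk}$ is the large-scale fading coefficient, which accounts for path loss and shadowing effects, and $h_{mk}$ is the small-scale fading coefficient, with $h_{mk} \sim \mathcal{CN}(0, 1)$ for single-antenna APs ($\mathbf{h}_{mk} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_N)$ when each AP has $N$ antennas [53]). The channel matrix between the APs and UTs is $\mathbf{G} \in \mathbb{C}^{M \times K}$, whose $(m, k)$th entry is $g_{mk}$.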

The uplink transmission from the UTs to the APs is classified into two types: uplink training and uplink payload data transmission. Uplink training is described as follows. All UTs transmit their pilot sequences $\mathbf{s}_1, \dots, \mathbf{s}_K \in \mathbb{C}^{\tau}$ simultaneously and synchronously to all $M$ APs within the service area. The pilot sequences assigned to the $K$ UTs (forming a $\tau \times K$ pilot matrix, which requires $\tau \geq K$) are assumed to be mutually orthogonal, satisfying $\mathbf{s}_i^H \mathbf{s}_j = \delta_{ij}$, and the pilot transmitted by the $k$th UT is $\mathbf{x}_k = \sqrt{q_k}\, \mathbf{s}_k$, where $q_k$ is its pilot power coefficient and $\|\mathbf{s}_k\|^2 = 1$. The received signal at the $m$th AP is expressed as:

$$\mathbf{y}_m = \sqrt{\tau \rho_r} \sum_{k=1}^{K} \sqrt{q_k}\, g_{mk} \mathbf{s}_k + \mathbf{n}_m,$$

where $\rho_r$ represents the normalized signal-to-noise ratio (SNR) of each pilot symbol, $\mathbf{n}_m \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{\tau})$ is additive noise, and $\tau$ is the length of the pilot sequences.

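Each AP $m$ computes the minimum mean square error (MMSE) estimate $\hat{g}_{mk}$ of $g_{mk}$ for every UT $k$ by projecting $\mathbf{y}_m$ onto $\mathbf{s}_k$. In the form of [56], $\hat{g}_{mk}$ is expressed as:

$$\hat{g}_{mk} = c_{mk} \left( \sqrt{\tau \rho_r q_k}\, g_{mk} + n_{mk} \right),$$

where $n_{mk} = \mathbf{s}_k^H \mathbf{n}_m \sim \mathcal{CN}(0, 1)$ is the noise at the $m$th AP after projection onto the $k$th pilot, and $c_{mk}$ is expressed as:

$$c_{mk} = \frac{\sqrt{\tau \rho_r q_k}\, \beta_{mk}}{\tau \rho_r q_k \beta_{mk} + 1}.$$

The estimates $\hat{g}_{mk}$ obtained at the APs are used to determine the receiver coefficients and power allocation, which are sent to the CPU.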
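The estimator above can be checked numerically. The sketch below is a Monte Carlo simulation, assuming single-antenna APs, orthonormal pilots drawn via a QR decomposition, and illustrative values of $\beta_{mk}$, $q_k$, $\tau$, and $\rho_r$ (all hypothetical). It builds $\mathbf{y}_m$, projects onto each pilot, applies $c_{mk}$, and compares the empirical mean squared error with the closed form $\beta_{mk} / (\tau \rho_r q_k \beta_{mk} + 1)$ implied by the two equations above.

```python
import numpy as np


def simulate_uplink_training(beta, q, tau, rho_r, num_trials, rng):
    """Monte Carlo check of per-AP MMSE channel estimation in a cell-free uplink.

    beta : (M, K) large-scale fading coefficients beta_mk (illustrative values)
    q    : (K,) pilot power coefficients q_k
    Returns the empirical and theoretical per-(m, k) MSE of g_hat_mk.
    """
    M, K = beta.shape
    if tau < K:
        raise ValueError("orthogonal pilots require tau >= K")
    # Orthonormal pilots: columns of S (tau x K) satisfy s_i^H s_j = delta_ij.
    A = rng.standard_normal((tau, K)) + 1j * rng.standard_normal((tau, K))
    S, _ = np.linalg.qr(A)

    a = np.sqrt(tau * rho_r * q)              # (K,) effective pilot gain per UT
    c = a * beta / (a**2 * beta + 1.0)        # (M, K) MMSE coefficients c_mk

    def cn(*shape):
        # Circularly-symmetric complex Gaussian samples with unit variance.
        return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    g = np.sqrt(beta) * cn(num_trials, M, K)  # g_mk = beta_mk^{1/2} h_mk, h_mk ~ CN(0, 1)
    n = cn(num_trials, M, tau)                # n_m ~ CN(0, I_tau) at every AP
    # Rows of Y are y_m = sqrt(tau rho_r) sum_k sqrt(q_k) g_mk s_k + n_m.
    Y = (g * a) @ S.T + n                     # (trials, M, tau)
    proj = Y @ S.conj()                       # s_k^H y_m -> (trials, M, K)
    g_hat = c * proj                          # MMSE estimate g_hat_mk
    mse = np.mean(np.abs(g - g_hat) ** 2, axis=0)
    theory = beta / (a**2 * beta + 1.0)       # beta_mk minus the variance of g_hat_mk
    return mse, theory


rng = np.random.default_rng(0)
beta = rng.uniform(0.1, 1.0, size=(4, 3))     # 4 APs, 3 UTs (hypothetical)
mse, theory = simulate_uplink_training(beta, np.ones(3), tau=5, rho_r=2.0,
                                       num_trials=20000, rng=rng)
```

The empirical and theoretical errors should agree to within Monte Carlo noise; the error shrinks as $\tau \rho_r q_k \beta_{mk}$ grows, i.e., longer or stronger pilots help most for UTs with good large-scale fading.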

Uplink payload data transmission is described as follows. In the uplink, the $K$ UTs transmit data simultaneously to the APs, and the received signal at the $m$th AP is expressed as: …

In C-mMIMO or CF-mMIMO systems, channel estimates at the BS are obtained using pilot signals. Due to limited pilot resources and coherence time, pilot signals are reused in TDD mMIMO systems, which introduces the phenomenon known as pilot contamination [62]. Although [63] has shown that pilot contamination does not fundamentally limit mMIMO systems, methods to mitigate its effect are still required to improve the network performance of mMIMO systems [64], [65]. For instance, DL-aided channel estimation was proposed in [65] to reduce the influence of pilot contamination. Furthermore, pilot contamination attacks constitute another issue and have been investigated in [66] and [67]; ML techniques have accordingly been considered useful for detecting spoofing and eavesdropping attacks in mMIMO systems.

The application of mMIMO in some novel use cases in B5G communication systems is discussed in this section. The challenges of CM-MIMO are also highlighted.

There is now a paradigm shift from the traditional approach, which focuses only on programming the transmitter and receiver platforms, to programming the radio propagation environment itself. This involves using ML to optimize the array aperture by providing high spatial beam resolution to locations with high UT density and lower resolution to locations with low UT density. It also involves smartly controlling metasurfaces based on array position and geometry to overcome blockage and achieve an ultra-reliable network. This has opened up research into intelligent radio environments (IRE), where AI is used for the control, programming, and optimization of wireless networks. In addition, ML applied to CSI combined with other information, such as UT locations and UT IDs, can be used to predict downlink …

ML techniques can be categorized into supervised learning, unsupervised learning, and reinforcement learning. … The latter is referred to as a label. Using the labeled training set, the supervised learning algorithm learns a mapping that produces the desired output. Supervised learning problems can generally be classified into two classes: regression problems and classification problems. In the former, the algorithm aims to produce a continuous-valued output, whereas in the latter it aims to produce a label, i.e., a discrete-valued output.

A supervised learning framework can be mathematically described as follows: …

In general, there are many ways to model $h$, but the best model depends on the problem, the training set, and design concerns. For example, $h$ can be defined by the following iterative functions:

$$\mathbf{a}^{[l]} = g_l\left(\mathbf{W}^{[l]} \mathbf{a}^{[l-1]} + \mathbf{b}^{[l]}\right), \quad l = 1, \dots, L, \qquad (13)$$

where $g_l(\cdot)$ is an element-wise non-linear activation function, $\mathbf{W}^{[l]}$ is a weight matrix, and $\mathbf{b}^{[l]}$ is a bias vector. In (13), $\mathbf{a}^{[0]} = \mathbf{x}$ is the input vector and $\mathbf{a}^{[L]} = \tilde{\mathbf{y}}$ is the output vector. The above iterative functions define a fully-connected deep neural network (DNN). Beyond this architecture, researchers are exploring other types of neural networks, such as recurrent neural networks (RNN), convolutional neural networks (CNN), and graph neural networks (GNN), as well as their combinations.
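The recursion in (13) can be sketched in a few lines of NumPy. The two-layer network below (ReLU hidden layer, identity output) and its weights are hypothetical and only illustrate how $\mathbf{a}^{[0]} = \mathbf{x}$ is propagated to $\mathbf{a}^{[L]} = \tilde{\mathbf{y}}$; training the weights is not shown.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def identity(z):
    return z


def dnn_forward(x, weights, biases, activations):
    """Fully-connected DNN as in (13): a^[l] = g_l(W^[l] a^[l-1] + b^[l]).

    weights[l-1], biases[l-1], activations[l-1] hold W^[l], b^[l], g_l for l = 1..L.
    """
    a = np.asarray(x, dtype=float)            # a^[0] = x
    for W, b, g in zip(weights, biases, activations):
        a = g(W @ a + b)                      # element-wise activation g_l
    return a                                  # a^[L] = y_tilde


# Hypothetical 2-layer example: 2 inputs -> 3 hidden units (ReLU) -> 1 linear output.
W1 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b1 = np.zeros(3)
W2 = np.array([[1.0, 2.0, 3.0]])
b2 = np.array([0.5])
y_tilde = dnn_forward([1.0, -2.0], [W1, W2], [b1, b2], [relu, identity])
```

Here the hidden pre-activation is $[1, -2, -1]$, the ReLU keeps only the first unit, and the output is $1 + 0.5 = 1.5$.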

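In addition, the weight matrices and bias vectors, referred to as the parameters, determine the mapping function and the efficiency of supervised learning. To obtain the best parameters, a cost function $J$ and a loss function $\mathcal{L}$ are employed. The cost function $J$ indicates the overall performance of the mapping function on the given training set of $M$ samples, while the loss function $\mathcal{L}$ measures the error on a single training sample. For instance, the cost function can be modeled as the average loss over the training set:

$$J = \frac{1}{M} \sum_{i=1}^{M} \mathcal{L}\left(\tilde{\mathbf{y}}^{(i)}, \mathbf{y}^{(i)}\right).$$

In a regression problem, the loss function can be modeled using the squared error as follows:

$$\mathcal{L}\left(\tilde{\mathbf{y}}, \mathbf{y}\right) = \left\| \tilde{\mathbf{y}} - \mathbf{y} \right\|^2,$$

while the loss function in a binary classification problem can be modeled using the binary cross entropy as follows:

$$\mathcal{L}\left(\tilde{y}, y\right) = -\left[ y \log \tilde{y} + (1 - y) \log\left(1 - \tilde{y}\right) \right].$$

Other loss functions can also be adopted. Using the cost function, …

… and other similar applications.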
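The cost and loss functions above translate directly into NumPy. This is a minimal sketch with hypothetical helper names; the small `eps` clipping in the cross entropy is an implementation assumption to avoid evaluating $\log 0$.

```python
import numpy as np


def squared_error(y_pred, y_true):
    """Per-sample squared error ||y_tilde - y||^2 (samples along axis 0)."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return np.sum(diff.reshape(len(diff), -1) ** 2, axis=1)


def binary_cross_entropy(p, y, eps=1e-12):
    """Per-sample loss -[y log p + (1 - y) log(1 - p)] for predicted probabilities p."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    y = np.asarray(y, dtype=float)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def cost(loss_fn, predictions, targets):
    """J = (1/M) * sum_i L(y_tilde^(i), y^(i)) over the M training samples."""
    return float(np.mean(loss_fn(predictions, targets)))


# Regression: per-sample errors 1 and 4 -> J = 2.5.
j_reg = cost(squared_error, [[1.0], [3.0]], [[0.0], [1.0]])
# Classification: an uninformative p = 0.5 gives a loss of log 2 per sample.
j_cls = cost(binary_cross_entropy, [0.5, 0.5], [1.0, 0.0])
```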

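In this paper, we discuss two popular algorithms: K-means and principal component analysis (PCA). The K-means algorithm is a centroid-based model that autonomously partitions the input vectors into $K$ clusters. To cluster the input efficiently, the cost function can be formulated as follows:

$$J\left(c^{(1)}, \dots, c^{(M)}, \boldsymbol{\nu}^{(1)}, \dots, \boldsymbol{\nu}^{(K)}\right) = \frac{1}{M} \sum_{i=1}^{M} \left\| \mathbf{x}^{(i)} - \boldsymbol{\nu}^{(c^{(i)})} \right\|^2,$$

where $c^{(i)} \in \{1, \dots, K\}$ is the index of the cluster to which sample $\mathbf{x}^{(i)}$ is assigned and $\boldsymbol{\nu}^{(k)}$ is the centroid of cluster $k$. The main loop of the algorithm is:
Assign $c^{(i)}$ using (18)
For $k = 1$ to $K$: compute $\boldsymbol{\nu}^{(k)}$ using (19)

… be the user coordinates and the hyperparameter could be the user groups. Then, the training samples are associated with one of these centroids as follows:

$$c^{(i)} = \arg\min_{k \in \{1, \dots, K\}} \left\| \mathbf{x}^{(i)} - \boldsymbol{\nu}^{(k)} \right\|^2. \qquad (18)$$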

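When all the samples have been associated with a centroid, the algorithm computes the new centroids as follows:

$$\boldsymbol{\nu}^{(k)} = \frac{\sum_{i=1}^{M} \mathbb{1}\{c^{(i)} = k\}\, \mathbf{x}^{(i)}}{\sum_{i=1}^{M} \mathbb{1}\{c^{(i)} = k\}}, \quad k = 1, \dots, K, \qquad (19)$$

where $\mathbb{1}\{\cdot\}$ is the indicator function.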
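The assignment step (18) and the centroid update (19) can be sketched as below, for example to group UTs by their coordinates. The random initialization from data points, the rule of keeping a centroid unchanged if its cluster empties, and the toy coordinates are all assumptions made for illustration.

```python
import numpy as np


def _sq_dists(X, centroids):
    """(N, K) matrix of squared distances ||x^(i) - nu^(k)||^2."""
    return np.sum((X[:, None, :] - centroids[None, :, :]) ** 2, axis=2)


def kmeans(X, K, num_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize the centroids nu^(k) with K distinct, randomly chosen samples.
    centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(num_iters):
        c = np.argmin(_sq_dists(X, centroids), axis=1)    # step (18)
        new = centroids.copy()
        for k in range(K):                                 # step (19)
            members = X[c == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new[k] = members.mean(axis=0)
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    d2 = _sq_dists(X, centroids)
    c = np.argmin(d2, axis=1)                              # final assignment
    cost = float(np.mean(d2[np.arange(len(X)), c]))        # J as defined above
    return c, centroids, cost


# Hypothetical UT coordinates forming two well-separated user groups.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids, J = kmeans(X, K=2)
```

Each group ends up in its own cluster with centroids near $(1/3, 1/3)$ and $(31/3, 31/3)$, and the cost is $4/9$. K-means only guarantees a local minimum of $J$, so in practice several initializations are commonly tried.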

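Using singular value decomposition (SVD), the covariance matrix $\mathbf{R}$ of the CSI can be rewritten as follows:

$$\mathbf{R} = \mathbf{V} \boldsymbol{\Lambda} \mathbf{V}^{H},$$

where $\mathbf{V}$ is a unitary matrix and $\boldsymbol{\Lambda}$ is a diagonal matrix whose diagonal entries are the eigenvalues of $\mathbf{R}$ (which, for a Hermitian positive semidefinite covariance matrix, coincide with its singular values), arranged in decreasing order.

Keeping the $d$ columns of $\mathbf{V}$ associated with the $d$ largest eigenvalues, denoted $\mathbf{V}_d$, we can then obtain the compressed CSI as follows:

$$\mathbf{Z} = \mathbf{V}_d^{H} \mathbf{H},$$

where $\mathbf{H}$ denotes the (zero-mean) CSI matrix.

Similarly, $\mathbf{Z}$ can be used to reconstruct the approximated CSI in high dimension, with some loss of information, as follows:

$$\hat{\mathbf{H}} = \mathbf{V}_d \mathbf{Z}.$$

FIGURE 6. Reinforcement learning framework.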
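The PCA-based CSI compression and reconstruction steps described above can be sketched as follows. Assumptions: the CSI samples are stored column-wise in $\mathbf{H}$ (one column per sample), the sample covariance is used for $\mathbf{R}$, and `numpy.linalg.eigh` is used because, for a Hermitian positive semidefinite matrix, the eigendecomposition coincides with the SVD. The synthetic low-rank CSI is hypothetical.

```python
import numpy as np


def pca_compress(H, d):
    """Compress CSI samples H (N x S, one sample per column) to d dimensions."""
    R = (H @ H.conj().T) / H.shape[1]       # sample covariance (N x N), Hermitian PSD
    eigvals, V = np.linalg.eigh(R)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # sort in decreasing order
    V_d = V[:, order[:d]]                   # d principal directions
    Z = V_d.conj().T @ H                    # compressed CSI (d x S)
    return Z, V_d


def pca_reconstruct(Z, V_d):
    """Approximate high-dimensional CSI: H_hat = V_d Z."""
    return V_d @ Z


# Hypothetical CSI: 50 samples of an 8-antenna channel confined to a 2-D subspace.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 2)) + 1j * rng.standard_normal((8, 2))
B = rng.standard_normal((2, 50)) + 1j * rng.standard_normal((2, 50))
H = A @ B
Z, V_d = pca_compress(H, d=2)
H_hat = pca_reconstruct(Z, V_d)
```

Because this synthetic CSI has rank 2, keeping $d = 2$ components reconstructs it essentially exactly, while $d = 1$ loses information; with measured CSI, $d$ trades feedback size against reconstruction error.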

Unlike supervised and unsupervised learning, RL is not told which actions to take. Instead, its algorithm interacts with the environment and learns from feedback. By performing a series of actions and continually leveraging this interaction, the algorithm learns over time which actions maximize its reward. Thus, RL is well suited to efficient power control and resource allocation in mMIMO.

As depicted in Fig. 6, RL consists of two major entities: the agent and the environment. The agent is responsible for taking appropriate actions and learning from the environment over time so as to maximize its cumulative reward. There are three important elements in a reinforcement learning system, namely states, actions, and rewards. …
3) a set of transition probabilities, $P$, where $P(s_{t+1} \mid s_t, a_t)$ is the probability that taking action $a_t$ in state $s_t$ leads to state $s_{t+1}$,
4) a set of rewards, $R$, where $\mathbb{E}\left[R_t \mid s_t = s, a_t = a\right]$ is the expected reward when the state is $s$ and the action is $a$.

Each experience at time $t$ can be represented by a tuple $e_t = (s_t, a_t, R_t, s_{t+1})$. For example, in an mMIMO system, the state can be the signal-to-interference-plus-noise ratio (SINR), the action can be the power allocation, and the reward can be the overall spectral efficiency.

The discounted total long-term reward can be further formulated as follows:

$$G_t = \sum_{k=t}^{M} \gamma^{k-t} R_k,$$

where $0 < \gamma < 1$ is the discount rate and $M$ is the final time step, whose value might be finite (e.g., episodic tasks) or infinite (e.g., continuing tasks).
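For a finite episode, the discounted return can be accumulated backwards using $G_t = R_t + \gamma G_{t+1}$, which follows directly from the sum above. The sketch below returns $G_t$ for every step; the reward sequence (e.g., per-step spectral efficiency) is hypothetical.

```python
def returns_to_go(rewards, gamma):
    """Discounted returns G_t = sum_{k=t}^{M} gamma^(k-t) R_k for every step t."""
    G = 0.0
    out = []
    for r in reversed(rewards):   # walk backwards: G_t = R_t + gamma * G_{t+1}
        G = r + gamma * G
        out.append(G)
    return out[::-1]              # reorder so that out[t] = G_t


G = returns_to_go([1.0, 1.0, 1.0], gamma=0.5)   # [1.75, 1.5, 1.0]
```

The backward pass costs one multiply-add per step, instead of recomputing the sum for each $t$.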

In reinforcement learning, there are two important functions: the state-value function and the action-value function. The state-value function estimates the long-term reward when the agent is in state $s$ and follows a policy $\pi$. It is defined as follows:

$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[ \sum_{k=t}^{M} \gamma^{k-t} R_k \,\middle|\, s_t = s \right].$$

The action-value function (also known as the Q-function) estimates the long-term reward when the agent is in state $s$, executes action $a$, and thereafter follows $\pi$. More formally, the Q-function is defined as follows:

$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ \sum_{k=t}^{M} \gamma^{k-t} R_k \,\middle|\, s_t = s, a_t = a \right].$$

Having these functions, the agent requires a policy $\pi(s, a)$ to infer the action at state $s$. This policy can be described as the probability of selecting action $a$ in state $s$:

$$\pi(s, a) = \Pr\left(a_t = a \mid s_t = s\right).$$

The objective of reinforcement learning is to solve the MDP problem, which is equivalent to finding the optimal policy that maximizes the long-term reward:

$$\pi^{*} = \arg\max_{\pi} V^{\pi}(s), \quad \forall s.$$

To optimize the state-value function, the principle of optimality states that the value of a state under an optimal policy $\pi^{*}$ must equal the expected return for the best action from that state. Following this principle, the optimal state value can be rewritten as follows:

$$V^{*}(s) = \max_{a} \left\{ \mathbb{E}\left[R_t \mid s_t = s, a_t = a\right] + \gamma \sum_{s'} P(s' \mid s, a)\, V^{*}(s') \right\}.$$
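When the transition probabilities and expected rewards are known, the optimality equation above can be solved by value iteration, i.e., by applying the right-hand side repeatedly until $V$ stops changing. The two-state, two-action "power control" MDP below is entirely hypothetical: state 0/1 stands for low/high SINR, action 1 (high power) moves the UT to the high-SINR state at a power cost of 0.2, and action 0 moves it to the low-SINR state.

```python
import numpy as np


def value_iteration(P, R, gamma, tol=1e-10, max_iters=10_000):
    """Solve V*(s) = max_a { R[s, a] + gamma * sum_s' P[s, a, s'] V*(s') }.

    P: (S, A, S) transition probabilities, R: (S, A) expected rewards.
    Returns the optimal values V* and a greedy optimal policy.
    """
    V = np.zeros(P.shape[0])
    for _ in range(max_iters):
        Q = R + gamma * (P @ V)     # (S, A) one-step lookahead values
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    policy = (R + gamma * (P @ V)).argmax(axis=1)
    return V, policy


# Hypothetical MDP: reward = SINR-state indicator minus a 0.2 cost for high power.
P = np.zeros((2, 2, 2))
P[:, 0, 0] = 1.0                    # low power  -> next state is low SINR
P[:, 1, 1] = 1.0                    # high power -> next state is high SINR
R = np.array([[0.0, -0.2],
              [1.0, 0.8]])
V_star, policy = value_iteration(P, R, gamma=0.9)
```

Here the optimal policy always uses high power, giving $V^{*} = (7, 8)$. In realistic mMIMO settings the model $P$ is unknown or too large to enumerate, which is what motivates the learning-based methods discussed next.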

Meanwhile, the gradient of $J$ can be derived as follows: …

Using (30) and (31), the DNN provides $\hat{Q}(s, a, w)$ for all possible $(s, a)$ pairs. The DQN then chooses the action $a$ based on an $\epsilon$-greedy policy: with probability $1 - \epsilon$ it selects the action with the highest estimated Q-value, and with probability $\epsilon$ it selects a random action to explore. Finally, the environment provides the actual reward and the actual next state, which are further utilized to train the DNN. The algorithm repeatedly trains, approximates, makes decisions, and observes the state to ultimately maximize its long-term reward.
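Two building blocks of the DQN loop described above can be sketched independently of any DNN library: $\epsilon$-greedy action selection over the estimated Q-values, and the bootstrapped regression target. The target below assumes the widely used form $y = R_t + \gamma \max_{a'} \hat{Q}_{\text{target}}(s_{t+1}, a')$ with a separate target network and no bootstrapping at terminal states; this is an assumption, since (30) and (31) are not reproduced here. The helper names are hypothetical.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Random action with probability epsilon, otherwise the argmax of Q_hat(s, .)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def dqn_targets(rewards, next_q_target, gamma, done):
    """y = R_t + gamma * max_a' Q_target(s_{t+1}, a'), with no bootstrap if done.

    rewards: (B,), next_q_target: (B, A) target-network outputs, done: (B,) in {0, 1}.
    """
    return rewards + gamma * (1.0 - done) * next_q_target.max(axis=1)


rng = np.random.default_rng(0)
greedy_action = epsilon_greedy(np.array([0.1, 0.7, 0.2]), epsilon=0.0, rng=rng)  # -> 1
y = dqn_targets(np.array([1.0, 2.0]),
                np.array([[0.5, 1.0], [3.0, 2.0]]),
                gamma=0.5,
                done=np.array([0.0, 1.0]))                                        # [1.5, 2.0]
```

The DNN weights $w$ would then be updated by regressing $\hat{Q}(s_t, a_t, w)$ towards these targets.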

DQN is a powerful method; however, it is not a suitable approach when the environment is highly stochastic or when the action space is large or continuous. To address this drawback, an alternative is to employ the deep deterministic policy gradient (DDPG) algorithm. DDPG consists of four networks: an actor network, a critic network, a target actor network, and a target critic network.

To achieve exploration, the policy of DDPG can be defined by adding noise as follows:

$$a_t = \mu\left(s, w^{\mu}\right) + \phi,$$

where $s$ is the input of the actor network, $w^{\mu}$ is the weight of the actor network, $\mu(s, w^{\mu})$ is the output of the actor network (which is an action), and $\phi$ is the exploration noise. The actor network is trained by maximizing the value estimated by the critic, and the cost function can be formulated as follows:

$$J\left(w^{\mu}\right) = \mathbb{E}\left[ Q\left(s, \mu\left(s, w^{\mu}\right), w^{c}\right) \right],$$

where $w^{c}$ is the weight of the critic network. Under the assumption that the critic's action-value function is differentiable with respect to the action, the parameters of the actor network can be updated via gradient ascent. In particular, the gradient can be computed using the chain rule as follows:

$$\nabla_{w^{\mu}} J = \mathbb{E}\left[ \nabla_{a} Q\left(s, a, w^{c}\right)\Big|_{a = \mu(s, w^{\mu})} \nabla_{w^{\mu}} \mu\left(s, w^{\mu}\right) \right].$$

The output of the actor network is used as the input of the critic network, and $w^{\mu}$ is updated by maximizing the output of the critic network while keeping the weight of the critic network fixed. A target value for the action-value function can be obtained as follows:

$$y_t = R_t + \gamma\, Q_{\text{target}}\left(s_{t+1}, a_{\text{target}}, w^{c}_{\text{target}}\right),$$

where $Q_{\text{target}}(s_{t+1}, a_{\text{target}}, w^{c}_{\text{target}})$ is the output of the target critic network, $w^{c}_{\text{target}}$ is the weight of the target critic network, $a_{\text{target}} = \mu_{\text{target}}(s_{t+1}, w^{\mu}_{\text{target}})$ is the output of the target actor network, and $w^{\mu}_{\text{target}}$ is the weight of the target actor network. Note that the output of the target actor network is similarly used as the input of the target critic network. The critic network can then be updated by minimizing the loss function as follows:

$$\mathcal{L}\left(w^{c}\right) = \mathbb{E}\left[ \left( y_t - Q\left(s_t, a_t, w^{c}\right) \right)^2 \right].$$

Meanwhile, the weights of the target actor and target critic networks are updated as follows:

$$w^{\mu}_{\text{target}} \leftarrow \tau w^{\mu} + (1 - \tau)\, w^{\mu}_{\text{target}}, \qquad w^{c}_{\text{target}} \leftarrow \tau w^{c} + (1 - \tau)\, w^{c}_{\text{target}},$$

where $\tau$ is the soft updating parameter. In addition, each experience $e_t$ is stored in a replay buffer, from which a fixed number of experiences is randomly sampled for the network updates.
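The non-network parts of DDPG described above (noisy exploration, soft target updates, and the replay buffer) can be sketched in NumPy. Assumptions: Gaussian exploration noise $\phi$ (other noise processes are also used in practice), actions clipped to a valid range such as $[0, p_{\max}]$ for power allocation, network weights represented as lists of arrays, and a fixed-capacity buffer that overwrites its oldest experience. All names are hypothetical.

```python
import numpy as np


def explore(actor_output, noise_std, rng, low, high):
    """a_t = mu(s, w^mu) + phi with Gaussian phi, clipped to the valid action range."""
    phi = noise_std * rng.standard_normal(np.shape(actor_output))
    return np.clip(actor_output + phi, low, high)


def soft_update(target_params, params, tau):
    """w_target <- tau * w + (1 - tau) * w_target, applied to every weight array."""
    return [tau * w + (1.0 - tau) * w_t for w_t, w in zip(target_params, params)]


class ReplayBuffer:
    """Fixed-capacity store of experiences e_t = (s_t, a_t, R_t, s_{t+1})."""

    def __init__(self, capacity, rng):
        self.capacity = capacity
        self.data = []
        self.pos = 0          # next slot to overwrite once the buffer is full
        self.rng = rng

    def add(self, experience):
        if len(self.data) < self.capacity:
            self.data.append(experience)
        else:
            self.data[self.pos] = experience
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        """Uniformly sample a minibatch of distinct stored experiences."""
        idx = self.rng.choice(len(self.data), size=batch_size, replace=False)
        return [self.data[i] for i in idx]


rng = np.random.default_rng(0)
a_t = explore(np.array([0.5, 2.0]), noise_std=0.0, rng=rng, low=0.0, high=1.0)
targets = soft_update([np.zeros(2)], [np.ones(2)], tau=0.1)
buffer = ReplayBuffer(capacity=3, rng=rng)
for t in range(5):
    buffer.add(t)             # experiences 0..4; the two oldest are overwritten
batch = buffer.sample(2)
```

A small $\tau$ makes the targets track the learned networks slowly, which stabilizes the critic's regression targets $y_t$.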

As an example, interested readers may refer to [95], where …

… will send their local updates to the server, and the server will apply these updates to its global state and repeat the process until convergence.

Mathematically, the cost function in a federated learning framework (e.g., for a supervised learning problem) can be formulated as follows: …

… updates its parameters, $\tilde{w}_k$, as follows: …

Output: $\tilde{w}_k$
Perform the ML task and compute the loss function $\mathcal{L}_k(w)$
Update the parameters $\tilde{w}_k$ as follows: …

In more advanced cases, federated learning can be further extended to vertical federated learning, horizontal federated learning, and federated transfer learning. In vertical federated learning, the training sets might share the same client space but differ in feature space. In horizontal federated learning, the training sets might share the same feature space but differ in client space. In federated transfer learning, the training sets might differ in both client space and feature space. To address this gap, a common representation between the two spaces must be learned from the common training samples and then used to make predictions for samples that appear in only one of the spaces. From the above, it is clear that federated learning is ideal for problems that require personalized data from devices, data that are private or massive in size, and labels that can be directly obtained from the clients. Nevertheless, due to its decentralized nature, FL is also challenged by highly dynamic data, correlated data, security issues, inference attacks, failures, and unresponsive updates [97], [98], [99]. Moreover, communication cost dominates computation cost in FL, and this motivates its application in I-mMIMO systems [100], [101], [102], [103], [104], [105]. FL was shown to reduce transmission overhead in [106] and [103], and to improve channel estimation performance and speed up computation in [100] and [101].
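The client/server loop described above can be illustrated with a minimal federated averaging (FedAvg) sketch. Assumptions, all hypothetical: each client fits a linear least-squares model to its private data with a few local gradient steps, only model parameters (never raw data) are sent to the server, and the server averages them weighted by local dataset size. The true parameters, data, learning rate, and number of rounds are illustrative.

```python
import numpy as np


def local_update(w, X, y, lr, epochs):
    """Gradient descent on the local cost J_k(w) = mean((X w - y)^2)."""
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w


def fedavg(clients, w0, rounds, lr=0.1, epochs=5):
    """Server loop: broadcast w, collect local models, aggregate by dataset size."""
    w = w0
    n_total = sum(len(y) for _, y in clients)
    for _ in range(rounds):
        # Each client starts from the current global model and uses only its own data.
        local_ws = [local_update(w.copy(), X, y, lr, epochs) for X, y in clients]
        # Weighted average of the local models becomes the new global model.
        w = sum(len(y) / n_total * w_k for w_k, (_, y) in zip(local_ws, clients))
    return w


rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])            # hypothetical ground-truth parameters
clients = []
for _ in range(3):                        # three clients with private, noiseless data
    X = rng.standard_normal((20, 2))
    clients.append((X, X @ w_true))
w_global = fedavg(clients, w0=np.zeros(2), rounds=50)
```

Only the two-element parameter vector crosses the network in each round, which illustrates why FL can reduce transmission overhead relative to uploading the raw training data.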
In addition to the different categories of ML techniques, the training approach is another important aspect that needs to be considered. ML training approaches in I-mMIMO can be classified into offline, online [107], [108], [109], [110], [111], and offline-online [112]. In the offline training approach, the ML model is trained on historical data, while in the online approach, real-time data are exploited for both training and decision-making. The offline-online approach is a two-step approach …

… approach or a data-driven DL approach [113].

The data-driven approach is based on the conventional use

A few research works have also considered the application 937 of ML in end-to-end communication systems [114], [115].

In end-to-end communication, suitable ML models and system requirements that can guarantee acceptable I-mMIMO system performance need to be considered. …

The methods for dataset generation include simulation and measurement campaigns. In the simulation method, data can be generated by applying ray-tracing methods for channel realization, and the optimization problem is solved computationally; the results of this computation are used as label data in supervised learning applications. …

Conventional methods, such as geometry-based and fingerprint-based positioning, have exploited propagation parameters such as the channel impulse response, propagation delay, AoA, or RSS to localize UTs. However, these methods are prone to positioning performance degradation due to non-line-of-sight (NLoS) propagation and suffer from high computational complexity. To address these challenges, several ML approaches have been investigated in the literature […], [139] to learn a function that maps the input propagation parameters to the UT position. A fingerprint-based positioning method using a CNN in a co-located mMIMO system was presented in [13]. A framework that jointly extracts and processes channel information …

… study from [140] shows that the number of BS antennas can influence the localization accuracy, especially for BS … Table 3. …

… [148], [149]. In [149] and [143], …

The application of ML has been explored for intelligent resource allocation in order to improve spectral efficiency for B5G networks [150], [151]. ML has been shown to overcome the computational complexity of the model-based approach in CM-MIMO systems [57]. The optimal solution for sum-rate maximization has been obtained by using DL for the scheduling of UTs and link resources [57], [151], [152], [153], [154].
Computational complexity reduction has been achieved by exploiting geographical location instead of CSI [151]. Furthermore, ML has been proposed for optimal pilot allocation in order to reduce the computational complexity associated with heuristic algorithms and optimization theory in mMIMO systems [155], [156], and for joint scheduling with DL-based beamforming in [157]. ML is also being employed to address the challenge of selecting optimal antenna subsets, especially in mMIMO systems. The aim of antenna selection is to reduce the computational complexity while maintaining a good signal-to-noise ratio (SNR) and maximizing channel capacity. ML has been proposed for AS by casting it as a decision-making or classification problem in place of optimization and greedy search for subarray selection. Studies on AS using ML have been presented in [158], [159] …

… application of ML for spectral and energy efficiency in I-mMIMO systems is presented in Table 4. … Table 5.

To address the limitations of the conventional methods, i.e., the likelihood-based and feature-based methods identified in Table 5, …

The use of low-bit ADCs in mMIMO systems is advocated due to their low hardware complexity (e.g., they eliminate the need for an automatic gain controller) and the reduction in the power consumed by continuous-amplitude sampling and quantization […], [183]. Hardware complexity can also be reduced by limiting the number of RF chains in mMIMO systems via a hybrid architecture that employs hybrid beamforming (HBF) and …

… [184], [185], [186], [187], [188], [189], [190], [191], [192], [193], [194], [195], [196], [197], [198], [199], [200], and [201]. ML techniques have been explored to address the complexity of predicting beamforming vectors and to improve the robustness of hybrid precoders by accounting for imperfect channel matrices in [185], [186], [188], [190], [192], [193], [195], [197], [199], and [201]. In addition, ML approaches have been applied to overcome the challenges of channel estimation and to reduce feedback signaling overhead in [189], [191], [196], [198], and [200]. Other related works have also focused on applications of ML in HBC [186], [193]. Results from these works have shown an increase in spectral efficiency by leveraging the ability of ML techniques to learn the statistical structure of partial CSI feedback [199], [201] and by exploiting the temporal correlation in time-varying channels [200]. An example of improved spectral efficiency performance in ML-applied hybrid precoding is shown in Fig. 8 [200].

TABLE 5. Modulation classification methods.
FIGURE 8. Improved spectral efficiency performance in ML-applied hybrid precoding [200].

… DeepCMC, which exploits the correlation among the CSI matrices of nearby users, is also proposed in [202] to further reduce the overhead. Furthermore, a model-driven DL framework is proposed in [203] for both uplink and downlink communications. In addition, DL can also be integrated into the encoder/decoder for detection problems. For instance, [204] proposes a DLNet decoder to minimize the bit error rate. In [205], concrete MAP detection (CMD) is proposed, which relaxes the probability mass function of the discrete random variable into a probability density function in a maximum a posteriori (MAP) detection problem. To further improve detection accuracy while limiting complexity, the authors also unfold the gradient descent algorithm into a DL-based model known as CMDNet. Moreover, a decoder that deals with a varying number of transmitters and is invariant to the order in which the users interact with the system can also be designed using DL [206].

To address this, supervised ML techniques such as k-NN, DNN, and SVM are used to predict the precoder [192], [199], [269] and precoder indicators such as AoA [190] …

In summary, this literature review shows that ML has been investigated and explored in various aspects of mMIMO systems, including different scenarios involving the operation and management of the PHY. Notably, a good number of research works have focused on CE, especially in FDD systems, in an attempt to overcome the constraints of limited bandwidth and feedback. This is due to the important role CE plays in mMIMO systems. Traditional optimal and near-optimal CE solutions with acceptable performance do exist for mMIMO systems, but they suffer from high computational complexity, largely due to the large matrix inversions involved. This has motivated several research works to explore the use of ML in CE for mMIMO systems [275]. Theoretical studies so far have shown promising results in reducing computational cost while achieving good performance compared with traditional approaches.

The application of ML has also been considered in other use-case scenarios such as UT localization, mMIMO-aided IRS, spectral and energy efficiency, modulation, resource allocation, low-bit ADC systems, decoding, modulation detection, and beamforming. The majority of these works have considered a block-by-block approach, in which individual processing blocks are optimized separately, while very few have considered an end-to-end approach. In addition, since the application of ML in communication systems is still an emerging area, most of the works reviewed are theoretical concepts and laboratory works, although efforts are under way to validate the use of ML in I-mMIMO systems [87]. Based on the existing literature, we find that some research works employ a data-driven approach, while others use a model-based approach to assist the performance of the data-driven approach. Thus, in our opinion, the model-based approach is likely to remain useful for the time being, but research interest is indeed shifting towards the data-driven approach. As the actual implementation and deployment of data-driven approaches may take a few years, several theoretical and practical questions need to be answered. These questions involve the complications associated with the design and training phases, the need for a framework that provides a generalized protocol that is service-oriented and user-centric, and practical online implementations of ML approaches.

The shift from a model-based mMIMO communication paradigm to data-driven communication using AI opens up security issues at both the PHY and network layers. Some of these security issues have been explored in [277], [278], [279], [280], and [281]. In [278], the effects of adversarial and jamming attacks on CSI feedback in DL-based mMIMO were investigated, while spoofing attacks were investigated in [279] and [280], and eavesdropping attacks on the PHY in [281].

... testing data. This can be difficult to implement in practical mMIMO systems. Another issue to consider is whether to implement online training [107], [108], [109], [110] or offline training. Offline-trained models used in real-time applications are prone to errors and inaccurate predictions, because the wireless environment may change after training. Hence, online training is advocated. For instance, transfer learning [223], [282], [283], [284] and adaptive learning methods are considered for online training in order to meet time constraints, reduce the computational complexity of training, and adapt to changes in the learning environment. The use of data from UTs for training purposes also raises privacy concerns. To address this issue, FL has been considered in [96], [101], [102], [103], [104], [105], and [285]. Although FL helps to solve privacy and bandwidth issues, the challenge of accurately reconstructing the local gradient vectors at the central processing unit still needs to be tackled [109]. Furthermore, UTs need to be capable of running ML algorithms efficiently with optimum power consumption. Other issues encountered in the use of ML are the methods for, and effects of, handling missing data and sparse recovery [286], and the need to find the hyperparameters (e.g., the number of layers and neurons) of the ANN. Deep unfolding, which builds the ANN from the iterations of an iterative signal processing algorithm, has been exploited to determine the hyperparameters of ANNs for wireless communications [287], [288].
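To illustrate how FL keeps raw UT data local, the following minimal sketch implements federated averaging (FedAvg), one common FL aggregation rule, assumed here for illustration rather than taken from the cited works. Each hypothetical UT runs a few gradient steps on its own noiseless linear-regression data, and only the resulting model weights are sent to the central unit, which averages them in proportion to each UT's sample count. The model, learning rate, number of rounds, and data are all hypothetical; the cited works may exchange gradients rather than weights.

```python
import random


def local_update(w, data, lr=0.5, steps=5):
    """A few full-batch gradient steps on the UT's own mean-squared error."""
    w = list(w)
    n = len(data)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def fedavg(clients, dim, rounds=40):
    """Server loop: broadcast, local training, data-size-weighted averaging."""
    w_global = [0.0] * dim
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        # Raw (x, y) samples never leave the UTs; only weights are returned.
        local_models = [local_update(w_global, data) for data in clients]
        w_global = [
            sum(len(data) / total * w[i] for data, w in zip(clients, local_models))
            for i in range(dim)
        ]
    return w_global


rng = random.Random(1)
true_w = [0.5, -1.0]
clients = []
for n_samples in (20, 50, 80):  # hypothetical UTs holding different amounts of data
    data = []
    for _ in range(n_samples):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, sum(a * b for a, b in zip(true_w, x))))
    clients.append(data)

w = fedavg(clients, dim=2)
print([round(v, 3) for v in w])
```

Weighting by sample count lets UTs with more data influence the global model more, while the central unit never sees any individual training sample.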

In this section, RQ4 is addressed by pointing out the future directions we envisage for I-mMIMO systems.

I-mMIMO is currently being used to enable several technological applications in order to meet the unprecedented requirements of B5G networks. Some of these applications are discussed as follows:

The use of mMIMO in visible light communication (VLC) systems is considered a promising approach to improving communication capacity and spectral efficiency [289], [290]. However, achieving accurate CE for the large channel matrix of VLC, where the line-of-sight (LoS) link is dominant, remains a challenge. Hence, ML-based I-mMIMO systems are currently being explored to overcome this challenge [289], [290].

The application of mMIMO has been extended to high-mobility scenarios such as fast-moving autonomous vehicles [291], [292], [293], [294], vehicle-to-vehicle communication [295], and unmanned aerial vehicles (UAVs) [187], [296], [297]. I-mMIMO is expected to support the requirements of vehicle-to-everything (V2X) communication [298]. While CM-MIMO systems seem promising in these areas, several issues necessitate the application of I-mMIMO systems. For instance, DL-based direction-of-arrival (DoA) estimation was proposed to overcome the limitations of subspace- and sparsity-based DoA estimation using mMIMO for autonomous vehicles [291]. On the other hand, the use of ... navigation challenges faced by UAVs by enhancing coverage and convergence [297]. Similarly, ML techniques have been explored in [296] to increase energy efficiency in a hybrid-precoding UAV-based mmWave mMIMO system. ML-aided mMIMO has also been explored in [295] to predict link quality for vehicle-to-vehicle communication using the vehicle-to-infrastructure CSI between the BS and the vehicles.

VOLUME 10, 2022

An important area that needs to be explored in I-mMIMO is the application of ML to reduce the information and communication technology (ICT) carbon footprint.
The carbon footprint is described as the life-cycle carbon-equivalent emissions and effects that are related to a product or service [316]. The ICT carbon footprint can be categorized into embodied impacts (extraction of raw materials, manufacturing, transport, and end of life) and operational impacts [317]. Due to the increasing effects of climate change and global warming [316], efforts need to be directed towards reducing the carbon emissions generated by the large-scale deployment of mMIMO systems. This includes AI-driven techniques for reducing power consumption [318] and for tracking the carbon footprint of mMIMO deployments. Furthermore, the use of I-mMIMO for wireless power transfer and energy harvesting to enable the zero-energy devices (ZEDs) envisioned for 6G is expected to attract the attention of researchers [106]. More research is needed on new I-mMIMO protocols that optimize the power consumption of ZEDs.

... of ML for self-healing in I-mMIMO systems [322], [323] and for self-tuning beamforming [324] in I-mMIMO systems.

For practical implementation of I-mMIMO systems, new ML techniques tailored to wireless communication will continue to attract a great deal of interest. Adaptive learning techniques are expected to replace the traditional ML workflow, in which static data are divided into training and testing sets. Adaptive learning will enable I-mMIMO systems to continuously adapt to changes in the wireless environment while maintaining optimal performance. In addition, cross-fertilization between data-driven approaches and model-driven approaches based on mathematical models for specific I-mMIMO tasks is expected to draw research interest. This will pave the way for hybrid I-mMIMO systems.
More research is expected on applying FL to collaboration between I-mMIMO BSs and UTs in heterogeneous networks with diverse service demands. Given the challenges of hyperparameter tuning when applying ML in I-mMIMO, more research is also required on exploiting deep unfolding.
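Deep unfolding maps T iterations of an iterative signal processing algorithm onto T network layers, so the depth is set by the iteration count and only a few per-layer parameters need to be learned. The following minimal sketch, written under our own assumptions, unfolds plain gradient descent for a noiseless real-valued linear model y = Hx (a hypothetical stand-in for, e.g., an mMIMO detection or CE task) and learns one step size per layer. Finite-difference gradients replace the backpropagation that a deep-learning framework would normally provide; all names, dimensions, and values are hypothetical.

```python
import random


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def unfolded_gd(H, y, steps):
    """One layer per entry of `steps`: x <- x + gamma_t * H^T (y - H x)."""
    Ht = transpose(H)
    x = [0.0] * len(H[0])
    for gamma in steps:
        residual = [yi - hx for yi, hx in zip(y, matvec(H, x))]
        x = [xi + gamma * gi for xi, gi in zip(x, matvec(Ht, residual))]
    return x


def mse(H, samples, steps):
    """Average squared error of the unfolded network over (x_true, y) pairs."""
    total = 0.0
    for x_true, y in samples:
        x_hat = unfolded_gd(H, y, steps)
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x_true))
    return total / len(samples)


def train_steps(H, samples, steps, iters=60, lr=0.01, eps=1e-6):
    """Learn per-layer step sizes; finite differences stand in for backprop."""
    steps = list(steps)
    best = mse(H, samples, steps)
    for _ in range(iters):
        grad = []
        for t in range(len(steps)):
            bumped = list(steps)
            bumped[t] += eps
            grad.append((mse(H, samples, bumped) - best) / eps)
        candidate = [s - lr * g for s, g in zip(steps, grad)]
        candidate_loss = mse(H, samples, candidate)
        if candidate_loss < best:   # keep only improving updates
            steps, best = candidate, candidate_loss
        else:
            lr *= 0.5               # otherwise shrink the learning rate
    return steps, best


rng = random.Random(3)
H = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(5)]
samples = []
for _ in range(30):
    x_true = [rng.gauss(0, 1) for _ in range(3)]
    samples.append((x_true, matvec(H, x_true)))

initial = [0.05] * 4  # four layers = four unfolded iterations
learned, trained_loss = train_steps(H, samples, initial)
print(mse(H, samples, initial), trained_loss)
```

Here the "number of layers" hyperparameter is simply the number of unfolded iterations, and training only has to tune four scalars instead of a full set of dense-layer weights.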

The integration of RF communication and sensing capabilities enables the network to detect the presence of objects and some of their attributes using radar. This is known as joint communication and sensing (JCAS) [326], [327], [328]. The use of sensing capabilities in mobile networks provides opportunities for several use cases, such as object detection and collision avoidance in vehicular networks. Vision-aided wireless communication is an emerging research area [329], [330]. It combines wireless and visual data in order to overcome blockage, assist in predicting the mMIMO channel subspace, enhance handover mechanisms, enable context-aware communication, and provide proactive network management [329], [330], [331]. Applications of ML techniques in vision-aided wireless communication, with potential benefits, have been explored in [329], [330], [331], and [332]. However, the deployment of JCAS opens up new challenges, such as signal processing for detecting the presence and shape of objects, interference, overhead optimization, and enhanced protocols for different radar requirements [326]. In addition, the creation of an effective framework that captures scenario-dependent data and system configurations for vision-aided wireless communication is a promising research area [329], [332].
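The following minimal sketch illustrates the basic sensing step behind radar-based object detection: correlating the received echo with the known transmitted sequence (a matched filter) to estimate the round-trip delay τ and hence the range R = cτ/2. The pseudo-random ±1 sequence, sample rate, delay, attenuation, and noise level are hypothetical; practical JCAS designs may reuse communication waveforms such as OFDM, which this sketch does not model.

```python
import random

C = 299_792_458.0  # speed of light (m/s)


def make_echo(tx, delay, attenuation, noise_std, length, rng):
    """Received signal: attenuated copy of tx starting at `delay`, plus Gaussian noise."""
    rx = [rng.gauss(0.0, noise_std) for _ in range(length)]
    for n, s in enumerate(tx):
        if delay + n < length:
            rx[delay + n] += attenuation * s
    return rx


def estimate_delay(tx, rx):
    """Lag (in samples) that maximizes the correlation between rx and tx."""
    best_lag, best_corr = 0, float("-inf")
    for lag in range(len(rx) - len(tx) + 1):
        corr = sum(rx[lag + n] * s for n, s in enumerate(tx))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag


def delay_to_range(lag, sample_rate):
    """Two-way propagation: R = c * tau / 2, with tau = lag / sample_rate."""
    return C * (lag / sample_rate) / 2.0


rng = random.Random(7)
tx = [rng.choice((-1.0, 1.0)) for _ in range(64)]
rx = make_echo(tx, delay=20, attenuation=0.5, noise_std=0.1, length=128, rng=rng)
lag = estimate_delay(tx, rx)
print(lag, delay_to_range(lag, sample_rate=100e6))
```

The factor of two accounts for the signal travelling to the object and back; at a 100 MHz sample rate, each sample of delay corresponds to roughly 1.5 m of range.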

Next-generation multiple access schemes such as power-domain non-orthogonal multiple access (NOMA), code-domain NOMA, and rate-splitting multiple access (RSMA) are promising techniques for overloaded mMIMO systems, where the number of users is much higher than the available channel resources and spatial degrees of freedom [333]. NOMA not only enables multiple users to share the same orthogonal channel resource but also offers a spectrally efficient way to multiplex users with different channel qualities and diverse quality-of-service requirements. Existing studies show that [334], [335], [336]