Wi-Fi Meets ML: A Survey on Improving IEEE 802.11 Performance With Machine Learning

Wireless local area networks (WLANs) empowered by IEEE 802.11 (Wi-Fi) hold a dominant position in providing Internet access thanks to their freedom of deployment and configuration as well as the existence of affordable and highly interoperable devices. The Wi-Fi community is currently deploying Wi-Fi 6 and developing Wi-Fi 7, which will bring higher data rates, better multi-user and multi-AP support, and, most importantly, improved configuration flexibility. These technical innovations, including the plethora of configuration parameters, are making next-generation WLANs exceedingly complex as the dependencies between parameters and their joint optimization usually have a non-linear impact on network performance. The complexity is further increased in the case of dense deployments and coexistence in shared bands. While classical optimization approaches fail in such conditions, machine learning (ML) is able to handle complexity. Much research has been published on using ML to improve Wi-Fi performance and solutions are slowly being adopted in existing deployments. In this survey, we adopt a structured approach to describe the various Wi-Fi areas where ML is applied. To this end, we analyze over 250 papers in the field, providing readers with an overview of the main trends. Based on this review, we identify specific open challenges and provide general future research directions.


I. INTRODUCTION
Wireless local area networks (WLANs), standardized in IEEE 802.11 and commercialized as Wi-Fi, hold a dominant position in providing wireless Internet access. Cisco's Visual Networking Index Forecast estimates Wi-Fi's share of Internet traffic to be 51% in 2022 [1]. Wi-Fi 6 [2]-[4] has become state of the art for all new consumer products and Wi-Fi 7 [5]-[7] is already under development. There are several reasons for the popularity of Wi-Fi: well-defined use cases, freedom of deployment and configuration (thanks to operating in unlicensed bands), and the existence of devices that are inexpensive to manufacture and highly interoperable.
For the purpose of Open Access, the author has applied a CC-BY public copyright licence to any Author Accepted Manuscript (AAM) version arising from this submission. This work was supported in part by the National Science Centre, Poland (DEC-2020/39/I/ST7/01457). This work was supported in part by the Federal Ministry of Education and Research (BMBF, Germany) project OTB-5G+ under grant 16KIS0985 and the 6G Research and Innovation Cluster 6G-RIC under grant 16KISK020K, as well as by the project ML4WIFI funded by the German Research Foundation (DFG) under grant number DR 639/28-1. This work was supported in part by grants WINDMAL PGC2018-099959-B-I00 (MCIU/AEI/FEDER,UE) and SGR017-1188 (AGAUR).
In recent years, the 802.11 protocol family has received regular updates leading to performance improvements and new features. These technical innovations pose a challenge: the next generations of Wi-Fi are becoming exceedingly complex. Specifically, each new mechanism, designed to improve network performance, comes with a plethora of parameters that have to be configured. Additionally, there are new application requirements: Wi-Fi is no longer limited to broadband Internet access but is also being used in other situations, e.g., ultra-low latency machine-to-machine communication. This multi-modal operation needs to be supported through a proper configuration, which in most cases is left out of the standard. For example, depending on the combination of resource unit (RU) assignment in 802.11ax, the network throughput may vary by more than 100% [2]. In most cases, multiple parameters have to be jointly optimized. This task is non-trivial as the dependencies between parameters and their joint optimization have a highly non-linear impact on network performance. For example, Wilhelmi et al. [8] show that the performance of overlapping 802.11 Wi-Fi networks does not depend linearly on sensitivity and transmission power settings. The level of complexity is further increased in the case of coexisting network technologies.
Until now, the goal of the mainline 802.11 amendments was to provide high throughput (802.11n, 802.11ac) and efficiency in dense environments through deterministic channel access (802.11ax). However, future Wi-Fi generations are anticipated to accommodate ultra-low latency and ultra-high reliability traffic (802.11be). Hence, the proper and timely update of the transmission settings is of key importance. Meanwhile, finding adequate configurations in an enormous search space using traditional algorithms consumes too much time and too many computational resources. Additionally, new WLAN mechanisms also bring overhead in terms of additional measurements which are needed to provide input to their respective control algorithms. In the past, with only a few possible modulation and coding scheme (MCS) values (i.e., in early versions of 802.11), it was possible to quickly test all of them and select the best one. Currently, such an exhaustive selection is practically impossible.

A. Need for ML in Wi-Fi
The increasing Wi-Fi complexity coupled with uncoordinated deployment, distributed management, and network densification may negatively impact the operation of future 802.11 networks. A candidate approach to solving these performance-related problems is to apply machine learning (ML), a type of artificial intelligence, where "algorithms can learn from training data without being explicitly programmed" [9]. Indeed, the IEEE 802.11 Working Group is discussing the use of ML for improving the performance of beyond-802.11be networks [10]. So far, ML-based techniques have been explored for a variety of problems in networking [11]-[13]. Successful solutions are applied to fields ranging from configuring physical layer parameters to traffic prediction.
Recently, Kulin et al. [12] published a survey on applying ML for general wireless networking while Zhang et al. [14] reviewed almost 600 research papers on ML in 5G systems. However, neither these nor other recent surveys (reviewed in Section II) describe in detail the Wi-Fi performance improvement with ML from different angles. Wi-Fi is simply too complicated to be covered inside a general survey and requires a dedicated one.

B. Methodology
For the presented survey on improving Wi-Fi performance with machine learning, we started with a systematic literature review methodology [15]. First, we searched for Wi-Fi, 802.11, and WLAN as well as machine learning in the paper abstracts in the following databases: IEEE Xplore, ACM, Elsevier, Wiley, and MDPI¹. This yielded 1189 papers, out of which we had to remove out-of-scope papers. Next, we added papers manually, usually found through cross-citation analysis. Finally, we identified (and cite in Sections III-VII) over 250 relevant papers in total. Additionally, we reference over 20 survey papers in Section II.
¹ We could not include SpringerLink at this stage as it does not allow searching within the abstracts of published papers. Papers from this database were added manually.

C. Survey Scope and Contributions
The structure of the survey is depicted in Figure 2 together with an indication of the ML methods reported in the state-of-the-art papers, for each of the surveyed areas. After a short summary of related surveys (Section II), we first investigate core Wi-Fi features in Section III. This section explores aspects such as the use of ML for selecting PHY features, optimizing channel access, configuring frame aggregation and link parameter settings, data rate selection, as well as quality of service (QoS), admission control, and traffic classification. In Section IV, we study the benefits of using ML to support more recent Wi-Fi features, such as channel bonding, multi-user MIMO (MU-MIMO), spatial reuse, and multi-band operation. Wi-Fi connectivity management is discussed in Section V. Here, we explore the applicability of ML to access point (AP) selection and association, channel and band selection, management architectures, and determining the health of Wi-Fi connections. In Section VI, we investigate ML-optimized coexistence of Wi-Fi with other technologies: channel sharing, network monitoring, and cross-technology signal classification. Next, in Section VII, we study ML algorithms for multi-hop Wi-Fi deployments: ad hoc networks, mesh networks, sensor networks, vehicular networks, and relay networks. Finally, we elaborate on future research directions in Section IX and conclude the paper with Section X. Appendix A contains the list of acronyms used.
To summarize, our contributions are the following:
• A structured approach to describing the various areas of Wi-Fi performance where ML is applied. We cover core Wi-Fi features, recently added features, and management issues, as well as Wi-Fi operating in shared bands with other technologies and in multi-hop topologies.
• A review of over 250 papers in the field. We provide readers with an overview of what has been done and of the main trends in applying ML to particular Wi-Fi performance problems.
• The identification of open challenges in every area of Wi-Fi performance, given at the end of each of Sections III-VII. Additionally, we provide an overview of the general future research directions in applying ML for improving Wi-Fi performance, to provide readers with an analysis of what remains to be done in the field.
We hope that the survey will be beneficial both for beginners and experts in the field, looking for a comprehensive summary of the latest research in the area of improving Wi-Fi performance with ML. We also believe that this survey will guide readers towards proposing new ideas in this developing research area.

II. RELATED SURVEYS
Many surveys address the development of ML models to support wireless networks. Reported contributions consider the application of ML to both Wi-Fi and application-specific networks, such as wireless sensor networks (WSNs), cognitive
• analyze and manage mobile networks in several directions, e.g., network state prediction, network traffic classification, call details mining, and radio-signal analysis [41];
• improve the performance of mobile systems [14] and IoT [13];
• identify wireless modulations/technologies [42];
• provide fair and efficient spectrum sharing in 5G [43] and in future 6G [44] networks;
• maximize the potential of unlicensed bands for Industry 4.0 applications [45].

D. Summary
State-of-the-art surveys report the wide applicability of ML models for wireless networks. Table I summarizes the presented surveys by addressed technology and scope, and highlights their Wi-Fi-related topics.
Specifically, in the Wi-Fi area, the reported surveys are application-oriented, focusing on human activity detection algorithms, indoor localization mechanisms, and network security issues. In the wireless networks area, the surveys, in general, provide few details concerning the use of ML models to improve the performance of the 802.11 protocol family. The most often surveyed topics are the coexistence of Wi-Fi networks with other technologies, their performance evaluation, channel selection mechanisms, and signal identification in the context of cognitive radio technologies. Finally, concerning the 5G and Wi-Fi area, surveys mostly cover the concept of spectrum sharing mechanisms for coexistence between the two networks. Therefore, the lack of a dedicated Wi-Fi performance survey, coupled with the variety of research papers addressing the specifics of using Wi-Fi with ML (Figure 1), has motivated our work, which we hope will be valuable to the research community.
Table I. Summary of the presented surveys (reference | scope | Wi-Fi-related topics | year). Empty cells correspond to cells shared with a neighboring row.
Wi-Fi
[19] | Large-scale network monitoring | Wi-Fi analytics | 2020
[20] | Quality indicators accounting for user satisfaction | Wi-Fi quality indicators | 2020
[21] | Indoor localization | Application-oriented | 2020
[22] | | | 2019
[23] | | | 2019
[24] | | | 2018
[25] | Human activity detection | | 2017
[26] | Intrusion detection | Wi-Fi security | 2021
[27] | | | 2016
Wireless networks (IoT, CRN, M2M, MANET)
[37] | Detection and identification of IoT devices | Identification of devices and security protection | 2021
[40] | Federated learning | Privacy protection | 2021
[39] | Applications of transfer learning in wireless networks | Insufficient details concerning Wi-Fi functionalities | 2021
[9] | Performance improvement in a variety of wireless networks like HetNets, CRNs, IoTs, and M2M | | 2020
[28] | Performance improvement in the PHY/MAC/Network layers as well as novel networking concepts (MEC, SDN, NFV) | | 2020
[38] | Optimization of communication and computing technologies of IoT systems | | 2020
[11] | ML models to support resource management, networking and localization in wireless networks | Power saving mechanisms for Wi-Fi infrastructure, indoor localization mechanisms | 2019
[33] | Decision making and feature classification in CRNs | Collaborative coexistence of Wi-Fi networks with other technologies, performance evaluation, dynamic channel selection | 2013
[34] | ML models to support cognitive radio capabilities | Collaborative coexistence of Wi-Fi networks with other technologies | 2013
[35] | ML models to support cognitive radio capabilities | Wi-Fi signal identification | 2010
Wi-Fi and 5G/6G
[12] | Broad survey covering data science fundamentals, 5G, Wi-Fi, CRN | General networking concepts like interference recognition, network traffic predictions, and MAC identification; insufficient details concerning Wi-Fi functionalities | 2020
[43] | Coexistence mechanisms in 5G networks | Coexistence of 5G and Wi-Fi | 2020
[46] | | | 2018
[14] | Mobile and wireless networking research based on deep learning | Indoor localization applications and signal processing in Wi-Fi networks | 2019

Note that there are three non-performance related areas involving both Wi-Fi and ML which are out of the scope of this survey:
• dedicated applications of Wi-Fi (unrelated to network access), e.g., device positioning, human activity detection,
• energy efficiency (e.g., power-saving protocols), and
• network security (e.g., detecting selfishly configured devices [47]).
There has been broad adoption of ML in these areas and they deserve literature reviews of their own, such as [21], [26]. Furthermore, our survey does not describe how various ML methods operate. There are numerous books and research papers on this topic; we refer the reader to papers such as [9], [12] for a detailed (although still wireless networking-related) discussion of these methods.

III. CORE WI-FI FEATURES
The 802.11n/ac/ax amendments increase the data rate up to 9 Gbit/s leveraging the increasing number of spatial streams (SSs), and techniques like channel bonding, multi-user transmissions, short guard interval (SGI), and high modulations (up to 1024-QAM for 802.11ax) [65], [91], [92]. The impact of such a variety of parameters on the network performance is highly difficult to characterize considering the variability of Wi-Fi environments and the users' dynamics. However, the availability of performance metrics, both at the user and the AP level, along with historical data provides a favorable environment for ML methods to model the impact of such parameters on the network performance and optimize it. The capability of ML-based methods to gain knowledge, generalize, and learn from experience allows conceiving smart systems using the augmented functionalities of the IEEE 802.11 standard.
In the literature, there are many ML solutions for 802.11's PHY and MAC layers to adaptively optimize the internal parameters of Wi-Fi's core features in dynamic scenarios. As summarized in Table II, contributions are reported in four areas to
• reduce collisions when accessing the channel,
• maximize rate with the proper link configuration,
• find the optimum balance for the frame length with frame aggregation techniques,
• address interference and signal denoising at the PHY layer.
These solutions mostly use RL methods to adjust access parameters and SL methods to estimate the channel condition for improved performance.
Although there is a variety of reported solutions, still more work taking into account the overall network performance is needed. Besides, the application of ML models is usually verified with simulations, with just a few research papers dealing with real scenarios that include, for instance, user mobility. In this section, we cover the core Wi-Fi features mentioned above and summarize the open challenges when conceiving ML models in Wi-Fi environments.

A. Channel Access
Channel access mechanisms are perhaps the most often addressed topic concerning the improvement of Wi-Fi performance with ML. Proposed optimizations refer mostly to the basic 802.11 MAC protocol, i.e., the DCF, which is the baseline mechanism to avoid collisions among devices when accessing a common radio channel [93]. The main parameter responsible for the performance of DCF is the contention window (CW), which defines the range from which stations randomly select their waiting periods (i.e., the backoff counter) to avoid collisions when accessing the channel. Larger CW values reduce collisions but increase idle times, which in turn reduces throughput. Smaller CW values increase the chance for a station to transmit, but also increase the collision probability, thereby reducing throughput.  Multiple studies consider the selection of CW values to maximize throughput by reducing both collisions and idle periods. SL and RL models are typically applied. Loss functions and rewards are addressed in the form of reduced collisions [51], [58], increased difference between successful and collided frames [48], improved channel utilization [50], increased successful channel access attempts [52], [60], throughput [57], network utility [94], and a combination of improved throughput, reduced energy, and decreased number of collisions [49]. As summarized in Figure 3, SL [50], [52], RL [48], [49], [58], deep reinforcement learning (DRL) [51], [57], [60], and federated learning (FL) [59], [60] models are applied to the IEEE 802.11 standard [49], [60] and its amendments, most importantly 802.11ac [50], 802.11e [48], [54], 802.11n [52], and 802.11ax [51], [57]. We provide a summary of the major findings next, while an illustrative example of using RL to optimize Wi-Fi channel access parameters is given in Figure 4.
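The CW trade-off described above can be illustrated with a small Monte Carlo sketch. This is not taken from any of the surveyed papers: it assumes a simplified contention model in which every station always has a frame to send, draws one uniform backoff per round from a fixed CW, and the round ends at the first transmission. The function names and the choice of 20 stations are illustrative.

```python
import random


def contention_round(num_stations, cw, rng):
    """Simulate one contention round with a fixed contention window.

    Every station draws a backoff uniformly from [0, cw - 1]. Returns the
    number of idle slots before the first transmission and whether that
    transmission collided (two or more stations drew the same minimum).
    """
    backoffs = [rng.randrange(cw) for _ in range(num_stations)]
    first = min(backoffs)
    return first, backoffs.count(first) > 1


def evaluate_cw(num_stations, cw, rounds=20000, seed=1):
    """Estimate collision probability and mean idle slots per round."""
    rng = random.Random(seed)
    collisions = 0
    idle_slots = 0
    for _ in range(rounds):
        idle, collided = contention_round(num_stations, cw, rng)
        idle_slots += idle
        collisions += int(collided)
    return collisions / rounds, idle_slots / rounds


# Sweep a few CW values for a hypothetical network of 20 saturated stations.
for cw in (16, 64, 256, 1024):
    p_col, mean_idle = evaluate_cw(num_stations=20, cw=cw)
    print(f"CW={cw:4d}  P(collision)={p_col:.3f}  mean idle slots={mean_idle:.1f}")
```

Under these assumptions, the collision probability falls and the idle time grows as CW increases, which is the tension that the learning-based schemes discussed next try to resolve.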
1) Collision Reduction: In high-density 802.11ax WLANs, RL with the intelligent Q-learning based resource allocation (iQRA) is considered by Ali et al. [51]. Instead of resetting the CW value after each successful transmission (as in DCF), it is calculated by considering the channel collision probabilities according to the channel observation-based scaled backoff (COSB) protocol [95]. In this direction, the cumulative reward (accounting for the probability of collisions) is minimized by optimally adjusting a policy to update the CW size. The iQRA mechanism increments or decrements CW (according to COSB), finding a balance between optimal actions (concerning the best policy to reduce the collision probability) and exploring new actions to account for the dynamicity of Wi-Fi environments. Results from the ns-3 network simulator, for both small (with 15 stations) and dense (with 50 stations) networks, confirm that the solution outperforms the baseline 802.11ax protocol in terms of throughput, while the delay remains similar.
Figure 4. Configuration of Wi-Fi channel access parameters with RL: example of CW optimization with CCOD [57]. As the number of contending stations steadily increases over time (a), the AP monitors the collision probability and uses RL to select the CW value for all associated stations to maintain throughput higher than under legacy operation (b). The example shows two RL algorithms that use different types of output (DQN: discrete, DDPG: continuous) and which explore the available parameter space in search of better CW values (a).
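To make the idea of learning CW increments and decrements concrete, the following is a minimal tabular Q-learning sketch. It is not the iQRA algorithm of [51]: the state space (an index into a hypothetical list of CW values), the three actions, the epsilon-greedy exploration, and the analytic toy reward (normalized throughput of one contention round with saturated stations) are all assumptions made for illustration.

```python
import random

CW_VALUES = [16, 32, 64, 128, 256, 512, 1024]  # states: index into this list
ACTIONS = [-1, 0, +1]                          # decrease, keep, increase CW


def toy_reward(state, num_stations=20, tx_slots=100):
    """Normalized throughput of one contention round (toy analytic model)."""
    cw, n = CW_VALUES[state], num_stations
    # Probability that exactly one station holds the smallest backoff.
    p_success = sum(n / cw * ((cw - 1 - k) / cw) ** (n - 1) for k in range(cw))
    # Expected number of idle slots before the first transmission.
    mean_idle = sum(((cw - k) / cw) ** n for k in range(1, cw))
    return p_success * tx_slots / (mean_idle + tx_slots)


def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    row = q_table[state]
    return row.index(max(row))


def apply_action(state, action):
    """Move to the neighboring CW value, clipped to the valid range."""
    return min(max(state + ACTIONS[action], 0), len(CW_VALUES) - 1)


def q_update(q_table, state, action, reward, new_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning temporal-difference update."""
    td_target = reward + gamma * max(q_table[new_state])
    q_table[state][action] += alpha * (td_target - q_table[state][action])


def train(episodes=200, steps=50, epsilon=0.1, seed=0):
    """Learn a CW adjustment policy against the toy reward."""
    rng = random.Random(seed)
    rewards = [toy_reward(s) for s in range(len(CW_VALUES))]  # precomputed
    q_table = [[0.0] * len(ACTIONS) for _ in CW_VALUES]
    for _ in range(episodes):
        state = rng.randrange(len(CW_VALUES))
        for _ in range(steps):
            action = choose_action(q_table, state, epsilon, rng)
            new_state = apply_action(state, action)
            q_update(q_table, state, action, rewards[new_state], new_state)
            state = new_state
    return q_table
```

The epsilon parameter plays the role of the balance between exploiting the best known action and exploring new ones mentioned above; a real deployment would replace `toy_reward` with measurements such as the observed collision probability.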
Zhu et al. [48] implement a programming paradigm called adaptation-based programming (ABP), where the reward is the difference between successful transmissions and collisions. ABP optimizes the specifics of RL for two possible actions: halve the CW size or leave CW unchanged after a successful transmission. Simulations performed in ns-2 with 20 stations show a reduction of the total number of dropped packets by four.
The random forest (RF) algorithm is applied in a supervised manner to balance the minimum CW size among users and account for fair channel access [50]. The algorithm starts by monitoring channel variables (i.e., busy time, channel occupancy by the user, the number of sent frames) and uses them to build decision trees over the variety of settings. The algorithm is implemented in indoor 802.11ac scenarios with up to 8 stations. Throughput, latency, and fairness are improved by 153.9 %, 64 %, and 19.34 %, respectively, when compared to the 802.11ac standard.
The size of the CW can also be adjusted by directly increasing the access to the channel through the fixed-share algorithm [52]. CW is derived by weighing a set of candidate values from a predefined CW range, where the larger the weight, the larger the influence of the particular CW value. Whenever a successful transmission occurs, the weight of users with the largest CW is reduced to increase the chances of transmissions and the weight of users with a lower CW is increased. In the case of collisions, the update is reversed. With this mechanism, a balance is achieved between aggressive (small CW) and non-aggressive (large CW) users. Simulations in ns-3 of randomly deployed senders show that in a heavily loaded scenario (with 100 users), throughput is improved by 200 % and the end-to-end delay is reduced by 33 % when compared to DCF.
2) Scalability: To address the scalability of 802.11ax networks, a DRL model provides stable throughput under an increasing number of stations [57]. A centralized solution is applied for two trainable control algorithms: DQN and DDPG. A three-phase algorithm is designed to (1) evaluate the history of collision probabilities, (2) train both DRL models by maximizing the reward (throughput), and (3) deploy them in the network. The algorithm is implemented in ns3-gym [96] with a single AP and up to 50 stations. Compared to the 802.11ax standard, under which network throughput decreases by up to 28 %, the two algorithms exhibit a stable throughput value for an increasing number of stations.
A post-decision state-based (PDS) learning algorithm is applied by Amuru et al. [49] to take advantage of previous knowledge of the system components such as the CW and the transmission buffer occupancy. In contrast to Q-learning (QL), PDS achieves faster convergence to optimally compute the CW when asserting its value in specific states. For instance, when the channel is free and the station is waiting to transmit, the CW will certainly be reduced by one. In such a case, the corresponding transition probabilities do not have to be learned, thereby increasing the convergence speed by eliminating exploration actions. The solution exhibits enhanced throughput, especially with moderate network load, in comparison to Q-learning, the 802.11 standard, and alternative deterministic mechanisms like exponential-increase exponential-decrease (EIED).
3) User Fairness: The CW can also be adjusted considering user fairness metrics [60]. To that end, FL and Q neural network (QNN) models are implemented in APs and stations, respectively, as a distributed method. When each station randomly initializes its QNN parameters, some stations will use a more aggressive strategy to access the channel (by choosing small CWs). Such behavior, however, will block the transmissions of stations initialized with a less aggressive strategy (with large CWs). To ensure fairness, the AP obtains a global model of the QNNs through FL and later broadcasts updated CW values to stations. Simulation results for a single AP and a total number of stations up to 50 show that throughput is improved by 20 % when compared to DCF.
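As an illustration of how an AP could form a global model from per-station models, the sketch below averages parameter vectors weighted by the number of local samples, in the style of federated averaging. The aggregation rule actually used in [60] is not described here, so this rule, the flat-list parameter representation, and the function name are assumptions.

```python
def federated_average(station_params, sample_counts):
    """Weighted average of per-station parameter vectors (FedAvg-style).

    station_params: list of equal-length lists of floats, one per station.
    sample_counts:  number of local training samples behind each vector.
    """
    if not station_params or len(station_params) != len(sample_counts):
        raise ValueError("need one sample count per station")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("sample counts must sum to a positive value")
    global_params = [0.0] * len(station_params[0])
    for params, count in zip(station_params, sample_counts):
        for i, value in enumerate(params):
            global_params[i] += value * count / total
    return global_params


# Two hypothetical stations: an "aggressive" and a "conservative" model.
print(federated_average([[1.0, 2.0], [3.0, 6.0]], [1, 3]))
```

Averaging pulls differently initialized stations toward a common policy, which is one plausible way to obtain the fairness effect described above.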
An improved DQN is trained for minimum CW selection and deployed at stations to achieve per-user fairness [58]. The extension of DQN is achieved through rainbow agents [97], which incorporate six improvements: double DQN, prioritized replay, dueling networks, multi-step learning, distributional RL, and noisy nets. The ns-3 simulation results, for 32 stations transmitting at a constant rate of 1 Mbit/s, show that the solution achieves results close to optimum and is superior to an RF-based method.
4) QoS: Driven by the need to distinguish between traffic priorities, DCF was extended to enhanced distributed channel access (EDCA) in the 802.11e amendment [98], [99]. To that end, new MAC parameters were introduced per traffic class: CW, arbitration inter-frame space (AIFS), and transmission opportunity (TXOP) limit [100]³. AIFS together with CW are directly responsible for the trade-off between delay and throughput. In this direction, a three-phase scheme is implemented by Coronado et al. [54] to select the best combination of CW and AIFS supported by ML. In the first two phases, a range of AIFS and CW values are selected relying on decision tree algorithms, e.g., J48 for classification and M5 for prediction. Then, in the third phase, the best combination for AIFS and CW is derived. Simulation results exhibit high accuracy on the throughput prediction when varying the CW range, AIFS, and the total number of stations.
To ensure priority-based channel access, within the EDCA distributed scheme, a QL model is implemented to infer network density and adjust the CW value [55]. In EDCA, the CW is set to be smaller for high-priority traffic like voice and video. The optimal CW value is derived for the four different traffic priorities defined by EDCA. Simulation results are derived in the ns-3 simulator, where the throughput per traffic type is improved in comparison with the standard EDCA mechanism.
5) Time-slotted Access: Additionally, collisions are avoided in channel access mechanisms where users are scheduled per time slots [53]. Each station stores a table consisting of the available time slots in which a given frame is to be transmitted. The available time slots are selected by an RL method to find appropriate actions when occupying the channel.
Finally, Kihira et al. [56] consider a channel access problem between two APs: the protagonist, which is equipped with an agent, and a second AP called the 'outsider'. Time is divided into slots, where both APs can decide to transmit independently and the goal of the agent in the protagonist AP is to find, based on learning the behavior of the outsider AP, the transmission probability that maximizes its throughput. A robust adversarial RL framework that uses game theory models the interactions between the two APs. The framework can learn the best transmission policies through Q-learning.

B. Link Configuration
In response to growing user demands, the IEEE 802.11n/ac/ax amendments implement high-throughput wireless links through dedicated features at the PHY and MAC layers [69]. High data rates are achieved through a variety of functionalities at the PHY layer including channel bonding, multi-SS transmissions, the use of SGI, and high modulations (1024-QAM for 802.11ax) [65], [91], [92]. At the MAC layer, frame aggregation and block acknowledgment are the two main features for improving the maximum link throughput. Link configuration, in the form of selecting appropriate PHY and MAC parameters, is required to achieve the optimum throughput for given network and channel conditions. Rate adaptation, which is responsible for the selection of MCS values for each transmission, plays an important role in link configuration. In dynamic Wi-Fi scenarios (e.g., due to user mobility or interference), rate adaptation deals with the following counteracting mechanisms:
• high data rates may lead to high error rates when decoding the transmitted bits, thereby reducing throughput;
• reducing the data rate may incur poor channel utilization and thus also reduce throughput.
The trade-off between transmission errors and channel utilization can be evaluated by applying ML models, particularly to deal with varying channel conditions. Figure 5 depicts how ML models are used for rate selection. In the following, we summarize the contributions in the selection of optimal MCS and SGI values, and a variety of trade-offs at the PHY layer.
³ To support fine-grained traffic prioritization, IEEE 802.11e is extended by IEEE 802.11aa [101], however, we did not find any papers devoted to ML-based optimization of 802.11aa.
1) Rate Adaptation: Rate adaptation solutions predict the probability of successful transmissions for each MCS candidate. Then, the data rate is selected corresponding to the MCS with the best result. Predictions are made based on signal to noise ratio (SNR) [63], [67] or follow a cross-layer approach based on acknowledgment (ACK) or negative acknowledgment (NACK) feedback [62], [64], [72]. SNR is preferred to timely update the channel status when dealing with station mobility, e.g., in the case of vehicular ad hoc networks (VANETs) [63]. However, more accurate solutions are obtained when updating the channel status based on the ACK and NACK feedback [63].
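The predict-then-select principle can be sketched as follows. The example assumes that the "best result" means the highest expected throughput (data rate times predicted success probability), and it replaces the learned predictor with a logistic function of SNR. The MCS table values, the logistic shape, and all function names are hypothetical and do not come from the surveyed papers.

```python
import math

# Hypothetical table: (data rate in Mbit/s, SNR in dB where P(success) = 0.5).
MCS_TABLE = [(6.5, 5.0), (13.0, 8.0), (26.0, 14.0), (52.0, 21.0), (65.0, 25.0)]


def success_probability(snr_db, snr_mid_db, steepness=1.0):
    """Logistic stand-in for a learned predictor of P(success | SNR, MCS)."""
    return 1.0 / (1.0 + math.exp(-steepness * (snr_db - snr_mid_db)))


def select_mcs(snr_db):
    """Return (MCS index, expected throughput) maximizing rate * P(success)."""
    best_index, best_goodput = 0, -1.0
    for index, (rate, snr_mid) in enumerate(MCS_TABLE):
        goodput = rate * success_probability(snr_db, snr_mid)
        if goodput > best_goodput:
            best_index, best_goodput = index, goodput
    return best_index, best_goodput


# Selected MCS index for a few illustrative SNR values.
for snr in (0.0, 10.0, 20.0, 40.0):
    print(snr, select_mcs(snr))
```

In the surveyed solutions, `success_probability` would be replaced by a trained model (e.g., an ANN or RF) fed with SNR or ACK/NACK statistics.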
For SNR-based predictions, throughput is improved through a two-level data rate search algorithm based on an artificial neural network (ANN) model [67] or using an RF algorithm [63]. In the former, the ANN is implemented as a coarse estimator to find a possible set of best data rate candidates. In the second stage, a fine-grained solution is devised to identify the best candidate from this set. With this solution, at least a 25 % improvement is reported in mobile scenarios when compared to baseline rate adaptation algorithms like Minstrel [102]. Puñal et al. [63] implement the RF algorithm for uplink data rate adaptation in VANETs. The algorithm uses the position and velocity of cars to estimate the SNR in the link between the APs and the vehicle. The algorithm predicts the probability of successful transmission for each possible data rate candidate and then selects the best candidate. With this approach, the goodput is improved at least by 27 % in comparison to reported solutions like collision-aware rate adaptation (CARA).
The unpredictable impact of fast fading, however, weakens the correlation between SNR and packet loss, as both fluctuate strongly over short periods. To deal with this problem, Joshi et al. [62] implement a method inspired by stochastic learning automata (SLA) which does not assume any predefined relation between SNR and packet loss. The algorithm updates a selection probability vector in a one-to-one mapping to the available data rates. The learning procedure is implemented to adjust this vector, with throughput being the reward function, while the ACK frames are used as feedback to account for the channel condition. Thus, the probability corresponding to the rate that produces the best reward is updated, leading to a 15 % throughput improvement in comparison to other reported solutions.
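A probability-vector update of this kind can be sketched with the classical linear reward-inaction scheme from the learning automata literature. Whether [62] uses exactly this rule is not stated here, so the update rule, the normalization of the reward to [0, 1], and the simulated per-rate ACK probabilities are assumptions.

```python
import random


def lri_update(probs, chosen, reward, step=0.1):
    """Linear reward-inaction update of a selection probability vector.

    reward must be normalized to [0, 1]; a reward of 0 leaves probs unchanged.
    The chosen entry grows and all others shrink so that the sum stays 1.
    """
    gain = step * reward
    return [p + gain * (1.0 - p) if i == chosen else p * (1.0 - gain)
            for i, p in enumerate(probs)]


def run(rates, ack_probability, rounds=5000, seed=0):
    """Simulate rate selection with ACK feedback and a throughput reward."""
    rng = random.Random(seed)
    probs = [1.0 / len(rates)] * len(rates)
    max_rate = max(rates)
    for _ in range(rounds):
        i = rng.choices(range(len(rates)), weights=probs, k=1)[0]
        acked = rng.random() < ack_probability[i]
        reward = rates[i] / max_rate if acked else 0.0  # normalized throughput
        probs = lri_update(probs, i, reward)
    return probs


# Hypothetical rates (Mbit/s) and per-rate ACK probabilities.
print(run([6.0, 12.0, 24.0, 54.0], [0.99, 0.95, 0.8, 0.2]))
```

No SNR-to-loss mapping is needed: the vector drifts toward rates that repeatedly yield ACKed, high-rate transmissions.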
Thresholds to detect successfully and non-successfully received packets are derived through ML models to improve aggregate throughput by counting received ACKs [64]. Based on the legacy auto rate fallback (ARF) algorithm, the data rate is increased or decreased depending on whether the total number of ACKs is above or below a given threshold. Thresholds are adjusted by an ANN that estimates their correlation with the achievable throughput considering the total number of stations, channel conditions, and traffic intensity. Results show that the aggregate throughput is increased by 10 % in a network of 10 stations.
Rate selection can also be performed by first identifying the channel condition, e.g., using supervised learning [66] or Q-learning [72]. In the former, the channel condition is classified as a residential or office environment, then the proper MCS level is selected. The model is trained based on selected characteristics of an 802.11 frame's preamble. In the Q-learning model, the MCS level is adjusted based on the total number of received ACKs. The network state is observed through timeout events, i.e., the total number of missing ACKs. Simulations in ns3-gym [96] consider a dynamic scenario, where the receiver station moves away from the sender at a speed of 80 m/s, and yield throughput comparable to Minstrel.
Alternatively, MCS is selected considering also the available bandwidth and selected spatial streams. Chen et al. [73] apply the double deep Q-network (DDQN) model using goodput as a reward and include further learning techniques like prioritized training, history-based initialization, and adaptive training interval. Results show that this method, implemented in hardware, significantly outperforms default mechanisms.
2) SGI Adaptation: The selection of the SGI values is another link configuration mechanism that is supported by ML models. The SGI assumes two (802.11ac) or three (802.11ax) different values. The selection between them is implemented through Thompson sampling (TS) by Karmakar et al. [69].
Such an online learning mechanism deals with the fluctuation of channel quality (signal interference, signal fading, and attenuation). The TS model is evaluated with simulations in ns-3 for an 802.11ac network with up to 40 stations. For SNR varying randomly in the range of 20-60 dB, the results show a slight throughput improvement compared to the static SGI settings.
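A minimal sketch of Thompson sampling over a small set of guard interval options is given below. It assumes a Beta-Bernoulli model in which the reward is a binary delivery outcome; the reward definition used in [69] may differ, and the class and method names are hypothetical.

```python
import random


class ThompsonSelector:
    """Beta-Bernoulli Thompson sampling over a small set of options."""

    def __init__(self, options, seed=0):
        self.options = list(options)
        self.alpha = [1.0] * len(self.options)  # 1 + observed successes per option
        self.beta = [1.0] * len(self.options)   # 1 + observed failures per option
        self.rng = random.Random(seed)

    def choose(self):
        """Draw one sample from each posterior and pick the largest."""
        samples = [self.rng.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, index, success):
        """Update the posterior of the option that was just used."""
        if success:
            self.alpha[index] += 1.0
        else:
            self.beta[index] += 1.0
```

Sampling from the posterior, rather than always taking the best estimate, keeps occasionally trying the other guard interval values, which is what lets the selection follow fluctuations in channel quality.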
3) PHY Layer Trade-offs: There are a variety of trade-offs inherent to the PHY layer: wider channels versus more interference, MCS versus required SNR, frame aggregation versus packet loss, etc. These trade-offs may be jointly addressed to optimize the overall performance using ML methods such as multi-armed bandit (MAB) [65], [103]- [105] and deep learning (DL) [68]. Karmakar et al. design an online learning-based mechanism based on the MAB framework for link configuration in 802.11ac networks [65], [103]- [105]. This solution considers both network load and channel conditions and uses MAB-based adaptive learning (AL) (i.e., the ε-greedy algorithm) along with fuzzy logic. Through this approach, the network performance is improved thanks to the ability to explore multiple configurations. The resulting implementation exhibits increased throughput (up to 358 %) when compared to existing solutions.
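The ε-greedy part of such a scheme can be sketched as a bandit whose arms are joint link configurations. This is a simplified illustration only: the arms here combine channel width and MCS, the fuzzy-logic component of [65], [103]- [105] is omitted, and the class name and parameters are assumptions.

```python
import itertools
import random


class EpsilonGreedyLinkConfig:
    """Epsilon-greedy bandit over (channel width, MCS) pairs (illustrative)."""

    def __init__(self, widths_mhz, mcs_levels, epsilon=0.1, seed=0):
        self.arms = list(itertools.product(widths_mhz, mcs_levels))
        self.epsilon = epsilon
        self.counts = [0] * len(self.arms)
        self.means = [0.0] * len(self.arms)   # running mean throughput per arm
        self.rng = random.Random(seed)

    def select(self):
        """Explore with probability epsilon, otherwise exploit the best mean."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.arms))
        return max(range(len(self.arms)), key=self.means.__getitem__)

    def update(self, arm, throughput):
        """Incrementally update the mean throughput observed for an arm."""
        self.counts[arm] += 1
        self.means[arm] += (throughput - self.means[arm]) / self.counts[arm]
```

The exploration step is what allows the controller to discover that, for example, a narrower channel with a higher MCS outperforms the current choice when interference increases.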
Karmakar et al. [68] improve throughput with a two-step algorithm that considers several parameters from the PHY and MAC layers simultaneously (channel bonding, MCS, and frame aggregation settings). First, a deep neural network (DNN) estimates throughput assuming different link parameter settings. Then, a predictive control-based search algorithm finds the optimal parameter values which maximize throughput. Experimental results are obtained through IEEE 802.11ac client boards installed on laptops. Results exhibit superior performance concerning delay and throughput in comparison to three baseline algorithms.
Rate adaptation algorithms are also designed for specific applications in industrial networks [70]. An RL-based mechanism solves the trade-off between reduced packet loss and increased transmission rate. The learning procedure is implemented through the state action reward state action (SARSA) algorithm. The balance between exploration and exploitation is conceived through the ε-greedy algorithm. With this approach, packet losses are reduced by 6 % when compared to non-RL-based algorithms.
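The SARSA update with ε-greedy action selection can be written in a few lines. The sketch below is generic: the state and action encodings, the reward, and the values of `alpha`, `gamma`, and `epsilon` are placeholders rather than the settings used in [70].

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_update(q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD update: the target uses the action actually chosen next."""
    target = reward + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


# Q-table with a default value of 0.0 for unseen (state, action) pairs.
q_table = defaultdict(float)
```

In a rate adaptation setting, a state could encode the current rate and recent loss, the actions could be "rate up", "keep", and "rate down", and the reward could trade transmission rate against packet loss; these mappings are assumptions for illustration.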

C. Frame Aggregation
Frame aggregation directly impacts the communication efficiency in terms of useful transmitted data and overhead [106]. Efficiency is analyzed in terms of errors produced during packet decoding: larger frames can lower the impact of overhead, but they are also more susceptible to transmission errors. This trade-off is addressed by frame aggregation techniques to derive the optimum frame size to maximize efficiency. The 802.11 standard introduces two basic aggregation methods: the aggregated MAC service data unit (A-MSDU) and the aggregated MAC protocol data unit (A-MPDU) [2]. These aggregations can also be used together [100]. The A-MSDU method is more efficient but more prone to errors than A-MPDU since it contains only one frame check sequence (FCS) accounting for all aggregated frames. The A-MPDU method is more robust but introduces more overhead as it generates several FCSs, one per each subframe. However, their dynamic adjustment in the 802.11 standard is not designed to deal with the varying channel state information (CSI) in wireless links.
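The trade-off between overhead and error susceptibility can be made concrete with a deliberately simplified model. Assume independent bit errors with bit error rate p, a payload of L bytes, and a fixed overhead of H bytes protected by a single FCS (the A-MSDU case); the fraction of airtime carrying successfully delivered payload is then η(L) = L/(L+H) · (1−p)^(8(L+H)). This model ignores preambles, contention, and retransmission policy, and the function names and parameter values below are hypothetical.

```python
def efficiency(payload_bytes, overhead_bytes, ber):
    """Useful-payload efficiency of one frame guarded by a single FCS."""
    total_bits = 8 * (payload_bytes + overhead_bytes)
    p_success = (1.0 - ber) ** total_bits            # all bits must arrive intact
    return payload_bytes / (payload_bytes + overhead_bytes) * p_success


def best_payload(overhead_bytes, ber, max_payload=4000):
    """Brute-force search for the payload size that maximizes efficiency."""
    return max(range(1, max_payload + 1),
               key=lambda n: efficiency(n, overhead_bytes, ber))
```

With an error-free channel the largest frame is always best, whereas a higher bit error rate pushes the optimum toward smaller frames; for A-MPDU the same expression applies per subframe, since each subframe carries its own FCS.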
To optimally select the frame size under dynamic conditions, ML techniques are used (Figure 6), including SL [74], [76]- [79] and RL [75]. Their use is reported for generic 802.11 networks to maximize throughput [74], for 802.11n to maximize goodput [76], [77] and for 802.11ac to address the energy-throughput trade-off [75] as well as to estimate the aggregation levels in 802.11ac [79].
Coronado et al. [76] implement a low computational complexity technique for the downlink direction. A random forest regressor (RFR) model configures both the aggregation and MCS settings. Results are obtained for small and medium-sized networks (up to 20 stations). This solution lowers the rate of retransmission resulting in goodput improved by 18.36 % when compared to legacy 802.11 aggregation mechanisms.
Aggregation methods supported by ML are also designed for software-defined WLANs (SD-WLANs) as an artificial intelligence (AI)-based operating system [77]. The M5P and the RFR models are implemented due to their low computational complexity. Intended to provide a frame length that maximizes goodput for each user, their training is performed with real measurements in a Wi-Fi scenario with up to 10 stations. Here, the RFR model presents the highest goodput improvement (55 %) when compared to the A-MSDU mechanism.
The MCS level can also be predicted through an ANN [78]. The model is trained in a client device by receiving packets from an AP using all available rates within a 1 s time window. Estimated rates are then used to compute the best aggregation level using a previously designed (non-ML) method [107]. The implemented solution outperforms baseline algorithms by at least 13 % in terms of throughput.
Aggregation level estimators can help in queue backlogging. Hassani et al. [79] use ML techniques on obtained hardware-level timestamps to determine the aggregation level implemented at a given AP. A logistic regression estimator model provides an accurate aggregation level estimator with low computational complexity. This solution is implemented in non-rooted hardware as client devices, where the achieved accuracy to determine the proper aggregation level is close to 100 %.
Frame aggregation settings can also consider the associated energy costs [75]. Based on the channel condition (given by the SNR value), the aggregation level is selected as the one with the smallest frame error rate (FER) to reduce the energy costs caused by retransmissions. The solution combines an online learning algorithm to define a set of suitable aggregation levels and fuzzy logic to select the most suitable level from that set, by estimating which frame size would have the lowest FER. With this approach, the resulting energy efficiency with 10 stations is 14 % better when compared to the standard use of A-MSDU and A-MPDU mechanisms.
Finally, channel condition and impact of collisions are jointly addressed by Lin and Lin [74] to adjust both the frame size and CW. An ANN model is trained with frame size-throughput patterns to provide a gradient indicating the direction of the optimal frame and the CW sizes. Simulation results, provided for 10 mobile users, show that throughput is improved when compared to the case when only the frame size is optimized (i.e., without additionally considering the optimal CW).

D. PHY Features
At the PHY layer, a variety of actions are supported by ML techniques to improve the performance of Wi-Fi networks. Issues that are addressed include:
• collision detection characterization [80] and its mitigation [81], [82],
• interference power-level characterization [87] and its mitigation [108],
• signal de-noising [85], source detection to improve spectral efficiency [84],
• prediction of signal strength variability [88],
• the enhanced modeling of the PHY and MAC layer interactions to improve throughput [83].
As depicted in Figure 7, a variety of ML models are available to deal with these problems, which we describe next.

1) Collision Reduction:
To estimate the number of collisions in the channel, the activity of stations in the network is modeled as a hidden Markov model (HMM) [80]. The approach is to use RL techniques to learn the parameters of such models, then to mathematically evaluate the probability of collisions. The transition probabilities are assessed through the expectation modification algorithm (EMA). Based on the derived transition probabilities, the probability of collisions is directly computed based on the estimated total number of senders that simultaneously transmit. Results are provided for seven APs deployed with an equal number of clients over two floors of a building. The estimated deferring probabilities exhibit a good correspondence with the real condition scenario. To improve the decoding of request to send (RTS) frames during collisions, Lee et al. [81], [82] implement an ML model. A Bloom filter decodes the RTS frames, and a supervised ML technique solves the inherent ambiguity with an accuracy larger than 99 %. The ML is implemented through a variety of algorithms such as naive Bayes, naive Bayesian tree, J48 decision tree, and support vector machine (SVM). Additionally, this solution is connected to a second, kε-greedy algorithm for channel allocation. The integration of both algorithms improves the performance 3.3 times over legacy 802.11 operation.
2) Interference Estimation: The interference level is estimated by modeling the network through a determinantal point process (DPP) [87]. An SL-based process is implemented to evaluate the total number of active transmitters that may interfere with each other and learn their locations. Interference is evaluated by providing the cumulative density function (CDF) for the total number of active users. This evaluation is then used when modeling the power of the interference signals through a path-loss model for each link. Results illustrate a good match with a theoretical model regarding the CDF of interference levels.
3) Signal Quality Estimation and Management: The received signal strength is predicted through deep learning techniques by Herath et al. [88]. In a recurrent neural network (RNN) model, encoder and decoder components are implemented to capture the CSI and predict its variability, respectively. The model is trained according to three different schemes to balance the trade-off between convergence speed and performance:
• guided training which uses current measured signal strength (resulting in faster convergence),
• unguided training which uses predicted signal strength (resulting in better prediction performance),
• curriculum training which combines both previous methods to balance the speed and prediction performance.
With the curriculum training scheme, the resulting prediction accuracy of the signal strength is improved when compared to linear regression and auto-regression methods.
The quality of the received signal can also be improved at the PHY layer using DL techniques [85]. With an ANN, the preamble of the 802.11 family protocols is de-noised by unfolding the useful signal from noise in the spectrogram domain (i.e., time-frequency domain). The spectrogram is processed as an image, where the ANN, used as a convolutional de-noising auto-encoder, estimates the originally emitted patterns. With this approach, the derived reconstruction accuracy is around 85 %.
The spectral efficiency of Wi-Fi transmissions can also be improved when avoiding the exposed terminal problem. To that end, senders are identified according to their CSI to later predict whether they will interfere with each other [84]. To implement such an identification mechanism, a model is trained through k-nearest neighbor (kNN) and ANN with 20 wireless stations in indoor scenarios, where an accuracy of 90 % is achieved with at least 30 samples per station. In the case of reduced total samples, better performance is obtained with the kNN model.
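The kNN side of such a sender identification step can be sketched with the standard library alone. The feature vectors stand in for per-sender CSI measurements; how [84] derives features from CSI is not specified here, so the data layout and the function name are assumptions.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors nearest to query.

    train is a list of (feature_vector, label) pairs; distances are Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because kNN has no training phase beyond storing the samples, it remains usable with only a few samples per station, which is consistent with the observation that it performs better when fewer samples are available.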

4) Interaction with the MAC Layer:
The PHY layer can also be modeled in unison with the MAC layer to characterize the impact of different features on observed throughput [83]. The selected input features are received power, channel width, spectral separation between users, traffic load, and physical rates. The idea is to find a mathematical function that maps input features to throughput values supported by supervised ML. This mathematical function then becomes a black box representation of a given link to later optimize throughput. The learning phase, which is used to obtain this function, is derived through regression techniques: regression tree, gradient boosted regression tree (GBRT), and support vector regressor (SVR). In particular, simulation results show that GBRT and SVR provide the most accurate results in comparison to a benchmark.

E. Open Challenges
From the multitude of papers addressing core Wi-Fi PHY and MAC features, we identify several open challenges related to:
• studying more realistic settings (including user mobility),
• removing common simplifying assumptions,
• improving ML-based solutions.
We describe these challenges below.
First, there is a need for more realistic simulations. Several papers express the intention to provide simulation testbeds with fewer simplifying assumptions. For instance, more realistic channel and traffic models, variable channel conditions per user, dense networks, and multihop networks are requirements for further research identified in [48], [51], [52], [57], [62], [70], [98], [100].
Second, studies are needed to consider overall network performance. Currently, papers address specific optimization parameters under specific conditions. Although some work simultaneously addresses a variety of Wi-Fi network parameters (cf. Section III-B3), an overall perspective of network functioning, which would account for optimization criteria in several layers simultaneously, has not yet been provided. While improved performance is achieved with cross-layer designs [109], problems posed in this direction are rather difficult to solve by analytical means due to the variety of related parameters. As yet unexplored, this constitutes a promising research direction for ML models.
Third, only a few papers provide details on the impact of user mobility on communication performance [72], [74]. However, considering the growing number of mobile Wi-Fi devices (e.g., phones, tablets, even vehicles), further insights can be provided to better characterize the influence of their movement on the network performance.
Finally, many papers point to future directions for improving ML-based solutions:
• provide accurate ML models (additional loss functions) [52],
• reduce the coordination overhead between agents in decentralized solutions [49],
• further study the impact of network status parameters on traffic prediction [65], [104], and
• increase the complexity of ML models to better characterize network functioning [72], [76], [80], [87].

IV. RECENT WI-FI FEATURES
In a push for higher and more efficient performance levels, recent Wi-Fi amendments such as 802.11ac [110], 802.11ax [3], and 802.11be [111] introduce new advanced and complex techniques such as multi-user communications (OFDMA, MU-MIMO) [112], spectrum aggregation and opportunistic spectrum access (channel bonding [113], multi-link operation [114]- [116]), spatial reuse [8], and multi-AP coordination [5], [117]. All these techniques promise high-performance gains in both throughput and latency but also open new challenges. These challenges are solved, or at least alleviated, using ML methods (see Table III) as we show in this section.

A. Beamforming
Transmissions in the millimeter wave (mmWave) 60 GHz shared band are a specific Wi-Fi use case aimed at greatly increasing the transmission rate in line of sight (LoS) communication scenarios, both short-range (indoor) and long-range (outdoor), the latter known as fixed wireless access (FWA) [118]. To cope with the increased attenuation in this band, beamforming of transmissions is required. This functionality was first introduced to Wi-Fi in 802.11ad and later extended in 802.11ay.
A key problem of 802.11ad/ay networks, which is solved using ML, is finding the optimum beam sector pairs (i.e., beam alignment) between transmitter and receiver (Figure 8). Alignment is derived from a beam sweeping procedure, which can take up to tens of milliseconds and needs to be periodically repeated. To facilitate the beam sector pair selection, Chang et al. [119] replace the standard method of an exhaustive beam search with one of three neural network (NN)-based algorithms to predict the optimal beam sector, including with historical data. This work is extended by Shen et al. [120] with the training duration reduced through a combination of SL-based feature extraction and RL-based training beam selection. Meanwhile, Polese et al. [121] develop DeepBeam, a framework for beam selection that replaces the time-consuming beam sweeping procedure with inferring the beam sector to use through deep learning based on passive listening to other transmissions.
Alternatively, improved ML-based beam alignment predictions are performed with camera images. Salehi et al. [122] show that visual information can significantly reduce the time required to establish the best beam pairs. Nishio et al. [123] also apply ML to camera images to accurately and rapidly predict received power, which is the information needed to find beam sectors. Additionally, camera-based predictions of link outage with DRL lead to improved handovers in mmWave networks [124].
Since the range of mmWave bands is short, 802.11ad/ay APs may need to be densely deployed for certain use cases. Under such network densification, beam coordination and interference management become necessary. Mohamed et al. [125] reduce cross-beam interference by applying statistical learning to construct a radio map of the network environment, which serves as input for beam selection. In this scenario, signaling is carried over the Wi-Fi network in the 5 GHz band through a centralized AP controller. Zhou et al. [126] optimize the beams in a centrally-managed deployment with a DNN-based solution. Their solution achieves nearly the same performance as an optimization algorithm at a fraction of the computational time.
A related problem in dense deployment scenarios is the association between user stations and APs, especially since next-generation stations will have multi-homing capabilities (i.e., methods allowing sustained connectivity to multiple APs). This leads to an interesting user-to-multiple APs association problem, which is solved using ML methods. Ly Dinh et al. [127] consider a generic WLAN where users can autonomously learn, using their own DQN, which APs to connect to and using which band (sub-6 GHz or mmWave). Once appropriate 802.11ad/ay beam sectors have been found, rate adaptation is required. MCS selection for mmWave transmissions relies on appropriate channel classification, i.e., determining whether a channel is LoS or non-line of sight (NLoS). This classification is augmented with ML, as shown by Kurniawan et al. [128], where classification is done with the RF technique. The prediction of statistical characteristics of a channel can also be useful and many papers focus on the PHY layer (regardless of the wireless technology). For example, Bai et al. [129] use a trained convolutional neural network (CNN) to predict the statistical characteristics of a channel for any given (indoor) location for technologies using massive multiple-input multiple-output (MIMO).
Alternatively, rate adaptation can be based on typical metrics available in commercial off-the-shelf (COTS) devices. Aggarwal et al. [132] predict optimal MCS settings using three ML models: decision tree (DT), RF, and SVM. They conclude that RF provides the best results and outperforms SNR-based rate selection strategies. This approach is extended in the learning-based beam and rate adaptation (LiBRA) framework [133], where the same ML-based classification methods determine which of the two adaptation methods (rate selection or beam selection) gives better performance for a given link.
The data rate of mmWave links can be improved by better channel estimation techniques. Lin et al. [131] combine transceiver location information with a DNN to evaluate the channel frequency response. This approach decreases the number of transmitted pilot signals, leaving more room for user data.
Finally, in terms of channel access, 802.11ad introduces a new hybrid MAC with contention-free and contention-based periods. The specifics of the resource scheduler are outside the scope of the standard and remain an open research challenge [171]. Azzino et al. [130] find the optimal duration of the contention-free period by observing the time-varying network load and using an RL-based approach. Their scheme preserves throughput for allocated streams while leaving more resources for contention-based traffic.

B. Multi-user Communication
With the IEEE 802.11ac amendment, and its support for downlink MU-MIMO transmissions, Wi-Fi opened the door to support multi-user transmissions, i.e., simultaneously transmitting to different stations in the same TXOP using spatial multiplexing. IEEE 802.11ax extends IEEE 802.11ac MU-MIMO features and provides support to both downlink and uplink, as well as orthogonal frequency-division multiple access (OFDMA). OFDMA, which could be considered as the most disruptive novelty introduced in IEEE 802.11ax, divides the available bandwidth into different sub-channels, called RUs, which are then allocated to different users. Both MU-MIMO and OFDMA will also play an important role in future IEEE 802.11be networks. In 802.11be, beyond extending Wi-Fi capabilities by using 320 MHz channels and up to 16 spatial streams, some improvements such as the allocation of multiple RUs to the same user and support for implicit channel sounding will be introduced [111]. The most significant challenge in multi-user communications is identifying and creating groups of compatible stations that, when simultaneously scheduled, result in higher network performance. The problem is both complex and non-linear, and thus suitable for ML techniques, because a particular group of stations must be chosen and their link parameters configured with only partial information in a rapidly changing environment. Figure 9 shows the case where an AP empowered with an ML agent is in charge of making these scheduling decisions. First, it must learn that station (STA) 1 and STA 3 can belong to the same MU-MIMO group. Then, given the AP has data to transmit to all three stations, the ML agent has to decide how to allocate the different available RUs to them. In this example, it has decided to allocate a larger RU to STA 1 and STA 3 for a MU-MIMO transmission, and a smaller one to STA 2.
Several papers address the problems of user selection, link adaptation, and channel sounding overhead reduction in MU-MIMO-enabled WLANs using a variety of ML strategies. Karmakar et al. [136] implement an ε-greedy strategy to find the best configuration (group and link parameters) using experience. Rico-Alvarino and Heath [135] use an SVM classifier to develop a robust MCS selection procedure. Reducing the channel sounding overheads using DNNs to compress CSI at each station and decompress CSI at the AP is presented by Sangdeh et al. [139]. Finally, a different approach is considered by Su et al. [137], [143], where a policy gradient technique determines if a certain client will benefit from participating in MU-MIMO transmissions. The policy function is represented by a neural network consisting of two convolutional layers. In all analyzed cases, results show significant network throughput improvements.
For OFDMA, in the case of AP-initiated transmissions, either in downlink or uplink, the AP must determine the group of stations scheduled at each TXOP, and which is the best RU allocation to them. Alternatively, in the case of uplink transmissions, stations may be allowed to select the RU that they will use for transmission [172]. These problems are considered using DRL techniques [138], [140], [141]. Kotagiri et al. [140], [141] focus on the uplink case and use a decentralized RU selection method with DRL (i.e., a CNN-based DQN) that provides higher gains when compared against the case when RUs are selected randomly. Balakrishnan et al. [138] consider the opposite case, i.e., only AP-initiated downlink transmissions, using DRL-based scheduling. Per-station channel quality and traffic information are the inputs for different objective policies. Results confirm the potential of ML for scheduling in OFDMA systems. Kotera et al. [145] use a deep deterministic policy gradient (DDPG) algorithm to solve the OFDMA resource allocation problem formulated as a Markov decision process (MDP) using Lyapunov optimization. Results show that this solution can meet the system latency requirements in situations where other baseline solutions fail, while also improving fairness.
Additionally, Sangdeh and Zeng [144] address joint MU-MIMO and OFDMA optimization by using deep supervised learning (DSL). The solution, called DeepMux, is executed at the APs and relies on DNNs to minimize the impact of channel sounding and find a near-optimal resource allocation policy. Experimental results show gains of up to 50% in throughput using DeepMux.
Importantly, OFDMA-based channel access is a common feature for both Wi-Fi and 5G and it was adopted in WLANs after it was successfully applied in the cellular domain. In the following papers, ML addresses several problems in OFDMA-based cellular networks: fair scheduling [173], [174], carrier frequency offset (CFO) estimation for uplink transmissions [175], [176], inter-network interference control [177], and resource allocation [178]. These works implement RL [173], [174], [177], supervised deep learning [175], unsupervised deep learning [176], and a genetic learning algorithm [178] to support performance optimization. We believe that these papers may provide interesting insights and guidelines for researchers working in the Wi-Fi domain.

C. Spatial Reuse
The IEEE 802.11ax amendment first introduced spatial reuse (SR) to Wi-Fi networks [8]. The main objective of this mechanism is to support concurrent transmissions between devices that belong to different basic service sets (BSSs). When a device detects an ongoing transmission, it must first decide whether another concurrent transmission is possible, and in case it is, which transmission power to use to avoid disrupting the ongoing one. IEEE 802.11ax SR offers good performance gains, despite its conservative rule-based design. In such a context, ML techniques that make the mechanism adaptive to different scenarios, and that decide when and how a device detecting an ongoing transmission can benefit from a spatial reuse opportunity, should result in even higher throughput and latency gains. It is expected that IEEE 802.11be will further extend Wi-Fi SR capabilities by allowing neighboring APs to coordinate their transmissions. ML techniques can contribute to improving coordinated SR by solving the challenge of identifying which devices can transmit at the same time by combining the CSI collected from different devices. Figure 10 shows the case of two neighboring APs that, empowered by ML agents, find a suitable configuration that gives them the highest possible throughput while sharing spectrum resources. In this case, we assume both prefer to use the same 80 MHz channel but transmit at lower power. With this configuration, the APs maximize mutual spatial reuse opportunities over other options such as using non-overlapping channels (less bandwidth) or transmitting at high power (with higher MCSs) but causing the other BSS to defer.
Figure 10. An AP empowered with ML can learn from experience which is the best SR configuration to maximize its own, or the overall network, performance.
The use of ML solutions to tackle the SR problem has attracted some attention in recent years. Most of the works implement RL techniques for learning the best configuration for each ML agent-empowered AP online. Popular methods include Q-learning [146], [153] and MABs [147]- [151]. All these papers share the concept of multiple agents that either do not share information or share it only partially (i.e., the action performed and the obtained reward) and learn by interacting through the environment. The results show that in multi-agent scenarios where the agents compete with each other without collaborating, convergence may be hard or impossible to achieve. There are also papers using SL techniques, such as NNs [152], [179], to select proper SR parameters (transmission power and sensitivity levels) given that the characteristics of the scenario are known.
In the following, we overview some of these papers, as they are illustrative to understand how ML can improve SR operation in Wi-Fi. Timmers et al. [146] use a Q-learning algorithm to optimize power, transmission rate, and clear channel assessment (CCA). States are defined as a combination of transmission power, interference, and the MCS used, and actions consist of changing the transmission power and MCS. Agents are placed at every device and act selfishly. Yin et al. [153] use Q-learning to improve 802.11ax's SR mechanism: the agent learns the best decision (i.e., transmit concurrently or wait) assuming knowledge of current interferers. For non-stationary scenarios, the learning rate of the less frequently chosen actions is increased to ensure rapid adaptation to environmental changes.
Wilhelmi et al. address the problem of channel selection and transmission power allocation with a stateless Q-learning solution [148] and different MAB action-selection strategies (ε-greedy, EXP3 [180], upper confidence bound (UCB) [181], and TS [182]) [150]. Two approaches are also examined: 1) concurrent, where all networks take actions simultaneously, and 2) sequential, where only one network changes its configuration at a time. Results show that optimal proportional fairness is achieved even if the different networks operate selfishly (i.e., they aim to maximize their throughput) without sharing information. Meanwhile, sequential action taking between actors reduces the throughput variability at the different BSSs. However, this comes at the expense of lower throughput values. Wilhelmi et al. [149] also use MABs for improving decentralized SR decisions. If the different ML agents can communicate and share the performance obtained when playing a certain action, it is possible to apply utility functions in the online optimization process that directly target network fairness, such as max-min, effectively reducing cases where some BSSs are starved due to the selfish operation of others. Meanwhile, Bardou et al. [154] also consider MABs in a centralized solution to dynamically change spatial reuse parameters. The reward function prevents starvation by using TS to select the best configuration. The dimensionality problem is solved by subsampling the state space. Simulation results show the ability of this solution to improve the performance of dense WLANs with multiple interacting BSSs.
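The effect of the utility function on the chosen configuration can be shown with a small sketch. It compares the sum of throughputs (a purely throughput-oriented objective), proportional fairness (sum of logarithms), and max-min fairness on made-up per-BSS throughputs; the configuration labels and numbers are hypothetical and not taken from [149] or [150].

```python
import math


def proportional_fairness(throughputs):
    """Sum of log-throughputs; requires strictly positive values."""
    return sum(math.log(t) for t in throughputs)


def max_min(throughputs):
    """Utility equal to the throughput of the worst-served BSS."""
    return min(throughputs)


def best_configuration(candidates, utility):
    """candidates maps a configuration label to a list of per-BSS throughputs."""
    return max(candidates, key=lambda name: utility(candidates[name]))


# Hypothetical per-BSS throughputs (Mb/s) for three joint configurations.
CANDIDATES = {
    "selfish": [90.0, 5.0],
    "skewed": [60.0, 30.0],
    "equal": [35.0, 35.0],
}
```

With these numbers, the plain sum favors "selfish", proportional fairness favors "skewed", and max-min favors "equal", which illustrates how a shared fairness-oriented utility steers agents away from configurations that starve a neighboring BSS.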
Supervised learning techniques such as multilayer perceptrons (MLP) and DTs are considered by Ak and Canberk [152] to select SR parameters at both the AP and stations. The models are trained offline using a dataset that covers multiple scenarios and configurations. A different approach is considered by Jamil et al. [179], where a centralized NN configures all BSSs so that spatial reuse is maximized. The NN considers the correlation function between the throughput achieved by the different devices in the network and their associated link layer parameters.
Interference can also be mitigated by jointly optimizing the transmitted power of APs and the channel allocation policies [108]. A Q-learning model maximizes throughput in dense WLANs. The model is trained through a learning process of reduced total iterations driven by an event-triggered mechanism. Whenever the network status changes due to the mobility of users, the learning process is called again to optimize power and channel allocation policies. Results are derived based on the deployment of 15 APs, where a 16 % throughput improvement is obtained in comparison to SoA power and channel allocation mechanisms.
Lastly, a completely different approach to achieve SR with directional transmissions is taken by Nguyen et al. [147]. The selection of the antenna orientation is tackled as a nonstationary MAB problem. Results from a software-defined radio (SDR) implementation show the correct operation and resilience to co-channel interference.

D. Channel Bonding
The option to enable channels wider than 20 MHz was introduced in IEEE 802.11n, where up to 40 MHz channels were supported. The IEEE 802.11ac and IEEE 802.11ax amendments further increased the maximum channel width to 80 and 160 MHz, respectively. IEEE 802.11be will continue to increase the channel width, with up to 320 MHz channels. Wider channels allow higher transmission rates and therefore higher performance. However, in dense scenarios, they may notably increase contention between neighboring BSSs, which may cause the opposite result. Therefore, correctly deciding when to use a wider channel, what should be its size, and which particular channels to use is necessary for successfully improving WLAN performance. Unfortunately, there is no single answer to the previous question but rather it depends on each specific scenario, including the number and position of contending devices, the load of each BSS, and the available channels.
Figure 11. An ML-enabled AP learns which is the best set of actions that maximize its performance. The primary channel is identified as P. Grey rectangles illustrate idle channels, red rectangles busy channels, and green rectangles AP transmissions.
ML techniques can solve such a situation by learning the best channel allocation and bonding configurations in a given scenario. Online learning seems a natural option in this case, especially if RL techniques and prediction models are combined to foster a rapid convergence [165]. For example, Figure 11 shows the case where an agent learns from experience which actions to perform given that the environment is found in a particular state (i.e., the state may be defined by the occupancy of the different 20 MHz channels) every time the primary channel becomes idle. In this case, the agent has learned that the best action when all four 20 MHz channels are idle is to transmit in the first 40 MHz primary channel, but not in the secondary 40 MHz channel. Similarly, when the secondary 40 MHz channel is busy, the AP has learned the best action is to wait until the 40 MHz secondary channel becomes idle to perform an 80 MHz transmission.
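The state and action structure described for Figure 11 can be sketched with a small tabular Q-learning agent. In the sketch below, the state is the busy/idle occupancy of four 20 MHz channels and the actions are to transmit over 20, 40, or 80 MHz or to wait. The action names, the feasibility rule, and the learning parameters are assumptions made for illustration and are not taken from the surveyed works.

```python
from collections import defaultdict

# Hypothetical action set: transmit on 20/40/80 MHz or defer.
ACTIONS = ("tx_20", "tx_40", "tx_80", "wait")


def make_q_table():
    # Q[state][action]; state = occupancy of four 20 MHz channels (1 = busy).
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def feasible_actions(state):
    # Index 0 is the primary channel; wider transmissions need idle secondaries.
    primary, sec20, sec40a, sec40b = state
    if primary:
        return ("wait",)
    acts = ["tx_20", "wait"]
    if not sec20:
        acts.append("tx_40")
        if not (sec40a or sec40b):
            acts.append("tx_80")
    return tuple(acts)


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    # Standard Q-learning update, restricted to feasible next actions.
    best_next = max(q[next_state][a] for a in feasible_actions(next_state))
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def greedy_action(q, state):
    # Exploit the current estimates among the feasible actions.
    return max(feasible_actions(state), key=lambda a: q[state][a])
```

With such a table, an agent that is repeatedly rewarded for deferring while the secondary 40 MHz channel is busy ends up preferring "wait" in that state, which mirrors the behavior illustrated in the figure.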
Out-of-the-box MABs mainly decide which are the best channel widths to be used when no further information, neither from the network nor from user requirements, is considered. The goal is to maximize WLAN performance [156], [157], [162]. When traffic loads and other performance metrics are considered, such as delay and throughput, DRL techniques are successfully applied [158], [159].
Karmakar et al. [156] show that the default dynamic channel bonding operation is improved by considering the individual needs of each station, as well as the access category (AC) they are using, selecting the most appropriate channel widths to use. With that goal in mind, a MAB algorithm, UCB, learns when the use of secondary channels is required. Testbed results show that this solution provides gains higher than 100% in some cases. Similarly, Khan and Lehtomäki [157] learn by trial and error (i.e., by exploring) which channels and bonding strategies are best, including both contiguous and non-contiguous 20 MHz channels. The mechanism, called iterative trial and error (ITE), includes different states depending on both the actions taken and the reward obtained. Exploration is implemented in ITE using an ε-greedy strategy. The mechanism is implemented in WARP nodes. Results show that ITE outperforms the default ε-greedy mechanism and improves the performance of static bandwidth channel access (SBCA) and dynamic bandwidth channel access (DBCA) thanks to its ability to select the channel width properly. Lastly, Ayush et al. [155] introduce hybrid adaptive DBCA (HA-DBCA) to solve the starvation problem that affects some DBCA devices.
HA-DBCA uses a polling-based adaptive mechanism for contention-free access and UCB to identify the stations that are starving, and so allow them to transmit their data during the contention-free access. The channel bonding problem is also modelled as a MAB by Kanemasa et al. [162]: chaotically oscillating waveforms generated by semiconductor lasers guide the exploration of the different available actions. Then, dynamically adapting the different thresholds used to select one or another action based on the amplitude of the generated waveform at sampling instants shows that such a technique can outperform default MABs such as UCB and ε-greedy in terms of throughput. Finally, Barrachina-Muñoz et al. [166] justify model-free RL techniques to address the channel bonding problem, design a complete RL framework and call into question whether complex RL algorithms allow rapid learning in realistic scenarios. Through extensive simulations, results show that a stateless RL in the form of lightweight MABs is an efficient solution for rapid adaptation, avoiding the definition of broad and/or meaningless states. In summary, lightweight MABs are an appropriate alternative to the cumbersome and slowly convergent methods such as Q-learning, and especially, deep reinforcement learning.
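Several of the works above rely on UCB to choose among channel widths. As a point of reference, the sketch below implements the classical UCB1 selection rule over a hypothetical set of widths. The variable names and the width set are assumptions, and the exact UCB variants used in [155], [156], and [162] may differ.

```python
import math

# Hypothetical arms: candidate channel widths in MHz.
WIDTHS_MHZ = (20, 40, 80, 160)


def ucb1_select(counts, means, t):
    """Return the arm index chosen by UCB1 at round t (1-based)."""
    # Play every arm once before using the confidence bound.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    # Empirical mean plus an exploration bonus that shrinks with more plays.
    scores = [m + math.sqrt(2.0 * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=lambda i: scores[i])
```

The exploration bonus favors rarely tried widths, so a width with a slightly lower mean reward is still revisited until enough evidence has been collected.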
DRL is considered for configuring channel bonding. Qi et al. [158] address the channel allocation problem (i.e., group of selected channels and position of the primary channel) in a scenario with multiple BSSs. The channel allocated to each BSS should depend on its expected load and performance. Then, considering the goal of minimizing latency, an on-demand channel bonding (DCB) algorithm uses DRL, along with a multi-agent deep deterministic policy gradient (MADDPG) for training, to find suitable channel allocations. Results show that by reducing the channel width in APs with low traffic demands, the delay in the overall network is improved as the channel access contention is reduced. A similar problem is considered by Luo and Chin [159], where DRL tackles the channel assignment problem in WLANs with channel bonding while considering spatio-temporal changes in traffic demands. Therefore, the DRL solution (i.e., a DQN) learns to adapt to offer satisfactory service. The agent in each AP learns from historical traffic loads when more or fewer channels should be bonded together, trying to minimize the interactions with other BSSs when not required. An opportunistic contiguous and noncontiguous channel aggregation scheme for 802.11ax WLANs is presented by Han et al. [161]. Since the default strategy of aggregating all available channels may degrade network performance due to the inter-WLAN contention, an efficient probabilistic channel aggregation scheme should consider the traffic load of secondary channels. To adjust aggregation probabilities of the secondary channels, a DRL strategy is used. Results confirm that this strategy outperforms others based on predefined rules such as aggregating all channels and aggregating one or two channels randomly selected.
The problem of throughput prediction in dense WLANs supporting channel bonding is considered by Wilhelmi et al. [163], where several predictors are built using SL techniques that include ANNs, graph neural networks (GNNs), RF regression, and gradient boosting. Both training and validation are performed on an open dataset generated using the IEEE 802.11ax-oriented Komondor network simulator [183]. While the accuracy achieved by the methods demonstrates the suitability of ML for predicting the throughput performance of complex WLANs, more importantly, this work can be easily extended by considering other approaches. The same dataset is used by Soto et al. [164] to predict Wi-Fi performance using a GNN model that incorporates the deployment's topology information. Finally, the problem of collisions with hidden stations when channel bonding is used is described by Karmakar et al. [160]. APs use a recursive neural network, namely a Metropolis-Hastings generative adversarial network (MH-GAN), to predict the activity of neighbouring BSSs. Results confirm that the presented solution, called Smart Bond, can reduce the probability of suffering transmission errors due to hidden stations.

E. Multi-link Operation, Network MIMO, and Full-duplex
ML techniques also improve the operation of a wide variety of advanced mechanisms that include multi-band WLAN operation [167], multi-AP coordination for network MIMO [168], and in-band full-duplex [169]. Both RL and SL techniques are used. For example, DRL considers both channel allocation and AP clustering to maximize the performance of distributed MIMO transmissions [168]. Similarly, NNs predict channel states to improve the performance of multi-band WLANs [167] and find groups of stations that enable full-duplex communication at the APs [169]. In the following, we overview these solutions in more detail. Figure 12 shows the case where an ML-enabled multi-band AP has to decide how to distribute traffic to a given station between the 5 GHz and 6 GHz bands. The agent predicts channel occupancy values on both bands to decide how to use the two interfaces better. For instance, if the occupancy predicted at the 5 GHz band is high, it may choose to turn off such an interface and use only the one working at the 6 GHz band, thus saving energy.
1) Multi-link Operation: Multi-link operation enables a WLAN device to simultaneously transmit over multiple interfaces on the same or different bands. This feature is currently under development in the IEEE 802.11be task group. In the synchronous version, when any of the active backoff instances reach zero for a given interface, the state of the other interfaces is checked, and those idle are bonded together to support the subsequent transmission. However, if other interfaces are currently busy but some may become idle soon, it may be more efficient to wait and then aggregate these links instead of immediately transmitting using only a single interface. Yano et al. [167] solve this uncertainty by learning and predicting when a given interface will become idle using a probabilistic neural network (PNN). Multi-band operation is also considered by Wang et al. [170] in combination with HARQ to improve the efficiency of packet retransmissions. SL decides if a packet retransmission should use the same band as the previous transmission, a different one, or all available bands simultaneously by sending multiple copies. A K-means algorithm and a DNN discern which is the best operation mode. Results show that a more efficient network utilization is obtained, leading to higher throughput values.
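The trade-off between transmitting immediately on one link and waiting to aggregate a second link can be illustrated with a simple expected-rate comparison. The rule below is a hypothetical sketch: it assumes that a predictor (such as the PNN in [167]) supplies the probability that the busy link becomes idle within a given waiting time, and all parameter names are illustrative rather than taken from that work.

```python
def should_wait(rate_single, rate_bonded, p_idle_soon, wait_time, tx_time):
    """Decide whether waiting for a second link is expected to pay off.

    rate_single: rate when transmitting on one link only
    rate_bonded: rate when both links are aggregated
    p_idle_soon: predicted probability that the busy link is idle after wait_time
    wait_time, tx_time: durations in the same time unit
    """
    # Rate during the transmission if the station decides to wait first.
    expected_rate = p_idle_soon * rate_bonded + (1 - p_idle_soon) * rate_single
    # Effective rate once the idle waiting period is accounted for.
    waited = expected_rate * tx_time / (tx_time + wait_time)
    return waited > rate_single
```

For example, with a bonded rate twice the single-link rate and a waiting time of one ninth of the transmission time, waiting is preferred when the predicted idle probability is 0.9 but not when it is 0.1.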
2) Network MIMO: Krishnan et al. [168] consider the joint problem of channel allocation and AP clustering in distributed MU-MIMO for Wi-Fi networks. DRL solves these two problems by maximizing per-user throughput. Since both underlying problems are NP-hard, only heuristic solutions exist in the literature. The DRL framework consists of an agent, implemented using a DNN, and a distributed MIMO Wi-Fi simulator. Although not explicitly specified, the solution is implemented in a central controller. Results show that using the DRL framework, a 20% improvement in user throughput is achieved. Also, the DRL framework can attain multiple objectives, such as maximizing throughput and fairness simultaneously.
3) Full-duplex Communication: Full-duplex (in-band) communication allows a device to transmit and receive simultaneously, thus 'doubling' the channel capacity. In WLANs, a key challenge to solve is the user pairing problem: finding groups of different stations that allow the AP to transmit to one while receiving from another. To solve this combinatorial problem, which becomes impractical when the number of stations is high, Zhang et al. [169] use a DSL architecture introduced in [184]. The main benefit of this solution is that the NN does not need to be re-trained when the length of input (e.g., the number of users) changes within an expected range. Results confirm that the DSL-based solution outperforms two low-complexity methods called greedy assignment and random assignment.

F. Open Challenges
This section has covered recent and advanced Wi-Fi features such as beamforming, multi-user communications, channel bonding, spatial reuse, and multi-band. Although quite different, in all of them, ML techniques are used mainly either:
• to adapt to the environment through selecting the most proper actions at the right moment,
• for system-level performance predictions, or
• to improve the operation of specific mechanisms by completing unavailable data.
Since most of these features are recent, complex, and in development, many aspects are still not considered or considered only superficially. Therefore, there is room for future work in this area either by addressing the problems listed in previous subsections with different ML techniques or by simply picking some of the still uncovered aspects. In the following, we detail some open aspects in the different categories.
First, the success of using beamforming in indoor Wi-Fi scenarios will be based on the ability to properly perform beam sector alignment (Figure 8). Research has shown that ML methods can have a positive impact, but robust solutions available for COTS devices are required, e.g., to minimize latency [118]. For outdoor scenarios, beamforming-aware resource allocation (intra-AP) and resource coordination (inter-AP) methods based on ML need to be updated to the recently released 802.11ay amendment, where FWA is an important use case, which has so far not been researched in depth.
In the area of multi-user communication, more works focusing on ML solutions for allocating spatial streams and RUs to active stations are required, especially when mixed with realistic traffic patterns and QoS requirements. Estimates of future traffic, contending devices, and environmental conditions may improve the Wi-Fi response to sensitive traffic, improving criteria such as worst-case latency by pre-reserving resources. Moreover, predictions using ML techniques can improve how channel sounding is implemented, as only stations that are likely to be scheduled will be requested to provide such information.
Many works in the area of spatial reuse have considered BSSs operating in a completely decentralized way, so using a spatial reuse opportunity depends only on each individual's observed inputs. This situation justifies that many papers have considered ML techniques such as MABs or Q-learning to infer which is the best action in a particular situation. However, with IEEE 802.11be, TXOP sharing and cooperative schemes may be enforced, thus requiring a different approach, augmented with ML techniques to optimize its operation.
The case of channel bonding has been addressed using RL and SL. Both techniques capture the interactions between BSSs that appear when channel widths change dynamically. Further work is required to test and compare these results with each other. Furthermore, an exciting aspect is to couple channel aggregation techniques with OFDMA RU allocation, for which complex DRL techniques may be well suited.
Finally, a disruptive new feature introduced by IEEE 802.11be is multi-link operation. This will open several exciting challenges, such as which channels to use and how to distribute the different flows between links. ML techniques can learn, for example, when is the best moment to perform a channel switch, which link occupancy patterns favor a particular traffic pattern, and how to allocate or distribute flows to links.

V. CONNECTIVITY MANAGEMENT
Connectivity management is an important task in Wi-Fi networks that includes, among others, channel allocation, band selection, and AP selection. The task is complex and challenging as the configuration change of a single link affects not only its performance but often (e.g., in densely deployed networks) the performance of all neighboring networks. In this section, we present ML-based approaches for solving the connectivity management subtasks. Moreover, we also cover ML-based approaches for predicting future traffic load as well as the health of Wi-Fi link connections. These techniques allow the network configuration to be prepared and updated in advance (e.g., before a rapid change of communication conditions), which helps to minimize outage probability and improve user QoE. Table IV presents a summary of works augmenting Wi-Fi with ML in this area, which we present next.

A. Channel and Band Selection
Channel allocation is an important problem in dense Wi-Fi networks, where a limited set of available channels has to be shared by a large number of co-located Wi-Fi BSSs. Poor channel allocation causes substantial contention among the APs and stations, hence reducing the throughput of each station. Typically, in the proposed solutions, the research goal is to assign channels in a way that
• the APs using the same channel do not interfere with each other (e.g., they are out of each other's interference range) and/or
• highly loaded APs are not allocated the same channel (i.e., a form of load balancing).
Note that in the case of variable traffic load, channel allocation has to be performed periodically.
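The two allocation goals above can be made concrete with a simple non-ML baseline, against which learning-based schemes are typically compared. The greedy heuristic below is an illustrative sketch rather than a method from the surveyed papers: it processes APs in order of decreasing load and gives each one the channel on which the already-assigned interfering neighbors carry the least load. The function name and the input format are assumptions.

```python
def greedy_channel_assignment(neighbors, load, channels):
    """Assign one channel per AP, processing heavily loaded APs first.

    neighbors: dict mapping each AP to the set of APs it interferes with
    load: dict mapping each AP to its offered load (arbitrary units)
    channels: list of available channel identifiers
    """
    assignment = {}
    for ap in sorted(neighbors, key=lambda a: -load[a]):

        def cost(ch):
            # Total load of already-assigned interfering neighbors on channel ch.
            return sum(load[n] for n in neighbors[ap] if assignment.get(n) == ch)

        # Ties are broken in favor of the first channel in the list.
        assignment[ap] = min(channels, key=cost)
    return assignment
```

With three mutually interfering APs of loads 5, 3, and 1 and two channels, the most loaded AP obtains a channel of its own while the two lightly loaded APs share the other one. Because the traffic load varies, such a procedure would have to be rerun periodically, which is one motivation for the learning-based schemes discussed next.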
As depicted in Figure 13, ML-based algorithms can solve the problem of channel and band selection. They provide models that may consider changing interference relations (e.g., due to station mobility) and variable traffic loads (e.g., as a result of stations becoming active or passive).
Nakashima et al. [195] use a DRL-based channel allocation scheme to maximize throughput in densely deployed multi-BSS WLANs. A central controller is aware of the global system state and able to control all APs. The interactions between APs, under a certain channel allocation, are represented through contention graphs (i.e., channel adjacency matrices) to extract the features of carrier sensing relationships among the APs (i.e., topology information) using graph convolutional networks (GCNs). The learning algorithm is DDQN with a dueling network and prioritized experience replay. In addition, to prevent overfitting, selective observation data buffering is used, i.e., experiences are filtered to reduce the duplication of data for learning, which can often adversely influence the generalization performance. The simulation results demonstrate that the method enables the allocation of channels in densely deployed WLANs such that the system throughput increases.
Jeunen et al. [189] introduce a framework able to passively monitor dense Wi-Fi environments, compute overlapping airtime periods, and detect networks which are the main cause of performance degradation in a WLAN. A centralized (SDN-based) network architecture is assumed. To implement the framework, different ML techniques are used, e.g., least absolute shrinkage and selection operator (LASSO), ordinary least squares (OLS). The extraction and ranking of relevant features from the gathered data is done using, e.g., label propagation algorithm (LPA) and Girvan-Newman algorithm (GNA). Results show that the presented framework can find a new channel allocation that solves the interference problems.

1) AP Selection and Association:
The proliferation and densification of Wi-Fi networks often lead to the existence of multiple spatially overlapping Wi-Fi cells. Hence, a station has to choose which of the discovered APs to connect to. The association method of 802.11 has stations select the AP that provides the strongest signal. Unfortunately, in many cases, this simple approach leads to the under-utilization of some APs while overcrowding others. Consequently, AP selection and load balancing approaches have been extensively studied as a way to improve network throughput. For example, Carrascosa et al. [191], [197] use a decentralized AP selection procedure where stations employ an MAB-based approach to dynamically learn the optimal mapping between APs and stations. This procedure distributes the stations evenly among the available APs. Specifically, each station independently explores the different APs inside its coverage range and selects the one that better satisfies its needs. A novel opportunistic ε-greedy approach with stickiness halts the exploration when a suitable AP is found. Then, the station remains associated to that same AP while it is satisfied, only resuming the exploration after several unsatisfactory association periods. Results show that this approach increases the number of satisfied stations and the aggregated network throughput by up to 80% in the case of dense AP deployments.
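The stickiness idea can be sketched as follows. The class below is a simplified illustration written for this description, not the algorithm of [191], [197]: the satisfaction test (throughput at least equal to a required value), the `patience` parameter, and all names are assumptions.

```python
import random


class StickyApSelector:
    """Epsilon-greedy AP selection that stops exploring while satisfied."""

    def __init__(self, aps, epsilon=0.2, patience=3, seed=0):
        self.aps = list(aps)
        self.epsilon = epsilon
        self.patience = patience  # unsatisfactory periods tolerated once sticky
        self.rng = random.Random(seed)
        self.means = {ap: 0.0 for ap in self.aps}
        self.counts = {ap: 0 for ap in self.aps}
        self.current = None
        self.sticky = False
        self.bad_periods = 0

    def choose(self):
        # Keep the current AP while sticky and not yet out of patience.
        if self.sticky and self.bad_periods < self.patience:
            return self.current
        # Otherwise resume epsilon-greedy exploration.
        self.sticky = False
        self.bad_periods = 0
        if self.rng.random() < self.epsilon:
            self.current = self.rng.choice(self.aps)
        else:
            self.current = max(self.aps, key=lambda ap: self.means[ap])
        return self.current

    def report(self, throughput, required):
        # Update the running mean of the AP used in this association period.
        ap = self.current
        self.counts[ap] += 1
        self.means[ap] += (throughput - self.means[ap]) / self.counts[ap]
        if throughput >= required:
            self.sticky = True
            self.bad_periods = 0
        elif self.sticky:
            self.bad_periods += 1
```

A station would call `choose()` at the start of each association period and `report()` at its end. Exploration stops as soon as one period is satisfactory and resumes only after `patience` consecutive unsatisfactory periods.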
Similarly, López-Raventós and Bellalta [196] study MAB-based solutions for the decentralized channel allocation and AP selection problems in enterprise WLAN scenarios. APs and stations use agents that, through a Thompson sampling algorithm, explore and learn:
• at the AP side: which is the best channel to use, and
• at the station side: which is the best AP to associate with.
Results from a custom-built simulator, called Neko 4, show that the learning-based approach outperforms the static one, regardless of the network density and traffic requirements.
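For readers unfamiliar with Thompson sampling, the sketch below shows its Beta-Bernoulli form, in which each arm (e.g., a candidate AP or channel) keeps success and failure counts and the agent plays the arm with the largest posterior sample. The binary reward and the simulated success probabilities are assumptions for illustration; [196] may use a different reward model.

```python
import random


def thompson_select(successes, failures, rng):
    """Pick the arm with the largest sample from its Beta posterior."""
    # A uniform Beta(1, 1) prior is assumed for every arm.
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])


def simulate(success_prob, rounds=3000, seed=0):
    """Run Thompson sampling on hypothetical Bernoulli arms; return play counts."""
    rng = random.Random(seed)
    k = len(success_prob)
    successes, failures = [0] * k, [0] * k
    for _ in range(rounds):
        arm = thompson_select(successes, failures, rng)
        # Binary reward: 1 if the (simulated) period was satisfactory.
        if rng.random() < success_prob[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]
```

Running `simulate([0.2, 0.5, 0.8])` concentrates most plays on the third arm, since posterior samples of the inferior arms rarely exceed those of the best one after a short exploration phase.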
Bojovic et al. [186] propose a cognitive AP selection scheme, where a station selects an AP that is expected to yield the best throughput according to past experienced performance. The scheme belongs to the SL family and uses a multilayer feed-forward neural network (MFNN) to learn the correlation between the observed environmental condition (e.g., SNR, probability of failure, beacon delay) and the obtained performance (i.e., throughput). The results from an 802.11 testbed show that the approach effectively outperforms legacy AP selection strategies in a variety of scenarios. Liu et al. [185] and Karaca and Landfeldt [212] use a similar approach of predicting performance under AP selection constraints.
An interesting RL-based scheme of user-to-multiple AP association is presented by Dinh et al. [198]. Two distributed association methods based on deep Q-learning (DQL) enable stations to learn their best set of APs to connect to, using only local knowledge of the wireless environment or with limited feedback from the APs. Note that each device is equipped with multiple wireless interfaces. The objective is to maximize the long-term sum-rate subject to multiple constraints (e.g., AP load or application QoS constraints). A numerical evaluation reveals that the algorithms improve the targeted objectives and enhance fairness among applications.
A centralized approach is proposed by Kafi et al. [192]: an RL-based client-AP association algorithm to enhance the aggregated throughput in dense Wi-Fi networks. The Q-learning-based algorithm is deployed centrally in an SDN controller and controls the associations of new users, as well as performs re-associations of connected stations. As simulation results show, the approach outperforms the standard 802.11 association procedure when the distribution of users is not uniform and performs similarly when it is uniform.
Pei et al. [188] determine, through large-scale measurements, which factors affect the Wi-Fi connection set-up process. The analysis of 0.4 billion Wi-Fi sessions collected using the Wi-Fi Manager mobile app from 5 million mobile devices shows that 45% of Wi-Fi connection attempts fail and about 5% of attempts consume more than 10 seconds. Based on this analysis, the developed SL-based AP selection algorithm significantly improves Wi-Fi connection set-up performance. The algorithm uses RF to classify candidate APs into slow or fast sets by taking the following features as an input: hour of the day, received signal strength indicator (RSSI), mobile device model, AP model, and whether encryption is enabled. Based on the classification, a station avoids connecting to APs in the slow set. The evaluation results show that the described approach reduces connection failures to 3.6% and shortens the connection set-up time by more than a factor of 10.
As shown by Song and Striegel [187], frame aggregation offers an efficient representation of expected throughput for improving AP selection. Specifically, the characteristics of subframes during frame aggregation can uniquely embody the utilization, interference, and backlog traffic pressure for an AP. With an SL-based approach, simple regression models (based on linear regression and DT regression) predict the AP expected throughput for better AP selection. The results show a prediction accuracy above 80%.
4 https://github.com/wn-upf/Neko
2) Station Handovers: In mobile scenarios, it frequently happens that a station leaves the coverage area of one AP and enters an area covered by another AP. In such a case, the station has to perform a handover from the old AP towards the new AP. The decision about a potential handover operation should be made early enough to avoid low data rate periods or even connectivity outages. ML methods can predict network conditions and hence make correct handover decisions. For example, Feltrin and Tomasin [190] employ ML to predict an upcoming handover by making an AP monitor the RSSI of connected stations and use an NN for specific pattern recognition in the RSSI evolution. This technique provides good prediction accuracy and is resilient to noise, speed, and fading phenomena.
ABRAHAM (mAchine learning Backed multi-metRic Handover AlgorithM) [194] is an ML-based proactive handover algorithm that uses multiple metrics to predict the future location of stations and the future AP load. Additionally, using long short-term memory (LSTM), it predicts the future RSSI values. These predictions are used to optimize the load on the APs by handing over stations to APs to preserve QoS and QoE metrics. ABRAHAM achieves 139% higher overall throughput compared to the legacy 802.11 handover algorithm.
Han et al. [193] describe a handover management scheme for dense WLAN networks, which uses DRL, specifically a deep Q-network. The scheme enables the NN to learn from user behavior and network status, adapting its learning in time-varying dense WLANs. The handover decision is modeled as an MDP leveraging the temporal correlation property, while the scheme depends on real-time network statistics to make decisions. Simulation results show that this solution can effectively improve the data rate during the handover process and outperform the 802.11 handover scheme.

B. Management Architectures
Management of Wi-Fi networks is a complex task as it requires tuning a plethora of parameters across distributed devices. Here, we present management frameworks that facilitate this task by providing an AI-based control plane.
aiOS [77] is an AI-based operating system for SD-WLANs (i.e., the control plane). This system embeds state-of-the-art ML toolboxes to provide a global intelligence platform, which is at the same time driven by AI and designed to drive future AI-powered applications and services. A proof-of-concept implementation of aiOS validates it by using several low-complexity ML models for adaptive frame length selection in 802.11-based SD-WLANs. The approach improves the aggregated network throughput by up to 55% as evaluated in a real-world testbed.
Bast et al. [199] use DRL to dynamically optimize network slice configuration in Wi-Fi networks. A slice configuration consists of multiple parameters, e.g., CCA sensitivity level, MCS, and transmit power level. Therefore, the action search space grows with the number of active slices in the network. Interestingly, in this approach the selected action does not consist of absolute configuration values, but of increasing or decreasing the current parameter values. A simple DQN agent is enhanced with DDQN, experience replay, and fitted Q-learning to improve convergence speed and stability. Results from the ns-3 simulator show that the solution can achieve the same optimal performance as found with an exhaustive search. Finally, DDQN can optimize at run-time, without the need for AP deployment information or knowledge about coexisting networks.
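The benefit of relative actions is that the number of actions per slice grows linearly with the number of parameters instead of with the product of their value ranges. The sketch below illustrates one possible encoding. The parameter names, their ranges, and the unit step size are assumptions chosen for illustration and are not taken from [199].

```python
# Hypothetical parameter ranges (assumptions): CCA threshold, MCS index, TX power.
PARAM_LIMITS = {"cca_dbm": (-82, -62), "mcs": (0, 11), "tx_power_dbm": (1, 20)}

# Each action nudges one parameter by one step, or leaves everything unchanged.
ACTIONS = [None] + [(name, delta) for name in PARAM_LIMITS for delta in (-1, +1)]


def apply_action(config, action):
    """Return a new slice configuration after applying a relative action."""
    new_config = dict(config)
    if action is None:
        return new_config
    name, delta = action
    low, high = PARAM_LIMITS[name]
    # Clamp so that the configuration always stays within the allowed range.
    new_config[name] = min(high, max(low, config[name] + delta))
    return new_config
```

With three parameters this yields seven actions per slice, whereas an absolute encoding over the same ranges would require 21 × 12 × 20 = 5040 distinct configurations.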
Lyu et al. [200] use large-scale AP usage data from a university campus Wi-Fi system with over 8,000 APs and more than 40,000 active users. An extensive spatio-temporal analysis of the data set includes the following:
• AP load, i.e., the number of associated users,
• AP traffic throughput, i.e., the amount of traffic consumed within a period.
First, a so-called idle phenomenon prevails throughout the whole trace: multiple APs remain unused (i.e., without any user association). Second, the AP load follows a long tail distribution (i.e., most APs serve only a few users, while a small number of APs serve hundreds of users), hence, the per-AP utilization is imbalanced. Therefore, a new management system, named LAM (large-scale AP management), has the unused APs switched off intelligently according to the underlying user association conditions. LAM leverages an ML-based algorithm to predict the AP load over time based on historical AP association records. Results for diverse algorithms (including RF, SVM, kNN, and DT) show that the load prediction accuracy can reach as high as 90%. In addition, more than 70% of energy is saved, with over 92% of Wi-Fi coverage guaranteed. These savings translate to $59,000 per year in the aforementioned university Wi-Fi system.
Patro and Banerjee [213] consider an SDN-based Wi-Fi control system that manages a group of APs. The central controller configures channel and transmission power settings for the APs in the network. Decisions on how to configure the network are taken after learning from the collected data. A set of ML-based techniques is used, for example, reduced error pruning trees (REPTs), to predict Wi-Fi and non-Wi-Fi activity (such as microwave ovens) so that better configurations can be deployed. The framework reduces channel congestion by up to 47%.

C. Traffic Prediction
Network management operations are assisted by traffic prediction techniques for better short- and long-term planning. Proper planning, using methods such as traffic forecasting, congestion control, power saving, bandwidth allocation, and buffer management, leads to improved user QoE. For instance, based on the predicted traffic, APs can improve load balancing and admission control.
Real-time traffic prediction becomes a challenging problem in Wi-Fi networks due to varying channel conditions, changing network topologies, and random user traffic. Traffic estimation is also dependent on several other parameters, such as the total number of users in the network, the SNR on the link, or the communication capabilities of users and APs [203]. In such scenarios, ML models deal with the diverse conditions of Wi-Fi networks, otherwise intractable through analytical methods. As summarized in Figure 14, ML models augment legacy 802.11 devices through SVM [201], RNN, MLP, SVR, and polynomial regression [202], or DT and RF [203].
Specifically, the solution proposed by Feng et al. [201] trains an SVM to predict the traffic evolution one step ahead. Besides, by recursively applying the one-step-ahead solution, l-step-ahead traffic estimation is also obtained. The SVM model uses a Gaussian radial basis function kernel and is trained with 100 samples to predict the next 100 samples. With the SVM model, the prediction error for upcoming traffic is reduced by at least 33% compared to an ANN.
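The recursive strategy, in which each one-step prediction is fed back as the input for the next step, can be sketched independently of the underlying model. In the example below, a least-squares AR(1) linear model is deliberately substituted for the SVM of [201] to keep the code self-contained. The function names and the toy series are assumptions.

```python
def fit_ar1(series):
    """Least-squares fit of x[t+1] = a * x[t] + b (stand-in one-step model)."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def forecast(series, steps):
    """Apply the one-step predictor recursively to obtain l-step-ahead values."""
    a, b = fit_ar1(series)
    last, out = series[-1], []
    for _ in range(steps):
        last = a * last + b  # feed the prediction back as the next input
        out.append(last)
    return out
```

Because every step reuses earlier predictions, errors accumulate as the horizon grows, which is the usual price of the recursive approach regardless of the one-step model used.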
Khan et al. [203] analyze the most suitable ML models to predict traffic among MLP, SVR, DT, and RF. To train these models, several features are extracted from simulation and real data (i.e., Wireshark network trace). In particular, the number of connected users, signal strength, modulation scheme, data rate, inter-arrival time, packet arrival rate, number of re-transmissions, and several other channel parameters are extracted. The solutions are implemented in a Wi-Fi network consisting of 10 users and a single 802.11 AP. The reported prediction accuracy reaches a maximum of 96.2 %, 94.5 %, 93.3 %, and 91 % using MLP, DT, RF, and SVR, respectively. The study also analyzes the complexity of these mechanisms in real-time schemes by reporting the time elapsed for each model. The most time-consuming model is MLP, followed by RF, SVR, and DT.
Finally, Thapaliya et al. apply both SL and USL models [202] to predict network congestion levels. Based on captured data attributes (the number of clients, throughput, frame retry rate, and frame error rate), SVR and polynomial regressor models predict the same values for a certain location, day, and time. These predicted values are then fed to an expectation maximization (EM) algorithm to predict congestion levels by forming three different clusters. Each cluster is identified with high, medium, and low congestion levels based on the numeric value of the clustered samples. The obtained accuracy is 24 %, 50 %, and 26 %, for a low, medium, and high level of network area congestion.
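The clustering step can be illustrated with a small EM sketch for a one-dimensional mixture of three Gaussians. It is a simplified illustration, not the pipeline of [202]: the input is assumed to be a single predicted attribute per sample (here, hypothetical frame retry rates), and the initialization, variance floor, and function names are choices made for this example.

```python
from math import exp, pi, sqrt


def gaussian(x, mean, var):
    return exp(-((x - mean) ** 2) / (2.0 * var)) / sqrt(2.0 * pi * var)


def fit_em(values, iterations=50, min_var=1e-3):
    """EM for a 1-D mixture of three Gaussians with a deterministic start."""
    ordered = sorted(values)
    means = [ordered[0], ordered[len(ordered) // 2], ordered[-1]]
    spread = (ordered[-1] - ordered[0]) / 6.0 or 1.0
    variances = [spread ** 2] * 3
    weights = [1.0 / 3.0] * 3
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in values:
            dens = [w * gaussian(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate mean, variance, and weight of each component.
        for k in range(3):
            nk = sum(r[k] for r in resp) or 1e-300
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2
                      for r, x in zip(resp, values)) / nk
            variances[k] = max(var, min_var)  # floor avoids collapsing components
            weights[k] = nk / len(values)
    return means, variances, weights


def congestion_levels(values):
    """Label each sample by its most likely component, ordered by mean."""
    means, variances, weights = fit_em(values)
    order = sorted(range(3), key=lambda k: means[k])
    names = {order[0]: "low", order[1]: "medium", order[2]: "high"}
    labels = []
    for x in values:
        scores = [w * gaussian(x, m, v)
                  for w, m, v in zip(weights, means, variances)]
        labels.append(names[max(range(3), key=lambda k: scores[k])])
    return labels


# Hypothetical predicted frame retry rates for nine (location, time) samples.
retry_rates = [0.09, 0.10, 0.12, 0.48, 0.50, 0.52, 0.88, 0.90, 0.93]
print(congestion_levels(retry_rates))
```

Naming the clusters by the order of their means mirrors the idea of identifying each cluster with a congestion level based on the numeric value of its samples.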

D. Predicting the Health of Wi-Fi Connections
Unlicensed bands are becoming crowded with dense and uncontrolled deployments of Wi-Fi networks, generally managed by different users. These environments exacerbate the effects of well-known pathological conditions such as hidden terminals, flow starvation, and performance anomaly. Unfortunately, these problems become increasingly difficult to detect in real world scenarios. Specifically, while performance degradation is a common symptom of these unwanted conditions, different causes require different solutions. ML seems to be the right toolset for the detection of individual impairments, as it can handle a large amount of raw measurement data and learn to deduce the current operation regime (e.g., using classification methods). Therefore, Gallo and Garlisi [205] provide an automatic diagnostic tool, Wi-Dia, for detecting the causes of performance impairments by recognizing the wireless operating context. Wi-Dia follows a data-driven approach and exploits ML methods for classifying Wi-Fi problems (e.g., hidden stations and flow starvation). It uses features related to network topology and measures channel utilization without impacting regular network operations. The classifier is jointly trained using simulated and experimental data, thus taking advantage of the flexibility of network simulators and the realistic details of wireless testbeds. As the results show, Wi-Dia achieves high detection accuracy of pathological Wi-Fi conditions in real-world scenarios.
Similarly, Syrigos et al. [206] detect the causes of Wi-Fi under-performance, e.g., high contention with other Wi-Fi and non-Wi-Fi devices, operation in the low SNR region, hidden terminals, or the capture effect. A centralized Wi-Fi network controller collects two performance metrics from connected APs (i.e., those exposed by the ath9k driver):
• normalized channel access (NCA), i.e., the ratio between channel access attempts per second and the maximal channel access attempts per second (as calculated with analytical 802.11 models); and
• frame delivery ratio (FDR), i.e., the ratio between successful transmissions per second and channel access attempts per second.
The classification is preceded by data modeling and feature extraction and performed with four diverse algorithms: DT, RF, SVM, and kNN. After fine-tuning the algorithms' parameters, the results show a remarkable detection accuracy of 99.2% with the kNN algorithm.
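As a minimal sketch, the two metrics defined above can be computed from per-second counters and passed to a small kNN classifier. The counter values, the training points, and the two class labels below are invented for illustration and are not taken from [206]; the function names are likewise hypothetical.

```python
from collections import Counter
from math import dist


def nca(attempts_per_s, max_attempts_per_s):
    """Normalized channel access: attempts relative to the analytical maximum."""
    return attempts_per_s / max_attempts_per_s if max_attempts_per_s > 0 else 0.0


def fdr(successes_per_s, attempts_per_s):
    """Frame delivery ratio: successful transmissions per channel access attempt."""
    return successes_per_s / attempts_per_s if attempts_per_s > 0 else 0.0


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled (NCA, FDR) feature pairs.
TRAIN = [
    ((0.30, 0.90), "high_contention"),
    ((0.35, 0.85), "high_contention"),
    ((0.25, 0.88), "high_contention"),
    ((0.90, 0.40), "low_snr"),
    ((0.95, 0.35), "low_snr"),
    ((0.85, 0.45), "low_snr"),
]

# Assumed counters for one AP: 420 attempts/s out of a maximum of 500,
# of which 160 transmissions/s succeeded.
query = (nca(420, 500), fdr(160, 420))
print(query, knn_predict(TRAIN, query))
```

In this toy data, frequent channel access combined with a low delivery ratio falls closest to the points labeled as low SNR.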
Trivedi et al. [207] propose WiNetSense, a centralized sensing framework, which collects the Wi-Fi link quality statistics (e.g., RSSI) from network devices and uses this information to build the global network topology and instantaneous network health information. Furthermore, the collected data is analyzed using ML algorithms such as kNN and naive Bayes (NB) to predict the health of wireless links. This knowledge can trigger specific decisions regarding load balancing, smooth handovers, or dynamic power control.
An anomaly-detection approach that uses a self-organizing hidden Markov model map (SOHMMM) is considered by Allahdadi et al. [208]. The self-organizing map is an artificial NN that is trained through a USL process. The SOHMMM shows improved anomaly detection accuracy and sensitivity, compared to other HMM-based approaches, as tested in a simulated environment.
Morshedi and Noll propose a novel ML-based approach for estimating the perceived QoS of video streaming [209] and video conferencing [210] using only 802.11-specific network performance parameters collected from the AP. The studies use datasets comprising 802.11n/ac/ax specific network performance parameters in the form of mean opinion scores. These datasets train multiple ML algorithms, i.e., logistic model tree (LMT), REPT, Naive Bayes Tree (NBT), MLP, and achieve a 93-99% accuracy of estimating the perceived QoS classes. LMT and REPT are the most suitable algorithms to estimate the perceived QoS of video streaming and conferencing, respectively, in terms of accuracy, interpretability, and computational cost criteria. Additionally, the generated ML model can be transferred to an AP as a lightweight script to continuously monitor QoS.

E. Open Challenges
While most of the presented ML-based solutions for cross-network optimization (e.g., channel allocation) feature centralized operation, we believe that distributed approaches are better suited for the unplanned and random nature of Wi-Fi deployments. Moreover, we cannot assume the existence of a centralized controller that manages co-located but separately owned Wi-Fi networks (e.g., in typical residential Wi-Fi deployments). Note that the potential operation of such a central controller poses a significant privacy threat, as it might require the collection of sensitive user data (e.g., the traffic volume of individual stations). Therefore, we argue that there is an increasing need for research in the scope of decentralized and distributed ML-based optimization. Particularly, multi-agent RL-based schemes seem to be a good fit: a set of agents (e.g., one at each AP) interact and share limited information to collaboratively optimize wireless resources while also preserving privacy. Finally, an open management challenge that has not yet been addressed using ML methods is improving the coexistence of modern and legacy Wi-Fi devices [214].

VI. COEXISTENCE SCENARIOS
The coexistence of Wi-Fi and cellular technologies is currently a popular and attractive research area (see footnote 5). These technologies are already advanced and their newest generations provide peak data rates in the order of Gbit/s. However, under coexistence scenarios in unlicensed bands (e.g., with LTE-LAA), they still rely on rather primitive coexistence schemes based on energy sensing and hence suffer from frequent collisions and significant throughput degradation of up to 90% [267], [268]. The coexistence schemes need to tackle the problem of heterogeneity of the underlying technologies: they implement different MAC and PHY, they are usually managed by separate operators, and they do not natively support inter-technology communication for spectrum sharing. Therefore, fair sharing of unlicensed radio resources is still an open challenge [44].
Several research papers address the problem of the coexistence of multiple radio access technologies (RATs) in unlicensed bands, e.g., [269]-[273]. In this survey, we describe only those which propose ML-based solutions (Table V). Both centralized and decentralized approaches are considered, together with both offline and online training. The proposed mechanisms appear in the following main areas:
• fair channel sharing,
• network monitoring,
• signal classification, and
• cooperative networking.
The majority of the proposed mechanisms are based on reinforcement learning (mostly Q-learning) and deep supervised learning (mostly CNNs). Often, the ε-greedy policy is used for Q-learning to balance exploration and exploitation.
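The interplay of tabular Q-learning and the ε-greedy policy can be sketched in a few lines. The environment below is a toy assumption made for this example (two Wi-Fi load levels, three candidate LTE-U duty cycles, and a penalty equal to the distance from an assumed target duty cycle); it does not reproduce any of the surveyed schemes, and all names are hypothetical.

```python
import random
from collections import defaultdict

DUTY_CYCLES = [0.2, 0.5, 0.8]        # candidate LTE-U ON fractions (assumed)
TARGET = {"low": 0.8, "high": 0.2}   # toy target duty cycle per Wi-Fi load


class QAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)   # (state, action) -> estimated value
        self.rng = random.Random(seed)

    def greedy(self, state):
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def act(self, state):
        # With probability epsilon explore a random action, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.greedy(state)

    def update(self, state, action, reward, next_state):
        # Standard Q-learning temporal-difference update.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error


def train(agent, steps=2000):
    state = "low"
    for _ in range(steps):
        action = agent.act(state)
        reward = -abs(action - TARGET[state])        # toy penalty
        next_state = agent.rng.choice(["low", "high"])
        agent.update(state, action, reward, next_state)
        state = next_state


agent = QAgent(DUTY_CYCLES)
train(agent)
policy = {state: agent.greedy(state) for state in TARGET}
print(policy)
```

A small ε keeps the agent mostly on the best known action while still sampling the alternatives, which matters when the environment (e.g., the Wi-Fi load) changes over time.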
Most analyzed papers optimize LTE behavior (i.e., the newcomer to the unlicensed bands) so that Wi-Fi performance is not degraded [274]. In some cases, however, it is proposed that both technologies implement some sort of ML to improve their coexistence. Figure 15 presents different approaches considered by researchers: from a central controller implemented for both technologies up to separate ML agents installed in LTE base stations (BSs) and Wi-Fi APs, which independently observe the environment (i.e., perform local observation) and take actions. Note that the state of the environment depends on the joint action of all agents, which may be unaware of individual decisions. Additionally, in the reviewed papers, typically only downlink LTE transmissions interfere with either uplink or downlink Wi-Fi transmissions, while LTE uplink traffic is scheduled in the licensed band.

A. Fair Channel Sharing with Cellular Networks
Several papers propose to adjust LTE-Unlicensed (LTE-U) behavior, either by a central controller or by distributed learning. Their main goal is to intelligently avoid interference with incumbent technologies, like Wi-Fi, as a solution to the problem of the negative impact of periodic LTE transmissions on channel utilization and channel access fairness [275].
Footnote 5: Channel sharing with other technologies is described in Section VII where, among others, we address sensor and vehicular networks. Additionally, we refer the readers to [38], in which different learning paradigms for IoT communication and computing technologies are surveyed, and to [37], in which ML-supported detection and identification of IoT devices is surveyed.
Many papers implement Q-learning and modify the duty cycle management (DCM) function, which is a part of the carrier sense adaptive transmission (CSAT) algorithm (Figure 16), or the almost blank subframe (ABS) allocation mechanism [225], [276]. The ABS mechanism is normally used to avoid co-channel cross-tier interference in the case of heterogeneous cellular scenarios, e.g., in scenarios composed of macro and small cells (Figure 17). The main goal of these modifications is to improve coexistence and channel sharing efficiency by intelligently disabling LTE transmissions in certain subframes to allow Wi-Fi transmissions and outperform legacy DCM.
Centralized LTE-U/Wi-Fi channel access management is proposed in the following papers. Naveen and Amballa [243] model the traffic load of each system as an M/M/1 queue, and a central controller uses Q-learning to adjust the allocation of LTE subframes in the CSAT duty cycles. Kushwaha et al. [227] use an inter-RAT controller with Q-learning to improve Wi-Fi/LTE-U coexistence fairness by considering the Wi-Fi load. In particular, the controller selects the optimum subframe configurations out of the ones defined by 3GPP. Additionally, it reduces LTE-U subframe transmission power to limit interference to co-channel users and increase the overall Wi-Fi transmission channel utilization. Similar approaches are used elsewhere:
• an agent controls DCM to maximize LTE-U throughput while protecting Wi-Fi transmissions, based on observing Wi-Fi traffic demands and using DRL [230];
• a centralized RL-based DCM learns from measured interference [231]; and
• a centralized Q-learning-based mechanism of blank subframe allocations improves the overall utility function, i.e., considering target Wi-Fi throughput and satisfactory LTE throughput and delay [218].
Decentralized channel access management for LTE-U/Wi-Fi coexistence is proposed in the following papers. Rupasinghe and Güvenç [215] use Q-learning for distributed control of duty cycle periods by LTE-U BSs, while considering the beaconing mechanism of 802.11n. Additionally, Haider and Erol-Kantarci [221] apply Q-learning-based listen before talk (LBT) to LTE-U downlink transmissions (see footnote 6). LTE-U devices are treated as secondary users that need to protect Wi-Fi transmissions. Therefore, LTE-U users are rewarded when their defer periods increase as Wi-Fi backoff timers grow (i.e., when Wi-Fi defer periods increase). Simulation results show improved throughput and decreased delay of Wi-Fi stations in comparison to legacy LBT.
[Figure: CSAT duty-cycle mechanism augmented with ML]
Wi-Fi protection is also considered by Bairagi et al. [219]. A virtual coalition formation game (VCFG) is used and an optimization problem is defined within each virtual coalition composed of Wi-Fi APs and LTE-U small base stations (SBSs) operating in the same unlicensed band. Then, (i) Kalai-Smorodinsky bargaining is used for fair time-sharing between LTE-U and Wi-Fi and (ii) Q-learning is used for resource allocation for LTE-U. Each SBS maximizes the sum of QoE for its users under the constraint of protecting Wi-Fi APs. QoE is measured in terms of the mean opinion score (MOS) which is mapped to the transmission characteristics of the following applications: web browsing, file downloading, and video streaming. This approach provides higher throughput for Wi-Fi than standard LBT. Lin and Yu [217] implement adaptive learning to improve coexistence fairness for LTE-U BSs. Bianchi's Markov model [278] is embedded in a sequential game to describe the contention nature of Wi-Fi networks. The time-slotted behavior of LTE-U devices is also modeled as a sequential game. These two processes are combined to form a Markov game. Each LTE-U BS serves as an agent and Wi-Fi networks are considered the environment to which the agents adapt. The proposed dynamic channel access control results in improved overall performance. Additionally, Athukoralage et al. [279] use regret-based learning for DCM ON/OFF period selection to improve network coverage in case of natural disasters. This improvement is achieved by unmanned aerial base stations coexisting with ground Wi-Fi APs. The area throughput improves in comparison to fixed and Q-learning-based dynamic duty cycle selection. Furthermore, Q-learning-based multichannel operation is proposed by Su et al. [224]. LTE-U SBSs serve as agents to allow either independent or joint optimization of duty cycles for each channel. The mechanism ensures fairness and improves throughput for multi-channel Wi-Fi/LTE-U coexistence. Finally, Gao et al. [280] have LTE-U and Wi-Fi managed by separate SDN controllers which build decision trees. Per-technology controllers do not communicate with each other but only negotiate network sharing by playing a repeated game based on rank-order tournaments. An incentive-based approach negotiates the channel resources, i.e., there are prizes for allowing spectrum sharing and for asking the other operator for a favor. The simulation results show that it is possible to achieve harmonized coexistence of the two technologies.
Footnote 6: LBT is commonly used in the case of LTE-LAA; however, it has also been proposed in the literature for LTE-U [277].
Figure 18. LTE-LAA LBT-based coexistence mechanism. RS denotes the reservation signal, which is typically used by the LTE-LAA devices to reserve the channel until the beginning of the next frame synchronization slot.
Another group of papers addresses coexistence between Wi-Fi and LTE-Licensed Assisted Access (LTE-LAA), which in most cases involves the adjustment of parameters of the LBT-based channel access mechanism shown in Figure 18. Similar to the LTE-U case, most papers are based on Q-learning.
Only a single paper proposes centralized channel access management for LTE-LAA/Wi-Fi coexistence [226], in which Q-learning is used by mobility management entities (MMEs) implemented in the LTE core to adjust the LTE-LAA transmission duration to Wi-Fi traffic intensity. Centralized collection of data regarding LTE-LAA and Wi-Fi systems by the LTE cloud radio access network (C-RAN) is proposed to support MMEs. Other papers implement distributed Q-learning to:
• optimize spectral efficiency of Wi-Fi/LTE-LAA coexistence [228],
• scale CW parameters depending on the collision probability observed in each backoff stage by LTE user equipments (UEs), as opposed to the legacy hybrid automatic repeat request (HARQ) mechanism implemented in cellular networks [232],
• select optimal TXOP and muting periods, i.e., provide opportunities for Wi-Fi transmissions, to outperform random and round-robin mechanisms [222],
• adjust the TXOP duration of coexisting Wi-Fi and LTE-LAA systems based on buffered downlink data in APs and evolved Node Bs (eNBs) [216], and
• select optimal channel and subframe numbers [236].
Xu et al. [216] assume that both Wi-Fi and LTE-LAA nodes have agents, which take actions (select a TXOP of 4, 6, 8, or 10 ms) and calculate rewards based on the target occupancy ratio. A different approach is considered by Han et al. [229], where a MAB improves LTE-LAA/Wi-Fi coexistence fairness under the assumption of both cooperative and non-cooperative networks. In both cases, the CW sizes are optimized for the two networks by using an online training technique and either throughput or the information on LTE's ON period of the other network as rewards. Furthermore, Tan et al. [220] use two-level distributed learning. At the primary level, Q-learning determines the optimal LTE transmission time in the unlicensed bands using either Wi-Fi or LTE-LAA. At the secondary level, stochastic learning is used for LTE-LAA channel access with the protection of Wi-Fi traffic. Meanwhile, Challita et al. [223] improve coexistence by combining a non-cooperative game with RL supported by the LSTM concept to model the self-allocation of resources by LTE-LAA SBSs. In particular, dynamic channel selection, carrier aggregation, and fractional spectrum access are considered for SBSs. Exponential backoff is used for Wi-Fi and non-exponential backoff is used for LTE-LAA (i.e., in each epoch a static CW is assumed, adopted from one epoch to another). This approach not only improves LTE's rates but also reduces disturbances in Wi-Fi's performance and achieves coexistence fairness with Wi-Fi networks and other LTE-LAA operators. Finally, Kishimoto et al. [236] use Q-learning for joint channel/subframe selection. Only LTE-LAA BSs perform learning and start with zero knowledge of neighboring Wi-Fi systems.
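To illustrate how a MAB can drive CW selection online, the sketch below uses the UCB1 index policy over a small set of CW sizes. The choice of UCB1, the candidate CW values, and the deterministic reward table are assumptions of this example; the source does not state which bandit algorithm or reward values are used in [229].

```python
from math import log, sqrt

CW_SIZES = [16, 32, 64, 128]
# Hypothetical normalized reward (e.g., a throughput or fairness score) per CW.
TOY_REWARD = {16: 0.2, 32: 0.5, 64: 0.9, 128: 0.4}


def ucb1(arms, reward_fn, rounds):
    """Play each arm once, then always pick the arm with the highest UCB1 index."""
    pulls = {arm: 0 for arm in arms}
    totals = {arm: 0.0 for arm in arms}
    for t in range(1, rounds + 1):
        untried = [arm for arm in arms if pulls[arm] == 0]
        if untried:
            arm = untried[0]
        else:
            # Index = empirical mean + exploration bonus that shrinks with pulls.
            arm = max(
                arms,
                key=lambda a: totals[a] / pulls[a] + sqrt(2.0 * log(t) / pulls[a]),
            )
        totals[arm] += reward_fn(arm)
        pulls[arm] += 1
    return pulls


pulls = ucb1(CW_SIZES, TOY_REWARD.get, rounds=2000)
best_cw = max(pulls, key=pulls.get)
print(pulls, best_cw)
```

In a real deployment the reward would be a noisy measurement observed after each choice rather than a fixed table, which is precisely why the exploration bonus is needed.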
We expect that coexistence between Wi-Fi and New Radio-Unlicensed (NR-U) will attract growing interest from the research community in the near future [44], [46], [240], [281], [282]. In one of the first ML-based works, Tang et al. [233] use Q-learning to adjust the timing of NR-U's ABSs to Wi-Fi's data transmissions to achieve higher throughput and better channel utilization in comparison to static ABS allocation. In particular, an NR-U BS serves as an agent which listens to Wi-Fi network parameters and learns the data transmission rules of Wi-Fi stations. Another interesting work is by Hirzallah and Krunz [235], where a clustering-based MAB real-time algorithm runs on NR-U/Wi-Fi nodes to adapt sensing thresholds depending on network dynamics. The sensing threshold-adaptive devices employing ML do not harm neighboring legacy devices (with fixed sensing thresholds) and both Wi-Fi and NR-U throughput is improved in comparison to standard and random sensing threshold settings.
For a more generic coexistence setting, Yu et al. [234] address a DARPA challenge on "autonomous radios to manage the wireless spectrum." DQN is modified to adapt to wireless network behavior. Through centralized learning (at the gateway) and distributed execution (at the stations) it is possible to provide fairness in channel access when coexisting with other network types (like Wi-Fi).

B. Network Monitoring
Efficient network monitoring is a feature that can support inter-technology coexistence by predicting the number of contending stations/technologies, which can then guide RAT behavior adjustment.
Yang et al. [248] propose centralized monitoring. Offline DNN-based learning from real samples predicts the number of competing Wi-Fi and IoT devices in a given area. With the inference results as input, the gateway (connected to an IEEE 802.11 AP using an Ethernet link and to IoT IEEE 802.15.4 stations over wireless links) predicts the number of transmitting devices for each technology using a handshake-based method on the primary channel. Next, the gateway selects Wi-Fi and IoT parameters to minimize inter-technology interference, e.g., the CW for Wi-Fi stations, the length of the contention access phase for the IoT stations, and the assignment of secondary channels for both technologies. Meanwhile, Ahmed et al. [250] install a cognitive monitoring module in each eNB to optimize LTE operation in unlicensed bands. The monitoring module is aware of the number of coexisting eNBs and APs. It uses an RF-based classifier to identify the environment state and select an appropriate scheduling and resource allocation scheme which optimizes LTE throughput without deteriorating the performance of Wi-Fi networks. Similarly, Galanopoulos et al. [244] use centralized Q-learning and double Q-learning to improve the unlicensed spectrum utilization for carrier aggregation of LTE-Advanced (LTE-A), while providing fair coexistence with Wi-Fi stations. eNBs learn the channel occupation time by Wi-Fi users and select the least occupied channels. This procedure is further optimized with double Q-learning, in which LTE-A transmission power is additionally adjusted to lower the impact of LTE-A transmissions on Wi-Fi users.
Pulkkinen et al. [249] analyze deep supervised learning-based interference detection using a real testbed. The following practical recommendations to be used in future ML-based interference detection schemes are given: (i) deep learning-based approaches require similar levels of noise in testing and training data sets or a large number of samples with different noise levels from different environments, (ii) training should include multi-label classification.
Distributed network monitoring is proposed by Yin et al. [252], where the unsupervised NN-based estimation of the number of coexisting Wi-Fi stations is implemented in NR-U devices. The learning process builds upon the collision probability detected in the unlicensed channel. This solution outperforms Kalman filter-based solutions. Furthermore, Yang et al. [247] use fuzzy Q-learning to either centrally (in a central unit in C-RAN) or distributively (in each eNB) learn the Wi-Fi performance to improve the scheduling decisions on the LTE-LAA side.
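For intuition on why the observed collision probability carries information about the number of contending stations, the sketch below inverts the classical relationship from Bianchi's saturation model [278]. This is a model-based estimator under saturation assumptions, shown only as background; it is not the unsupervised NN of [252], and the default parameter values (`cw_min`, `max_stage`) are assumptions of this example.

```python
from math import log


def tx_probability(p, cw_min=16, max_stage=6):
    """Per-slot transmission probability tau of a saturated station.

    Bianchi's expression written in its singularity-free form:
    tau = 2 / (1 + W + p * W * sum_{i=0}^{m-1} (2p)^i).
    """
    geometric = sum((2.0 * p) ** i for i in range(max_stage))
    return 2.0 / (1.0 + cw_min + p * cw_min * geometric)


def estimate_stations(p, cw_min=16, max_stage=6):
    """Invert p = 1 - (1 - tau)^(n - 1) to estimate the number of stations n."""
    if not 0.0 < p < 1.0:
        raise ValueError("collision probability must lie strictly between 0 and 1")
    tau = tx_probability(p, cw_min, max_stage)
    return 1.0 + log(1.0 - p) / log(1.0 - tau)


for p in (0.1, 0.2, 0.3, 0.4):
    print(p, round(estimate_stations(p), 2))
```

The estimate grows quickly with the collision probability, so noisy measurements of p translate into large swings in n; filtering or learning-based estimators address exactly this sensitivity.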
Some papers take advantage of a dedicated interface between Wi-Fi and LTE. Fakhfakh et al. [245], [246] have each LTE user obtain information from the 802.11k amendment on the load of the coexisting APs. Then, supported by Q-learning, LTE offloading decisions are made. This approach is interesting from the Wi-Fi perspective, since overloaded APs are not selected by this mechanism and therefore Wi-Fi network performance is not worsened by the offloading decisions.

C. Signal Classification
ML is also used for signal classification and recognition without the need for implementing a dedicated interface between technologies or knowing the per-technology operation patterns. Wu et al. [42] survey wireless modulation recognition and wireless technology recognition supported by DSL. We review Wi-Fi-related solutions below.
Yang et al. [257] use CNNs to classify LTE-U and Wi-Fi signals while Girmay et al. [260] have LTE eNBs use CNNs to classify Wi-Fi conditions (saturation, non-saturation) without the need of decoding Wi-Fi frames, based on inter-frame space histograms. Fonseca et al. [259] also use CNNs to classify LTE and Wi-Fi signals using an SDR-based RAT classifier. The well-known object detection you only look once (YOLO) model is used for transfer learning and to speed up the training process of the classifier. The only change required was the adaptation of the last layer to appropriately classify LTE and Wi-Fi signals. The developed solution provides 96% accuracy of RAT recognition. Gu et al. [255] use 80,000 LTE-U/Wi-Fi signal samples to train a CNN and RNN to recognize LTE-U/Wi-Fi signals. Only the CNN-based approach provides satisfactory results. Additionally, Mosleh et al. [256] use a NN with linear regression to track key performance indicators (KPIs) and estimate the probability of LTE-LAA/Wi-Fi coexistence, without using knowledge of the MAC/PHY protocols and parameters of the two technologies. Furthermore, Sathya et al. [254] use ML to distinguish between the presence of one or two Wi-Fi APs interfering with an LTE-U BS, based on detected energy levels during the OFF periods of the DCM instead of decoding Wi-Fi frames.
Finally, WiPlus [253] uses ML (i.e., k-means clustering) on the Wi-Fi side to detect LTE-U interference by using the spectral scan capabilities of COTS Wi-Fi hardware. This approach allows Wi-Fi to quantify the effective available channel airtime of each Wi-Fi link (downlink/uplink) at runtime. Moreover, the obtained timing information about LTE-U's ON and OFF phases allows Wi-Fi to schedule its transmissions only during the OFF phase to avoid collisions with LTE-U.
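The clustering idea can be sketched with a one-dimensional two-cluster k-means that splits energy samples into a low-energy (OFF) and a high-energy (ON) group and derives the ON fraction. The sample values, the deterministic initialization, and the function names are assumptions of this example; the actual WiPlus processing of spectral scan data is not described in enough detail here to reproduce.

```python
def two_means(samples, iterations=20):
    """1-D k-means with k=2, initialized at the minimum and maximum sample."""
    low, high = min(samples), max(samples)
    for _ in range(iterations):
        low_cluster = [s for s in samples if abs(s - low) <= abs(s - high)]
        high_cluster = [s for s in samples if abs(s - low) > abs(s - high)]
        if not high_cluster:          # all samples identical: nothing to split
            break
        new_low = sum(low_cluster) / len(low_cluster)
        new_high = sum(high_cluster) / len(high_cluster)
        if (new_low, new_high) == (low, high):
            break                     # centroids stopped moving
        low, high = new_low, new_high
    return low, high


def on_fraction(samples):
    """Share of samples assigned to the high-energy (interferer ON) cluster."""
    low, high = two_means(samples)
    on = sum(1 for s in samples if abs(s - high) < abs(s - low))
    return on / len(samples)


# Hypothetical per-interval energy readings in dBm from a spectral scan.
energy_dbm = [-91, -90, -92, -61, -60, -62, -91, -90, -60, -92]
print(two_means(energy_dbm), on_fraction(energy_dbm))
```

With an estimate of the ON fraction and of the ON/OFF boundaries, the remaining airtime available to a Wi-Fi link follows directly.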

D. Cooperative Coexistence
Inter-network coexistence can also take on a cooperative form. A prominent example is the Wi-Fi-Li-Fi network, where the light fidelity (Li-Fi) component is responsible for data transmission using light waves (the THz band). Visible light communication (VLC) has many advantages such as high bandwidth, license-free operation, and electromagnetic safety. However, it has a short range and is vulnerable to link outage caused by obstructions. Therefore, it is often paired with Wi-Fi in the form of a hybrid network.
Wu et al. [283] provide a recent survey of research on this topic. They mention one ML-based solution related to load balancing, where RL provides centralized AP selection to avoid servicing users by overloaded APs [261], [262]. In other works, Alenezi and Hamdi [265], [266] consider the optimization of a hybrid Wi-Fi-VLC network with centralized control and Q-learning to improve network throughput. Wu and O'Brien [263] use an NN to select Wi-Fi-Li-Fi APs to avoid frequent handovers. The handover decision is made based on channel quality, resource availability, and user mobility, e.g., Wi-Fi-only APs are preferred for mobile users while Wi-Fi-Li-Fi APs are selected for static users based on received signal strength and user satisfaction levels. Finally, Sanusi et al. [264] combine fuzzy logic with NN to support Wi-Fi-Li-Fi handovers.

E. Open Challenges
We have identified several open challenges in the area of network coexistence. The performance of ML-based mechanisms is mostly verified by simulations. Therefore, real testbed validation is considered an important open challenge since it would verify the ML-based operation with real radio signals. This validation will identify crucial factors which have not been implemented yet (or cannot be included) in the simulators and may have been overlooked by researchers. Additionally, only a few papers consider adjusting the behavior of both Wi-Fi and LTE nodes. In most cases, only the LTE operation was supported by ML while the Wi-Fi operation was left unchanged. With the opening of the new 6 GHz unlicensed band, which paves the way to redefine channel access rules defined for other unlicensed bands [115], [284], [285], we believe that changes in the operation of both technologies could be considered in the future. Furthermore, only a few papers concentrate on the new features introduced by NR-U and none address the configuration possibilities introduced by the newest 802.11 amendments (like 802.11ax). We believe that, e.g., the coexistence of NR-U with 802.11 OFDMA/MU-MIMO channel access gives novel options to be considered by future ML-based mechanisms. Finally, following [44], [286], we strongly agree that close attention should be paid to the security of inter-technology operation, e.g., in the case of augmenting coexisting networks with federated learning.

VII. MULTI-HOP WI-FI NETWORKS
The primary design goal for IEEE 802.11 networks is to be a single-hop access network. However, it can also be used in a variety of multi-hop settings (e.g., ad hoc or vehicular) either using the mainline standard (802.11a/b/g/n/ac/ax) or a dedicated amendment (such as 802.11ah for IoT and 802.11p for vehicular networks). Research papers dealing with multi-hop settings often either do not specify the underlying technology, assume a generic CR technology, assume heterogeneous networks (e.g., 802.11 and LTE), or use an alternative technology (which could theoretically be replaced by Wi-Fi). One of the reasons for this is that the key multi-hop problem, routing (Figure 19), is beyond the scope of 802.11. Therefore, in the following, we provide only a general overview of how ML is applied in various multi-hop settings: ad hoc, mesh, sensor, vehicular, and relay networks. We point the reader towards relevant surveys and tutorials in each area and note that a detailed overview of using ML in multi-hop wireless settings could be the topic of a separate survey.
Figure 19. Example of using ML to improve routing in an ad hoc network: wireless nodes update their routing tables based on inference from observation. Each node operates a separate ML instance. Possible observations: link transmission rates, link error rates, expected transmission count, etc. Actions: update routing table.

A. Ad Hoc Networks
The research popularity of (generic) ad hoc networks and MANETs reached its peak over a dozen years ago. They have mostly been replaced by their more application-oriented variants (mesh, sensor, vehicular, etc.) which we will discuss further on (see footnote 7). An overview of applying ML techniques to ad hoc networks is found in a 2007 paper by Forster [36]. The state of the art reported in this paper is outdated, but the list of applicable ML techniques (RL, swarm intelligence, mobile agents, etc.) and use cases (mainly improving routing efficiency) remains current. Al-Rawi et al. [288] provide an overview of applying RL to improve routing in distributed wireless networks. More Wi-Fi-related examples include applying Q-learning to the optimized link state routing (OLSR) protocol [289]-[291] and applying RL to 802.11-based delay tolerant networks (DTNs) [292].
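As one classical instance of RL applied to routing, the sketch below follows the idea of Q-routing: each node keeps, per neighbor, an estimate of the delivery delay to the destination and refines it from its neighbor's own best estimate. The four-node topology, the link delays, and the exhaustive exploration schedule are assumptions of this example, and the cited works do not necessarily use this exact update.

```python
# Hypothetical topology: A reaches destination D via B or via C.
NEIGHBORS = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
LINK_DELAY = {("A", "B"): 1.0, ("B", "D"): 1.0, ("A", "C"): 1.0, ("C", "D"): 5.0}


def q_routing(dest="D", episodes=50, alpha=0.5):
    """Learn per-neighbor delay estimates q[node][neighbor] towards `dest`."""
    q = {node: {nbr: 0.0 for nbr in nbrs} for node, nbrs in NEIGHBORS.items()}
    for _ in range(episodes):
        for node, nbrs in NEIGHBORS.items():
            for nbr in nbrs:  # every neighbor is tried in turn (full exploration)
                # The neighbor reports its own best remaining delay estimate.
                remaining = 0.0 if nbr == dest else min(q[nbr].values())
                target = LINK_DELAY[(node, nbr)] + remaining
                q[node][nbr] += alpha * (target - q[node][nbr])
    return q


q = q_routing()
next_hop = min(q["A"], key=q["A"].get)
print(q["A"], next_hop)
```

The routing table entry of a node is simply the neighbor with the smallest learned estimate, so a change in link delays propagates into the tables through the same update.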
ML can also optimize ad hoc network configuration [293], but this example is for a cognitive radio ad hoc network (CRAHN) (i.e., without 802.11). Another active area of research for MANETs is mobility prediction [294], but again the ML-based solutions do not explicitly consider Wi-Fi [295], [296].
Similarly, research on applying Q-learning to interference cancellation in ad hoc networks also does not consider Wi-Fi [297].

B. Mesh Networks
WMNs consist of static wireless nodes which are usually deployed to distribute Internet access to clients in an area. Karunaratne and Gacanin [298] provide a recent tutorial on ML-based approaches for WMNs. Important problems which are solved with ML include routing, channel assignment, and network deployment. ML techniques (such as SVM, k-means clustering, and Q-learning) are mapped to the identified WMN problems. Future research directions include the potential of DL, which has been recently demonstrated in the context of network flow optimization [299].
Niyato and Hossain [300] show how Q-learning helps clients perform AP selection in an IEEE 802.11 mesh network. Decisions are based on the estimated collision probability and received signal strength. The learning approach outperforms a best signal strength heuristic, especially under non-uniform node distribution.
Another example is training an NN to predict link bandwidth in an 802.11 mesh network [301]. The inputs are the averages of important PHY and MAC metrics: SNR, transmission time, MCS, and re-transmission rate. The approach accurately predicts link bandwidth, which can then be used as a routing metric.
Link quality prediction is also the topic of a paper by Bote-Lorenzo et al. [302]. Based on an extensive dataset from an existing 802.11-based community WMN, four ML algorithms for regression (online perceptron, online regression trees with options, fast incremental model trees with drift detection, and adaptive model rules) are evaluated. Only the first of these outperforms a simple baseline and only under certain circumstances. This leads to the design of a hybrid algorithm, which supports the thesis that applying ML is not a straightforward approach.
For heterogeneous (Wi-Fi and LTE) mesh networks, the routing protocol is enhanced by Q-learning for RAT selection [303]. Each node performs observations as follows: LTE link quality is determined by network load (measured through buffer occupancy), and Wi-Fi link quality by the current PHY transmission rate. Through appropriate RAT selection, nodes can observe up to 200% throughput increase compared to the single-technology case.
Finally, we comment on the dedicated 802.11s amendment for mesh networks. Among its features, it introduced MAC-layer routing called path selection in the form of the hybrid wireless mesh protocol (HWMP). However, our literature review did not identify any papers directly related to applying ML for improving the performance of either HWMP or other 802.11s functionalities. Testi et al. [304] analyze network topology inference using external sensors for a simulated 802.11s network, but no specific mesh functionalities are considered. The lack of dedicated 802.11s research is most likely the result of the limited deployment of 802.11s by the industry.

C. Sensor Networks
The application of ML to sensor networks (i.e., the communication part of IoT) is an active research topic [32], [305]-[310]. Among the most important network performance research problems for sensor networks, which are solved with ML methods, are: sensor grouping (clustering, data aggregation), energy-efficient operation (scheduling, duty cycling), resource allocation (cell/channel selection, channel access), traffic classification, routing, mobility prediction, power allocation, interference management, and resource discovery [307]. However, Wi-Fi is only one of many IoT-enabling technologies and 802.11-related solutions are rarely mentioned in these surveys. The only direct performance-related area mentioned in these surveys is classifying 802.11 interference using a deep convolutional neural network (DCNN) [311], [312], SVM [313], or various types of SL classifiers: classification trees (CTs) and SVM [314].
There are two 802.11 amendments related to IoT: 802.11af and 802.11ah. The former is a CR-based approach to use Wi-Fi in TV white space spectrum and has not enjoyed commercial success. Thus, there are also few research papers related to improving 802.11af performance with ML. A singular example is the work by Xu et al. [315], [316] on 802.11af rate adaptation schemes, which use DL models, although their work is in the context of vehicular networks.
Meanwhile, the 802.11ah amendment has had more commercial success (as HaLow) and received more attention from the research community. However, while 802.11ah permits tree-based multi-hop communication [317], it is a predominantly single-hop technology. This observation is reflected in a recent survey on 802.11ah research [318] where, out of about 200 cited references, only three consider multi-hop scenarios. Moreover, surprisingly, only two papers by Tian et al. [319], [320] deal with applying ML: both use a form of SL to optimize the parameters of 802.11ah's grouping functionality, restricted access window (RAW). A similar problem is also addressed by Mahesh and Harigovindan [321], where an MLP NN configures these parameters considering, i.a., network size and MCS values used. Other applications of ML to 802.11ah include: improving coexistence with 802.15.4g devices, a type of low-rate wireless personal area network (LR-WPAN), by avoiding interference with their transmissions using a Q-learning-based backoff mechanism [322]; grouping sensors based on their traffic demands and channel conditions using a regression-based model [323]; grouping sensors based on their data rates by classifying them with NNs [324]; and improving carrier frequency offset estimation using various types of DNNs [90].
Finally, research is also being done for generic Wi-Fi (i.e., the mainline amendments). Zhao et al. [325] propose a DQL-based method of optimizing CW for energy-constrained IoT networks. Chen et al. [326] also optimize CW but using a DNN for IoT networks using 802.11ax. Shin et al. [327] provide a method for RAT selection, between Wi-Fi and narrow-band IoT (NB-IoT), using RL to optimize for per-node latency. This has been further extended for mobile sensor networks incorporating UAVs. Kurunathan et al. [328] and Li et al. [329], [330] present a learning-based approach using DQN and DDPG for trajectory planning and integrated communication.

D. Vehicular Networks
There has been much research in the area of applying ML to vehicular networks, with Wi-Fi being only one of the many considered wireless access technologies. Some recent surveys and tutorials include [331]-[337]. They point to the application of ML in vehicular networks in the following areas of performance improvement: channel estimation, traffic flow prediction, location prediction-based scheduling and routing, network congestion control, load balancing and handovers, and resource management. Other non-performance areas where ML is applied include vehicle trajectory prediction (for ensuring road safety), network security, and in-car infotainment [338].
From the Wi-Fi perspective, 802.11p is the amendment dedicated to vehicular networks and is included in larger vehicle-to-everything (V2X) frameworks such as dedicated short-range communications (DSRC) and the ETSI ITS-G5 standard [339]. Noor-A-Rahim et al. [335] review ML-based resource allocation approaches in DSRC networks. Examples of using ML for improving 802.11p performance include: using DRL for per-link band and transmission power allocation [340], RL for tuning the CW size [341]-[343], Q-learning for improving handoff decisions [344], improving transmission control protocol (TCP) performance with federated learning [345], DNNs for channel estimation [346], and using RL for selecting the data transmission rate in a high-mobility scenario [334], [347].
An emerging future research direction is applying ML to 802.11bd, the successor to 802.11p scheduled for release in 2022 [348]. Beam alignment is one important problem of mmWave bands (cf. Section IV-A). However, contrary to WLAN scenarios, the knowledge of a vehicle's position supports beam sector selection [349], where learning to rank (LTR), also referred to as machine-learned ranking (MLR), can rank antenna pointing directions. The extension of the input information from just the location of the receiver to the location of surrounding vehicles, called situational awareness, can improve the performance of ML-based algorithms. Beam alignment is determined using classifiers [350], [351] or regression models [352], [353]. Throughput is satisfactory even if the best beam pair is not selected, providing an accuracy-overhead trade-off.

E. Relay Networks
The typical single-hop 802.11 deployment scenario is extended to a two-hop case with cooperative communications, where stations are allowed to relay the transmissions of others [354]. Such functionality requires appropriate coordination between the AP and stations, which is enhanced by a mechanism to support concurrent transmissions from different devices in a WLAN setting [355]. Since the AP may not have full information of the whole network, the problem is modelled as a partially observable Markov decision process (POMDP) and solved by an RL algorithm that can find which senders can transmit simultaneously. Results show that low-rate links, usually corresponding to distant stations, significantly improve their throughput. Despite this singular example, WLAN-based relay networks have received limited interest from ML researchers. If relay networks become an important feature of future Wi-Fi networks, solutions can be borrowed from 5G networks such as ML-based relay selection [356].

F. Open Challenges
While a multitude of ML-related open research challenges can be listed for multi-hop networks in general, much less can be named if we restrict our focus to Wi-Fi-based ones, because Wi-Fi is predominantly used in single-hop deployments. Even the latest amendments dedicated to sensor (802.11ah) and vehicular (802.11bd) networks mainly operate over single hops.
One area where Wi-Fi is used for wireless multi-hop transmissions is providing FWA over mmWave links (cf. Section IV-A). FWA is an important use case for 802.11ay, where coverage is extended with a mesh-like distribution network [118]. Research is required in developing new (or adopting existing) ML-based solutions to this particular scenario in the areas of resource allocation and resource coordination. An example solution is provided by Lahsen-Cherif et al. [357]: a QL-based routing protocol optimizes energy and throughput in a backhaul WMN scenario with directional links but Wi-Fi is not explicitly stated as the wireless technology.
Another area with open challenges is relay selection for vehicular networks. Zugno et al. [358] suggest a cross-layer approach combining routing with the 802.11 stack. ML could assess per-link routing cost more accurately. Alternatively, auxiliary sources of information could support vehicular relay selection. A first example comes from Morocho-Cayamcela et al. [359], where an ML algorithm selects relays based on satellite imagery. Such imagery and other types of auxiliary information, combined with the power of ML, can potentially improve vehicular network performance.

VIII. AVAILABLE TOOLS AND DATASETS
The review of research papers in the previous sections confirms that ML-based control solutions often overtake traditionally designed ones in terms of performance and efficiency. However, to reach such high performance levels, long training is required. For example, an RL agent needs many interactions with an environment to learn the best policies, while in SL, the tuning of an ML model requires access to large labelled datasets. In this section, we describe the available research tools, datasets, and testbeds that were used in the reviewed papers and are available for other researchers in the field.

A. Tool Chains
From our keyword analysis of more than 250 papers combining ML with Wi-Fi, regarding the evaluation methodology, we found that most researchers run network simulations (≈ 80%) to validate their solutions. Only around a quarter of them perform analytical investigations or experiments in real testbeds. The lack of real-life experiments is understandable as they are often complex, risky, and expensive to execute. For simulation analysis, the ns-3 network simulator 8 , known from non-ML networking research, is the most popular with a share of 10%.
Meanwhile, experimental studies were mostly based on SDR platforms like Ettus USRPs 9 whereas COTS Wi-Fi hardware, mostly with Atheros and Intel chipsets, was rarely used. The most commonly used ML libraries were TensorFlow (10%) and Keras (5%). Based on the results of our analysis, it becomes evident that the seamless support of network simulators (like ns-3) and SDR platforms for research of ML-based solutions for Wi-Fi is of great importance. We have observed the first research frameworks which aim to simplify the integration of ML and Wi-Fi. The general role of network simulators for bridging the gap between ML and communications systems like Wi-Fi is discussed by Wilhelmi et al. [360], where possible workflows for ML in networking and the use of existing tools are presented. Among these is ns3-gym, a software framework enabling the design of RL-driven solutions for communication networks, proposed by Gawlowicz et al. [96]. This framework is based on the OpenAI Gym toolkit 10 and provides an extension to the ns-3 network simulator (Figure 20). With ns3-gym, it is possible to use any simulated communication network (e.g., mixed Wi-Fi and LTE) as a Gym environment so that RL agents can control the behavior of network protocols. OpenAI Gym has also been integrated with Veins [361], a popular open source vehicular networking simulator based on OMNeT++. The resulting VeinsGym [362] supports ML both at the protocol as well as at the application level. Yin et al. [363] provide ns3-ai, which offers the same functionality as ns3-gym but better performance by using shared memory for inter-process communication when running both the simulation and the Gym agent locally. GrGym [364] is a similar framework, but it builds on the GNU Radio 11 signal processing platform.
Figure 21. Architecture of the GrGym framework [364], which provides an interface to integrate GNU Radio and OpenAI Gym.
Any GNU Radio program can be integrated as an environment in the Gym framework (Figure 21) by exposing its state and control parameters for the agent's learning purposes. In contrast to ns3-gym, GrGym allows the Wi-Fi network to be a real testbed consisting of SDR nodes performing real transmissions over the air. This enables studying the performance of an ML-based solution under real channel and interference conditions. The downside is the higher effort required to set up a network as well as the lack of reproducibility. Finally, Komondor 12 is another network simulator which supports a subset of the 802.11ax standard. This tool is designed for simulating complex environments in next-generation Wi-Fi networks with direct ML support. Barrachina-Munoz et al. [183] identify several use cases and present ML-based solutions using Komondor.
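The Gym-style coupling between an agent and a network environment, as used by the frameworks above, can be illustrated with a self-contained toy example. The sketch below does not use the ns3-gym, ns3-ai, or GrGym APIs: the class `ToyCwEnv`, its crude slotted-access throughput model, and the action encoding are hypothetical and only mimic the classic `reset()`/`step()` contract, in which `step()` returns an observation, a reward, a termination flag, and an info dictionary.

```python
class ToyCwEnv:
    """Hypothetical Gym-style environment standing in for a network simulator."""

    CW_VALUES = [15, 31, 63, 127, 255, 511, 1023]

    def __init__(self, n_stations=20, horizon=50):
        self.n_stations = n_stations
        self.horizon = horizon
        self.t = 0
        self.cw_index = 0

    def _throughput(self):
        # Crude assumption (not an 802.11 model): every station transmits in
        # a slot with probability tau = 2 / (CW + 1); a slot is useful when
        # exactly one station transmits.
        tau = 2.0 / (self.CW_VALUES[self.cw_index] + 1)
        n = self.n_stations
        return n * tau * (1.0 - tau) ** (n - 1)

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.t = 0
        self.cw_index = 0
        return self.cw_index

    def step(self, action):
        """Apply an action: 0 = decrease CW, 1 = keep, 2 = increase."""
        last = len(self.CW_VALUES) - 1
        self.cw_index = min(max(self.cw_index + action - 1, 0), last)
        self.t += 1
        done = self.t >= self.horizon
        return self.cw_index, self._throughput(), done, {}


def run_episode(env, policy):
    """Generic agent-environment loop; policy maps an observation to an action."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total
```

Because the loop in `run_episode` only relies on `reset()` and `step()`, the same agent code could in principle be pointed at a simulator-backed or testbed-backed environment, which is the main appeal of this interface style.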

B. Datasets
The existence of open-source and standardized datasets is essential for training and comparing ML-based algorithms. Moreover, such datasets accelerate development and foster reproducible research. For example, the recent advances in image classification and recognition were enabled by the emergence of large labelled image datasets (e.g., ImageNet [365]). We have found that researchers usually rely on their own datasets. Specifically, in 49 papers, they created labelled datasets by running experiments in testbeds and/or simulators, while only in 6 articles they used publicly available datasets. Moreover, although it is good practice, most publications still do not release the created dataset along with the paper (i.e., only 6 datasets were released). Here, we describe datasets available online that the community can immediately use for further ML-based Wi-Fi performance optimization. CRAWDAD 13 is a repository with a vast set of Wi-Fi measurements. The datasets include traces from smartphones performing Wi-Fi scans, multipath TCP traces collected from a Wi-Fi campus network, as well as traces collected for other wireless technologies like Bluetooth and ZigBee. Challita et al. [223] used a subset of the CRAWDAD dataset which included records (e.g., information about the amount of transferred data, error rates, signal strength) collected by polling Wi-Fi APs every 5 minutes in a corporate research center over several weeks. Similarly, a dataset called sigcomm2008 contains traces of wireless network measurements collected during the SIGCOMM 2008 conference.
IEEE DataPort 14 is another large repository of datasets created to encourage reproducible research. Within this repository, Karmakar et al. [68] provide the IEEE 802.11ac performance dataset 15 that contains information regarding normalized throughput achieved under five link configuration parameters (i.e., channel bandwidth, MCS, guard interval, MIMO, and frame aggregation) and the channel quality measured as SNR.
Kaggle 16 is an online platform for data scientists and machine learning practitioners. The platform allows users to find and publish datasets. Moreover, it is frequently used by companies to organize competitions to solve data science challenges. At the time of writing, the Kaggle platform offers only a limited number of Wi-Fi-related datasets, e.g., the Wi-Fi Study 17 dataset contains a study of the quality of the Wi-Fi and user perceptions of Wi-Fi conducted by students in a dormitory.
Next, we briefly describe the datasets from the reviewed papers that are available from researchers on their individual webpages. Herzen et al. [83] provide a dataset to predict throughput based on basic performance metrics (e.g., received power, channel width) collected in a small testbed 18 . Cell vs. Wi-Fi 19 is a publicly available dataset based on an Android application that collects packet-level traces of TCP downlink and uplink traffic between a mobile device and a server for both Wi-Fi and cellular networks. The dataset is used to find hidden dependencies in low-level Wi-Fi performance data [366]. Polese et al. [121] provide an experimental waveform dataset 20 generated using the NI mmWave transceiver system with 60 GHz radio heads, as well as the source code using Keras API for training and testing ML models 21 . Similar measurement data for indoor mmWave using 802.11ad from the papers by Aggarwal et al. [132], [133] is also available 22 . Rice University's LiveLab dataset 23 contains long-term measurements from real-world smartphones about their usage (e.g., CPU time) as well as data collected over a Wi-Fi interface (e.g., periodic readings of available Wi-Fi access points). The dataset is used by Chakraborty et al. [367] for admission control in wireless networks supported by light-weight machine learning.
The available datasets provide mostly raw measurements (e.g., RSSI, CSI) or traces of sniffed Wi-Fi traffic which are used to find anomalies with ML techniques. For example, Fulara et al. [368] detect the causes of unnecessary active scanning performed by Wi-Fi stations. Moreover, there exist datasets meant for Wi-Fi-based applications (e.g., human detection, activity recognition, people tracing, traffic classification) which rely on ML. We believe that such datasets can also be used to improve the performance of Wi-Fi networks. For example, if an AP knows that a traffic flow is a long-lived flow (e.g., a video transmission), it might perform long-term optimizations to improve the flow quality that would not make sense for a short-lived flow. Moreover, the location tracking of Wi-Fi stations can help a Wi-Fi network prepare for a handover operation in advance, which would result in faster handover execution and a smaller number of outage events. Example datasets containing location information and Wi-Fi signal strength are available on the Kaggle platform 24 .
Finally, we believe that significant efforts have to be taken to create large and high-quality datasets and encourage sharing them among the wireless research community. To this end, it would be beneficial to create standardized procedures for data collection to allow researchers to cooperatively build new and extend existing datasets. The potential use of different wireless platforms/testbeds for measurements might positively impact learning performance (e.g., avoid model overfitting). Due to diverse hardware characteristics (such as TX power), however, the created datasets have to be precisely described (i.e., provided with complete metadata) to avoid misunderstanding and unnecessary debugging of the ML models.

C. Testbeds
To support the experimental evaluation of ML-based Wi-Fi solutions, open-access wireless testbeds are helpful [369]. Examples of such testbeds which support 802.11-based networking include Orbit [370], COSMOS [371] and POWDER [372]. These testbeds provide not only hardware, but also software support (crucial for deploying ML). For example, in the Orbit testbed, the former requirement is addressed by having integrated graphics processing units (GPUs) in the wireless nodes which speed up the learning process. Moreover, Linux images preconfigured with ML tools like Keras and TensorFlow are provided to accelerate the implementation of novel ML-based solutions for Wi-Fi. The case is similar for the other testbeds: they support ML-related studies (even if it is not their main goal) and such research has been performed with these testbeds [373], [374].

IX. FUTURE RESEARCH DIRECTIONS
Throughout the previous sections, we have overviewed, discussed, and systematically classified many research works aiming to improve Wi-Fi through machine learning. All these works have a similar motivation: the use of ML to find the best decisions that a Wi-Fi network, or its different functionalities, can make to offer better performance in changing and heterogeneous scenarios. Although we covered over 250 papers, they represent only the first step of a long path towards fully adopting ML in future Wi-Fi and wireless networks in general. In the following, we describe several general open challenges and suggest potential future research directions.
24 https://www.kaggle.com/c/indoor-location-navigation/

A. Dealing with New and Flexible but Complex Wi-Fi Features
In recent years, the catalog of available Wi-Fi functionalities has been rapidly expanding to include more complex features to cope with current and future user needs. For example, IEEE 802.11be will incorporate multi-link operation and, possibly, multi-AP coordination in addition to already existing features such as OFDMA, downlink and uplink MU-MIMO, spatial reuse, and channel aggregation. A common aspect of most of these functionalities is that they offer a high degree of flexibility to schedule traffic in time, space, and frequency, which, if properly used, may enable high-performance gains.
To achieve this goal, ML techniques may play an important role, enabling self-adaptation to different situations and scenarios, as well as improving decision making by leveraging past information to predict next actions. For example, multi-band Wi-Fi devices can use ML methods to predict link quality and select links accordingly [5].

B. Joint Optimization of Wi-Fi Features
Most of the discussed papers focus on the optimization of a single Wi-Fi feature like the CW of Wi-Fi's channel access function. However, it becomes clear that separate Wi-Fi features cannot be optimized in isolation. Instead, they must be jointly optimized with others to achieve the best possible performance. As an example, consider the tuning of transmit power and carrier sensing threshold to enhance spatial reuse [8]. Hence, the research on ML schemes suitable for joint optimization of multiple Wi-Fi features is a promising future research direction. Especially developing ML solutions with a fast learning speed is of great importance due to the high complexity involved. For example, hierarchical learning principles allow improving learning speed by decomposing complex joint optimization problems into multiple sub-problems [375].

C. ML-enhanced Wi-Fi Features by Design
Most of the discussed works build ML functionalities on top of current Wi-Fi features, by tuning their parameters. An open challenge and a disruptive future approach would be to redesign these functionalities by explicitly embedding ML capabilities in them. Heuristic algorithms or hard-coded rules could be replaced by ML agents able to self-configure based on gathered experience [94], [376]. For example, providing guaranteed QoS or spatial reuse are challenges which could benefit from being designed with built-in ML capabilities [5].

D. ML-based Architectures and Standardized Interfaces
Another open challenge to solve is where to perform and execute certain ML-related actions, which in the case of Wi-Fi networks may include the device, the AP, a controller in the network edge, and a controller in the cloud. In any case, the answer to this question requires knowing aspects such as the tolerable latency required to obtain the output of an ML process, the required information to perform it, and the computational resources. The design and orchestration of distributed ML solutions that adapt to the pros and cons of each case is still an open challenge, requiring the definition of new interfaces as well as how and when to exchange data and ML models between components.
A pioneering work dealing with these aspects for WLANs is by Wilhelmi et al. [377], where the International Telecommunication Union (ITU) unified architecture for 5G and beyond is extended to support ML techniques at multiple levels, from the end device to the cloud. This work is then complemented with a 'sandbox' element of the ITU-T architecture (Figure 22) to execute off-line training [360]. Validation of ML techniques and models is further analyzed and discussed. Another framework to consider is the European Telecommunications Standards Institute (ETSI) generic autonomic network architecture (GANA) [378], which defines decision-making entities (and their associated control-loops) where ML can be applied.

E. Reference Evaluation Scenarios and Performance Metrics
Almost all published papers considering ML techniques conclude they can significantly improve the system performance. While we do not question these results, we point out the lack of a set of common scenarios. This situation prevents the direct comparison of the results between different papers, and therefore, makes it challenging to extract solid conclusions and track the progress in the area of using ML for enhancing Wi-Fi. Designing these scenarios so that they are useful for testing ML solutions is challenging. Specifically, the evaluation scenarios should cover a wide range of difficulty levels. For example, in the initial training phases, small stationary scenarios are helpful to illustrate and debug how ML solutions work. However, later on, the environment dynamics should also be considered, and the scenarios must be complex enough to include non-straightforward situations. Specifically, successful ML-based proposals should be tested in large, heterogeneous, and dynamic scenarios to show that they properly adapt and scale to different conditions.
Additionally, a set of common scenarios will foster another open challenge: reproducible research. This aspect is important due to the amount of information required to reproduce exactly, step by step, the same environmental conditions and ML process responses in different places and by different actors. The use of detailed and accurate datasets may contribute to making this possible.

F. ML-enhanced Network Simulation Tools
The development and maintenance of reference scenarios is much easier with a set of simulation frameworks, standardized and commonly accepted by the research community. However, there is still a lack of tools, which would seamlessly integrate ML solutions. Although there have been some attempts to solve this situation (e.g., the OpenAI module for ns-3 [96] and Komondor [183]), we are still far from a point where general networking simulators will allow including ML routines by default. Achieving a solution will be challenging, as we need to (i) define standard interfaces between Wi-Fi components and ML functions and (ii) incorporate the execution times required by ML instances as part of the virtual simulation time.
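Point (ii) above can be illustrated with a minimal discrete-event sketch in which the output of an ML routine only takes effect after its execution time has elapsed in virtual time. The event names, the integer time unit (e.g., microseconds), and the function `simulate` are hypothetical and do not correspond to the API of any existing simulator.

```python
import heapq

def simulate(observation_times, inference_delay):
    """Return the ordered event log of a toy discrete-event simulation.

    Each "observe" event triggers an ML inference whose result is applied
    only `inference_delay` time units later, as an "apply_action" event.
    """
    # Queue entries are (time, tie_breaker, kind); observations are
    # processed before actions that fall on the same time instant.
    queue = [(t, 0, "observe") for t in observation_times]
    heapq.heapify(queue)
    log = []
    while queue:
        now, _, kind = heapq.heappop(queue)
        log.append((now, kind))
        if kind == "observe":
            heapq.heappush(queue, (now + inference_delay, 1, "apply_action"))
    return log
```

For instance, with observations at times 0 and 1000 and an inference delay of 1500, the action derived from the first observation is applied only after the second observation has already been taken, so the decision is based on stale state. A simulator that treats ML inference as instantaneous would hide this effect.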

G. Testbeds and Real Pilots
The previous discussion regarding the need for scenarios and suitable simulators can be directly extended to the need for testing ML-enhanced functionalities in real networks, not only to validate their correct operation, but also to run experiments in conditions that simulators may not be able to reproduce accurately. Therefore, the development of platforms and testbeds that support the experimental research of ML-enhanced Wi-Fi networks is a crucial aspect before deploying these solutions in real networks. An example is the Orbit testbed, which provides access to nodes with hardware (GPU) and software (Keras/TensorFlow) support for ML-based research (cf. Section VIII-C). An important aspect to consider, and which should be included in the design of ML-aware solutions, is that they will have to coexist with non-ML-enabled solutions, and so potentially negative interactions should be considered in advance.

H. Risks of ML Uncertainty
Following the previous points, it is important to explicitly tackle situations in which ML techniques cause unpredictable performance and may compromise the correct operation of a certain feature or even the whole Wi-Fi network. An open challenge is to design robust ML solutions that may sacrifice performance in general to prevent unexpected behaviors in particular scenarios.
ML-based models are highly successful and provide superb performance in many complex tasks. So far, however, models are applied in a black-box manner, i.e., no information is provided about what exactly makes them arrive at their decisions. This lack of transparency can be a major drawback and might remain a limiting factor for the broad adoption of ML-based algorithms in the area of wireless network control. Specifically, giving up human control to an intelligent black box brings the risk of improper behavior or unsafe decisions that might be dangerous for the operation of wireless networks, which in many cases may be considered critical infrastructure. In recent years, research on explaining and interpreting deep learning models has attracted increasing attention: the work of Samek and Müller [379] targets validation of agent behavior and establishing guarantees that they will continue to perform as expected when deployed in a real-world environment. Furthermore, by explaining the internal structures, researchers hope to learn from ML-based agents capable of learning patterns that are not tractable by humans. To conclude, the explainability of ML agents will be of significant importance for the verification and certification (i.e., checking compliance with regulations) of ML-based wireless network control systems.

I. New ML Models and Distributed Learning
Another open challenge is the need to consider recent advances in ML techniques, which will certainly go together with the definition of new ML-based architectures and Wi-Fi features. For instance, due to its recent introduction, there are still few papers considering federated learning (FL) models for Wi-Fi (Figure 23b). FL is a distributed machine learning paradigm where a set of nodes cooperatively train an ML model with the help of a centralized server and without the need to share their local data [40], [380]. Specifically, nodes train their local model based on local (on-device) data, and then send the model parameters to the server, which in turn merges parameters from different nodes and sends the combined (global) parameters back to the distributed nodes. We expect that FL is of paramount importance for the optimization of Wi-Fi networks, as it trains models with individual data (e.g., available at stations or the AP) while also preserving user privacy. However, if FL is implemented over wireless links, the mitigation of the adverse impact of wireless communications on FL performance metrics becomes unavoidable [381].
Transfer learning (TL) is another concept that might be helpful for wireless networks in general. In this ML method, a model trained on one task is re-purposed for a second, related task. Usually, some retraining is required to fine-tune the model towards the second task. However, TL can save time or obtain better performance in comparison to the development of a model from scratch [382]. This technique works only if the model features learned from the first task are general. In the context of wireless networks, TL might be applicable when reusing models trained in networks of a different technology (e.g., interference recognition in LTE) to boost the performance of Wi-Fi networks. A recent survey on TL for wireless networks provides more insights regarding this important research direction [39].
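The FL training loop described above (local training, upload of parameters, merging at the server, and redistribution) can be sketched in a few lines. This is an illustrative example only: the model is a one-dimensional linear regressor, the merging rule is a sample-count-weighted average (a common choice, assumed here rather than taken from the cited works), and all function names and hyperparameters are hypothetical.

```python
def local_train(params, data, lr=0.05, steps=5):
    """Run a few gradient-descent steps on one node's private data."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        # Gradients of the mean squared error of the model y = w * x + b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)

def federated_average(updates):
    """Server-side merge: average parameters weighted by local sample count."""
    total = sum(n for _, n in updates)
    w = sum(p[0] * n for p, n in updates) / total
    b = sum(p[1] * n for p, n in updates) / total
    return (w, b)

def run_fl(node_datasets, rounds=100):
    """Only model parameters travel between nodes and server, never raw data."""
    global_params = (0.0, 0.0)
    for _ in range(rounds):
        updates = [(local_train(global_params, d), len(d)) for d in node_datasets]
        global_params = federated_average(updates)
    return global_params

def make_node(xs):
    """Toy local dataset following y = 2x + 1 (hypothetical ground truth)."""
    return [(x, 2.0 * x + 1.0) for x in xs]
```

Running `run_fl` on nodes whose local inputs cover different ranges, e.g. `make_node([0.0, 0.5, 1.0])`, `make_node([1.0, 1.5, 2.0])`, and `make_node([-1.0, -0.5, 0.0, 0.5])`, recovers parameters close to `w = 2` and `b = 1` even though no node ever shares its samples. In a wireless deployment, each round additionally incurs the cost and unreliability of transmitting the parameters over the air.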

J. Learning from Experience
Developing ML-based solutions is not a straightforward process. For example, it involves trial and error in terms of configuring satisfactory model parameters. Just as ML models are based on learning from previous experience, researchers deploying ML solutions would benefit from sharing the experience gained from the development process. Our literature review has revealed only a few explicit descriptions of such lessons learned in published papers. We share them here:
• Nurani Krishnan et al. [383] advocate a single hidden layer in agents due to its lower complexity and faster training than with multiple hidden layers. To limit data acquisition costs, DRL agents should be trained online and provided with simple state information. Also, the duration of data collection is an important parameter to be optimized depending on the use case.
• Zhang et al. [14], who consider DL for cellular networks, point out that "deep learning solutions are not universal" and thus not suitable in every case. They are prone to mistakes and misinterpretation, and do not explain causality (especially in prediction models). Furthermore, the authors note that the model complexity-accuracy trade-off is important for agents deployed on mobile devices and that RL agents require training in real environments or high-fidelity simulators.
• Wilhelmi et al. [163] warn that "out-of-the-box DL methods may fail at capturing the relationship between interference and performance of WLANs" and that data-driven solutions should be merged with the models used. Additionally, for prediction applications, the preprocessing of a dataset is crucial to obtain generalized solutions.
• Girmay et al. [239] point out that QL has been overused (at least in the area of network coexistence) since "Q-learning is not an efficient solution for problems with dynamic environments". Experience replay is proposed as an alternative.
• Further recommendations regarding training ML models include the following: the achieved model accuracy depends on the selected training features, using specific training data will lead to results which do not generalize, and, for limited training datasets, the subset selection becomes crucial [164], [249], [384].
Given the relevance of experience to avoid repeating mistakes, we encourage researchers to always include in their works an explicit 'lessons learned' section detailing new insights, or corroborating existing ones, to contribute to the development of this research area. Currently, many papers simply apply ML methods without clearly explaining why they are relevant for the considered problem [28]. Researchers also need to discuss the challenges faced when applying ML methods. Otherwise, the researcher's contribution is unnecessarily limited.

X. CONCLUSION
ML is playing an increasing role in the field of improving Wi-Fi performance. This survey has presented a comprehensive overview of over 250 recent ML-based solutions for a variety of performance areas. We started with basic Wi-Fi features (such as channel access and rate adaptation), then we moved to more complex aspects (such as channel bonding, multiband operation, and network management) and the problem of coexistence with other network technologies in shared bands. Next, we gave a brief overview of the application of ML to multi-hop Wi-Fi settings. Finally, we summarized the tools and data sets available for researchers in this field. To the best of our knowledge, this is the first survey to focus solely on Wi-Fi networks and to provide a detailed analysis of different Wi-Fi aspects that are supported through ML.
A comparison of the three main ML areas reveals that supervised learning and reinforcement learning are frequently used, while unsupervised learning is less popular (Figure 23). Meanwhile, the most often used ML mechanisms are Q-learning, multi-armed bandit, as well as different neural network types (mostly ANN, DNN, and CNN). In most cases, these mechanisms are implemented to optimize only a constrained set of 802.11 parameters. Additionally, from reviewing the comparative Tables II to V, we observe that with the increase in available computing power, DL methods are gaining in popularity. About half of the most recent papers implement DL (cf. Figure 23a vs. Figure 23b). The most commonly used DL techniques are supervised learning and reinforcement learning (Figure 23c). Additionally, federated learning and transfer learning are recent introductions in the Wi-Fi domain. We expect that they will become more popular in the near future because they offer a chance to distribute learning tasks and improve training speed.
We believe that, as a next step, researchers will identify ML schemes for the joint optimization of a wider range of Wi-Fi features. Additionally, they should investigate the coexistence of ML-controlled and legacy networks, since it poses a possible source of unfairness in channel access. We also expect that the attractiveness of this area of research will continue to grow. To support this statement, we have identified several open research directions which could serve as a guide for researchers in their future work.