Improving Medium Access Efficiency With Intelligent Spectrum Learning

Through machine learning, this paper changes a fundamental assumption of traditional medium access control (MAC) layer design: it becomes possible to retrieve information even when packets collide, by training a deep neural network offline on historical radio frequency (RF) traces and inferring the stations (STAs) involved in collisions online in near-real-time. Specifically, we propose a MAC protocol based on intelligent spectrum learning for future wireless local area networks (WLANs), called SL-MAC. In the proposed MAC, an access point (AP) is equipped with a pre-trained convolutional neural network (CNN) model to identify the STAs involved in collisions. In contrast to conventional contention-based random access methods, e.g., the IEEE 802.11 distributed coordination function (DCF), the proposed SL-MAC protocol schedules data transmissions from the STAs that suffered collisions. To achieve this goal, we develop a two-step offline training algorithm that enables the AP to sense the spectrum with the aid of the CNN. In particular, on receiving overlapped signal(s), the AP first predicts the number of STAs involved in the collision and then identifies their IDs. Furthermore, we analyze the upper bound of the throughput gain brought by the CNN predictor and investigate the impact of inference errors on the achieved throughput. Extensive simulations show the superiority of the proposed SL-MAC and provide insights into the trade-off between performance gain and inference accuracy.


I. INTRODUCTION
In the past few years, IEEE 802.11-based wireless local area networks (WLANs), commonly known as WiFi networks, have experienced noticeable growth and keep pace with an ever-increasing number of low-cost mobile devices [1]. Consequently, how mobile devices can efficiently coordinate channel access to improve spectrum efficiency in densely deployed WLAN scenarios (e.g., ultra-dense 5G networks [2]) is attracting significant attention from both industry and academia [3]. Conventionally, the IEEE 802.11 distributed coordination function (DCF) uses a binary exponential backoff (BEB) scheme as a carrier sense multiple access with collision avoidance (CSMA/CA) mechanism to decrease collisions. However, this scheme severely degrades network performance when a large number of devices contend for the channel [4]. (The associate editor coordinating the review of this manuscript and approving it for publication was Kim-Kwang Raymond Choo.)
Recently, the field of machine learning (ML), especially deep learning (DL), has emerged as a promising technique for enabling intelligent spectrum sensing and management in future wireless communications [5]. By training deep neural networks, signal features can be extracted that go beyond simple waveform attributes such as amplitude and phase. Furthermore, the recent popularity of DL in wireless communications stems from its potential to bring machine intelligence into relevant wireless applications. This in turn poses many challenges for DL at the WLAN physical (PHY) layer [6] (e.g., wireless channel modeling [7], [8] and modulation scheme recognition [9]) as well as in MAC layer design [10]. The spectrum in the unlicensed ISM frequency bands carries more and more traffic from heterogeneous networks. Although interest at the intersection of wireless networks and machine learning has been quite visible in recent years, how to integrate deep learning with spectrum sensing to achieve intelligent medium access on the unlicensed ISM band remains an open research question. In this context, artificial intelligence (AI) at the MAC layer is a general trend, and how to improve channel access efficiency with the aid of DL while maintaining backward compatibility with the conventional IEEE 802.11 DCF is the focus of this work. These reasons motivate us to design a DL-powered MAC protocol that improves network capacity for WLANs.
This paper aims to improve the channel access efficiency of IEEE 802.11 DCF on the unlicensed ISM band. Instead of traditional spectrum sensing, which only reveals whether the channel is occupied, we seek to extract richer channel usage information, e.g., how many devices are sharing the spectrum and who they are. This deeper level of information makes the MAC protocol design more intelligent and efficient. By integrating MAC design with deep learning, better channel coordination and spectrum sharing can be achieved based on the inference results of a convolutional neural network (CNN) model. Compared with the conventional CSMA/CA-based 802.11 DCF scheme, the unique advantage of the proposed solution is that it is ''intelligent'' yet ''undemanding'': with the aid of deep learning, the proposed SL-MAC protocol becomes more intelligent, and it becomes more efficient without requiring additional information. To the best of the authors' knowledge, this is the first attempt to integrate MAC layer design with a deep neural network predictor in a conventional IEEE 802.11 DCF setup.
The main contributions of this paper are summarized as follows.
• Spectrum learning-based MAC framework. We introduce a comprehensive MAC framework integrated with spectrum learning (SL) to improve MAC efficiency and bandwidth utilization for future WLANs. We formulate user identification as a multi-class classification problem, which is solved with pre-trained CNN models.
• Design and implementation of the CNN structure.
We first present the design of the CNN structure, including the master-CNN model and the slave-CNN models. We then detail the implementation of the CNN models, including data collection, two-step offline training, and online inference. With the pre-trained CNN models in place, the access point (AP) can identify users from the overlapped signals and dynamically schedule the conflicting users' data transmissions.
• Performance analysis of the proposed MAC framework. We analyze the upper bound of the throughput gain of the proposed SL-MAC protocol and investigate the impact of inference errors on the achieved throughput. Extensive simulations demonstrate the superiority of the proposed SL-MAC protocol.

The remainder of this paper is organized as follows. Section II introduces related work. Section III presents the proposed SL-MAC protocol in detail. Section IV first analyzes the upper bound of the throughput gain brought by deep learning and then investigates the impact of inference errors on the achieved throughput. Section V presents the CNN framework, including data collection, offline training, and online inference. Section VI presents simulations demonstrating the superiority of the proposed SL-MAC protocol. Section VII concludes the paper and discusses future work.

II. RELATED WORK
As mentioned above, ML techniques are playing an increasingly critical role in MAC layer design for WLANs. In general, ML can be roughly categorized into supervised learning and unsupervised learning. In this section, we elaborate on the role of these two categories of learning techniques in MAC scheme design for wireless communication networks.

A. SUPERVISED LEARNING-BASED MAC DESIGN
In supervised learning, the learning agent learns from a labeled training dataset under supervision. The objective is to find a mapping from the input feature space to the labels so that reliable predictions can be made for new input data. Because it requires labeled data, supervised learning-based MAC design is not suitable for scenarios where a device must learn the environment without a supervisor or a labeled training dataset. Specifically, a deep CNN model was used to perform classification directly from spectrograms to identify primary user (PU) behavioral patterns in a cognitive radio context [11]. Ruan et al. [12] proposed an ML-based predictive dynamic bandwidth allocation algorithm to address the uplink bandwidth contention and latency bottleneck of such networks. Rajendran et al. [13] achieved automatic modulation classification with an LSTM model that learns from the time-domain amplitude and phase of the modulation schemes present in the training data of a distributed wireless spectrum sensing network. In [14], Liu et al. used a deep neural network (DNN) to explore the data-driven test statistic intelligently and proposed a covariance-matrix-aware CNN-based spectrum sensing algorithm to further improve detection performance. In [15], machine learning algorithms were demonstrated to appreciably outperform classical signal detection methods in the 3.5-GHz band. Furthermore, Gao et al. [16] proposed a deep learning-based signal detector that exploits the underlying structural information of modulated signals. Peng et al. [17] explored transfer learning to improve the robustness of DL-based spectrum sensing.
In particular, to achieve intelligent spectrum sensing, our previous works [18], [19] proposed a distributed MAC framework assisted by deep learning, in which a DNN model was trained offline to help coordinate channel access by exploring features of the overlapped signals. Kim et al. [20] proposed a deep learning-aided sparse code multiple access (SCMA) scheme, using the autoencoder structure of a DNN to learn the codebook and decoding strategy for SCMA to minimize the bit error rate (BER). In [21], MAC protocol identification was investigated for applications in cognitive radio networks, where the secondary (CR) users detect the MAC schemes used by the primary users with the help of an SVM; the CR users can thereby be aware of the time and frequency of spectrum holes.

B. UNSUPERVISED LEARNING-BASED MAC DESIGN
In unsupervised learning, the learner is provided only with unlabeled data, and learning proceeds by finding an efficient representation of the data samples without any labeling information. As such, unsupervised learning-based MAC design is suitable for practical wireless network scenarios where no prior knowledge about the outcomes exists. Recent years have witnessed extensive study of deep reinforcement learning (DRL) for dynamic spectrum access problems in wireless networks [22]-[26]. In particular, Nguyen et al. [22] proposed to use deep Q-learning to learn a state-action value function that determines an access policy from the observed states of all channels. In [23], the authors applied DRL so that secondary users (SUs) learn appropriate spectrum access strategies in a distributed fashion, assuming no knowledge of the underlying system statistics. In [24], a multi-agent deep reinforcement learning method was adopted by secondary users to learn a sensing strategy from the sensing results of selected spectra, avoiding interference to the primary users and coordinating with other secondary users in cognitive radio networks. Furthermore, Yu et al. [25] investigated a DRL-based MAC protocol that learns an optimal channel access strategy to achieve a pre-specified global objective in heterogeneous wireless networking. Cao et al. [26] proposed a DRL-based MAC protocol to assist backscatter communications in Internet-of-Things (IoT) networks, where DRL was introduced to learn the reserved information and make decisions accordingly.
Nevertheless, most of the ML-based MAC schemes listed above introduce significant overhead into existing wireless communication systems. They are not applicable to conventional IEEE 802.11 WLANs, where the stations (STAs) usually have low complexity, making it difficult for them to ''learn and predict'' the dynamic changes in the network. Our proposed MAC framework falls within the first category of ML approaches (i.e., supervised learning-based MAC), aiming to enable the AP to identify the STAs involved in collisions during channel contention and then schedule them to retransmit without collisions.

III. THE PROPOSED MAC FRAMEWORK
A. PRELIMINARY AND BASIC IDEA

This paper considers a typical WLAN scenario with a total of N STAs that are associated with the AP and attempt to transmit data packets to it. In typical IEEE 802.11 WLAN application scenarios (e.g., an office building where the associated WiFi users are generally company employees), it is reasonable to assume that the user base remains stable over a future period. All the STAs contend for the channel following the BEB scheme of the conventional IEEE 802.11 DCF. Generally, the conventional IEEE 802.11 DCF scheme (e.g., the four-way handshake) is used in WLANs to coordinate channel access among users. Once more than one user transmits a request-to-send (RTS) packet at the same time, a collision occurs, and these users must increase their backoff counters and contend to re-access the channel. According to the analysis in [29], the collision probability is proportional to the number of STAs. Consequently, channel utilization is usually low due to severe collisions, especially in dense deployment scenarios. In this paper, through machine learning, we change this fundamental assumption of traditional MAC layer design: we obtain the capability of retrieving information even when packets collide, by training a CNN model offline on historical RF traces and inferring the STAs involved in collisions online in near-real-time.
Without loss of generality, we assume that each STA contends for the channel using the traditional four-way handshake (i.e., RTS-CTS-DATA-ACK), and some STAs may choose the same backoff counter during their backoff procedures. As a result, collisions usually occur at the AP side when multiple RTS packets are received at the same time, e.g., from Alice and Bob in Fig. 1(a). In this paper, a novel MAC protocol, called spectrum learning-powered MAC (SL-MAC), is proposed for future dense WLANs, as highlighted in Fig. 1(b). In the proposed SL-MAC, a pre-trained CNN model is deployed at the AP in advance, which enables the AP to identify the STAs from the overlapped RTS signal(s). On receiving the overlapped signals, the AP can detect the number of users involved in the collision and identify who they are with the aid of the pre-trained CNN model; this can be treated as a multi-class classification problem. According to the inference results, the AP replies with a CTS packet that includes the scheduling information for the users' data transmissions. The other users set their network allocation vector (NAV) accordingly and keep silent during this period. After the NAV expires, all users contend to re-access the channel following the conventional IEEE 802.11 DCF.

B. PROPOSED MAC PROTOCOL
In the proposed SL-MAC protocol, the pre-trained CNN model files are deployed at the AP. On receiving the RTS signal(s), the AP can not only detect the number of STAs (denoted as n, with N denoting the set of all STAs) but also identify who they are. An example of the proposed protocol with three STAs contending for the channel is illustrated in Fig. 2. The proposed SL-MAC protocol comprises three operation steps: channel contention, collision detection and identification, and scheduled transmission, as detailed in the following three subsections.

1) CHANNEL CONTENTION
In this step, all STAs with data traffic accumulated in the MAC queue first contend for channel access based on the BEB scheme. Following the conventional IEEE 802.11 DCF, denote the minimum and maximum contention window (CW) sizes as CW_min and CW_max, respectively. In the beginning, all STAs (e.g., STA_A, STA_B, and STA_C in Fig. 2) randomly choose their backoff counter values as B_i ∈ [0, CW_min], i ∈ N, and then perform backoff via the BEB scheme. Note that the backoff counter is suspended once the channel becomes busy. When the backoff counter reaches zero, i.e., B_i = 0, the STA transmits an RTS packet to the AP. For example, if STA_A and STA_B finish their backoff at the same time and transmit RTS packets simultaneously, a collision occurs at the AP; otherwise, the RTS packet is received successfully.
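The contention step above can be sketched as a toy simulation. This is a minimal illustration of BEB only, assuming the standard CW doubling rule; the class and slot loop are illustrative, not the paper's simulator:

```python
import random

CW_MIN, CW_MAX = 16, 1024  # typical 802.11 DCF contention-window bounds

class Sta:
    """One station performing binary exponential backoff (BEB)."""
    def __init__(self, sta_id):
        self.sta_id = sta_id
        self.cw = CW_MIN
        self.backoff = random.randint(0, self.cw)  # B_i in [0, CW]

    def on_collision(self):
        """Double the contention window (capped at CW_MAX) and redraw B_i."""
        self.cw = min(2 * self.cw, CW_MAX)
        self.backoff = random.randint(0, self.cw)

def contend(stas):
    """Count down idle slots; return the STAs whose counters hit zero in the
    same slot (more than one returned means an RTS collision at the AP)."""
    while not any(s.backoff == 0 for s in stas):
        for s in stas:
            s.backoff -= 1  # counters decrement during idle slots
    return [s for s in stas if s.backoff == 0]

random.seed(1)
stations = [Sta(i) for i in range(3)]
transmitters = contend(stations)  # the STA(s) sending RTS in this slot
```

When `len(transmitters) > 1`, the RTS packets overlap at the AP, which is exactly the situation the CNN predictor in the next subsection is designed to resolve.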

2) COLLISION DETECTION AND IDENTIFICATION
In this step, on receiving the RTS packet(s), the AP replies with a CTS packet based on the inference results of the pre-trained CNN model, whose model files are assumed to be already deployed at the AP. On receiving the RTS signal(s), the AP obtains the inference results by performing a feedforward calculation. Denoting the inference time as θ, the AP replies with the CTS packet after a time t_SIFS + θ, where t_SIFS denotes the duration of the short inter-frame space (SIFS), as illustrated in Fig. 2. To realize collision avoidance and maintain compatibility with legacy IEEE 802.11, a ''Scheduling Info'' field containing the conflicting STAs' IDs and their data transmission time instants is added to the traditional IEEE 802.11 CTS packet, as illustrated in Fig. 3. Based on the inferred number of STAs involved in the collision (denoted as n), the AP sets the ''Duration'' field of the CTS packet as NAV_CTS = t_SIFS + t_TXOP and broadcasts the CTS packet to all STAs, where t_TXOP = n(t_DATA + t_SIFS + t_ACK) denotes the scheduled transmission opportunity (TXOP) period. On receiving the CTS packet, all STAs decode the ''Scheduling Info'' field. STAs not scheduled to transmit set their network allocation vector (NAV) according to the ''Duration'' field of the CTS (i.e., NAV_CTS) and keep silent during this period; the scheduled STAs transmit their data packets at the scheduled time instants.
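The Duration-field arithmetic above can be written out directly. The timing values below are illustrative assumptions (not taken from the paper or the 802.11 standard); only the two formulas mirror the text:

```python
# Hypothetical frame/IFS durations in microseconds (illustrative values only).
T_SIFS, T_DATA, T_ACK = 16, 1500, 44

def t_txop(n):
    """Scheduled TXOP for n conflicting STAs: n * (t_DATA + t_SIFS + t_ACK)."""
    return n * (T_DATA + T_SIFS + T_ACK)

def nav_cts(n):
    """'Duration' field of the extended CTS: NAV_CTS = t_SIFS + t_TXOP."""
    return T_SIFS + t_txop(n)
```

For example, with two collided STAs, `nav_cts(2)` reserves one SIFS plus two back-to-back DATA/SIFS/ACK exchanges, so unscheduled STAs stay silent for the whole recovered TXOP.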
The number of STAs is predictable and the STAs are identifiable because spectrum usage identification can be cast as a multi-class classification problem, which can be solved well by training a deep neural network [27], [28]. In this paper, the AP only needs to identify the number and IDs of the STAs involved in collisions. As a result, the scalability of the proposed CNN-based MAC protocol mainly depends on the average number of STAs suffering collisions. In practical WLAN scenarios, the data traffic of the STAs usually follows a Poisson distribution, which implies that the MAC buffers of the STAs are not always full and the number of STAs transmitting at the same time is far smaller than N. Therefore, high classification accuracy can still be achieved by a well-trained CNN model.
To better understand this, an intuitive example with two STAs, denoted S_1 and S_2, is presented. In this controlled environment, by setting up the various transmission scenarios, we can collect RF traces covering all combinations of STAs in the network and label the data with the ground truth. In this example, a dataset covering four coexisting transmission scenarios (i.e., 'idle channel', 'STA 1 waveform only', 'STA 2 waveform only', and 'combined waveforms') is collected from the testbed. The RF traces collected on our USRP2 testbed consist of in-phase (I) and quadrature (Q) signals in matrix form, which encode sophisticated features of the wireless signals. Similar to images consisting of pixels in matrix form, deep neural networks (especially CNNs) are generally the preferred methods for extracting and learning the higher-level information hidden in RF traces. Accordingly, the CNN classifier is trained offline on the historical RF traces collected from the four scenarios above until it learns the features of the RF traces and makes reasonable inferences. This results in a 4-class classification problem, with the four classes detailed below.
• Class-1: 'Idle'. Neither STA is transmitting, i.e., the collected RF dataset consists only of noise floor measurements.
• Class-2: S_1. Only S_1 is transmitting, i.e., the collected RF dataset consists of the noise floor measurements plus the signal from S_1.
• Class-3: S_2. Only S_2 is transmitting, i.e., the collected RF dataset consists of the noise floor measurements plus the signal from S_2.
• Class-4: S_1 + S_2. Both STAs are transmitting simultaneously, i.e., the collected RF dataset consists of the noise floor measurements plus the overlapped signals from S_1 and S_2.
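The four-class labeling above can be sketched with synthetic I/Q data. The waveforms and noise model here are toy stand-ins for the USRP2 traces (all signal shapes and parameters are assumptions); the point is the class structure: each example is a 2 × w I/Q matrix with a ground-truth label:

```python
import numpy as np

rng = np.random.default_rng(0)
w = 32  # window size: number of I/Q samples per labeled example

def noise():
    """Toy noise-floor measurement: small Gaussian I/Q samples."""
    return rng.normal(0.0, 0.1, (2, w))

def sta_wave(freq):
    """Toy stand-in for one STA's RF waveform (I = cos, Q = sin)."""
    t = np.arange(w)
    return np.stack([np.cos(freq * t), np.sin(freq * t)])

def make_example(label):
    """Labels: 0 = idle, 1 = S1 only, 2 = S2 only, 3 = S1 + S2 overlapped."""
    x = noise()
    if label in (1, 3):
        x = x + sta_wave(0.2)  # S1's waveform
    if label in (2, 3):
        x = x + sta_wave(0.5)  # S2's waveform
    return x, label

labels = rng.integers(0, 4, size=100)
X = np.stack([make_example(int(c))[0] for c in labels])  # (100, 2, w)
y = labels
```

Class-4 examples are literally the sum of both waveforms plus noise, which is what lets a classifier learn to recognize the overlapped case rather than discarding it as an unreadable collision.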

3) SCHEDULED DATA TRANSMISSIONS
On receiving the CTS packet, the STAs that sent RTS packets first check the ''Scheduling Info'' field of the CTS packet and identify the IDs of the scheduled STAs and the data transmission time instants. The other STAs simply set their NAV according to the Duration field of the CTS, as detailed in the previous subsection. Based on the scheduling information in the CTS packet, the scheduled STAs transmit their data packets at the scheduled time instants. When the scheduled TXOP period finishes, all STAs start contending to re-access the channel. Note that, to maintain backward compatibility with the conventional IEEE 802.11 DCF, the STAs involved in the RTS collisions also increase their backoff stages until reaching the maximum value.

IV. PERFORMANCE ANALYSIS
The SL-MAC protocol design presented in Section III-B can ensure high overall throughput by identifying the STAs involved in collisions with the trained CNN model. However, the achieved throughput of SL-MAC tends to degrade due to inference errors introduced by the trained CNN model, especially when the number of STAs involved in a collision is large. In this section, we first analyze the upper bound of the throughput gain, assuming no inference errors occur. We then investigate the impact of inference errors on the achieved throughput of the SL-MAC protocol.

A. UPPER BOUND OF THROUGHPUT GAIN
Compared to the conventional IEEE 802.11 DCF scheme, we analyze the throughput performance gain introduced by deep learning, where the inference error of the pre-trained CNN is assumed to be negligible. In this case, the analyzed gain becomes an upper bound. According to Bianchi's Markov model [29], the saturation throughput of the conventional IEEE 802.11 DCF is

φ_DCF = (P_s P_tr E[P]) / ((1 − P_tr)σ + P_tr P_s T_s + P_tr(1 − P_s) T_c^DCF),   (1)

where E[P] denotes the average size of the data packet payload and σ denotes the duration of an empty slot time. P_tr denotes the probability that there is at least one transmission in the considered slot time, and P_s represents the conditional probability that a transmission occurring on the channel is successful. Besides, T_s = t_DIFS + t_RTS + t_CTS + t_DATA + t_ACK + 3t_SIFS is the average time the channel is sensed busy due to a successful transmission, and T_c^DCF = t_DIFS + t_RTS is the average time the channel is sensed busy by each device during a collision under the IEEE 802.11 DCF scheme.
Even when a collision occurs under the proposed SL-MAC protocol, the channel can still be utilized by scheduling the collided STAs' data transmissions within a TXOP, as shown in Fig. 2. Therefore, the saturation throughput of SL-MAC (abbreviated as φ_DM) can be obtained as

φ_DM = (P_tr E[P] (P_s + (1 − P_s) n)) / ((1 − P_tr)σ + P_tr P_s T_s + P_tr(1 − P_s) T_c^DM),   (2)

where n denotes the average number of STAs involved in RTS collisions and T_c^DM denotes the average time the channel is utilized by scheduling STAs' transmissions when a collision is detected under the proposed SL-MAC protocol.
According to Fig. 2, T_c^DM can be calculated as

T_c^DM = t_DIFS + t_RTS + t_SIFS + θ + t_CTS + t_SIFS + t_TXOP,   (3)

where θ denotes the inference delay of the pre-trained CNN and t_TXOP is the TXOP period containing the n scheduled data transmissions, i.e.,

t_TXOP = n(t_DATA + t_SIFS + t_ACK).   (4)
As a result, substituting (3) and (4) into (2), we can obtain

φ_DM = (P_tr E[P] (P_s + (1 − P_s) n)) / ((1 − P_tr)σ + P_tr P_s T_s + P_tr(1 − P_s)(t_DIFS + t_RTS + 2t_SIFS + θ + t_CTS + t_TXOP)).   (5)

According to (1), (2), and (5), the upper-bound gain brought by SL-MAC can be calculated as

η = (φ_DM − φ_DCF) / φ_DCF.   (6)

Remark 1: When the inference error of the pre-trained CNN is not considered, it can be seen from (6) that the gain brought by the pre-trained CNN is proportional to the number of STAs involved in the RTS collisions (i.e., n). In practice, the inference error of the multi-class classification problem increases as n increases. Therefore, there exists a trade-off between the performance gain (η) brought by deep learning and the inference accuracy.
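The upper-bound comparison can be sketched numerically. This assumes Bianchi-style saturation-throughput expressions in which an SL-MAC collision slot still delivers the n scheduled payloads (a modeling assumption here); all parameter values below are illustrative, not the paper's numbers:

```python
# Illustrative parameters (assumed): durations in microseconds, payload in bits.
SIGMA, T_DIFS, T_SIFS = 9, 34, 16
T_RTS, T_CTS, T_DATA, T_ACK = 52, 44, 1500, 44
E_P = 12000   # average payload size E[P]
THETA = 20    # CNN inference delay theta

T_S = T_DIFS + T_RTS + T_CTS + T_DATA + T_ACK + 3 * T_SIFS

def phi_dcf(p_tr, p_s):
    """Bianchi-style saturation throughput for conventional DCF."""
    t_c = T_DIFS + T_RTS  # a collided slot is pure overhead
    return (p_s * p_tr * E_P) / ((1 - p_tr) * SIGMA + p_tr * p_s * T_S
                                 + p_tr * (1 - p_s) * t_c)

def phi_sl(p_tr, p_s, n):
    """SL-MAC sketch: a collision slot is longer but carries n payloads."""
    t_txop = n * (T_DATA + T_SIFS + T_ACK)
    t_c = T_DIFS + T_RTS + T_SIFS + THETA + T_CTS + T_SIFS + t_txop
    num = p_tr * E_P * (p_s + (1 - p_s) * n)
    den = ((1 - p_tr) * SIGMA + p_tr * p_s * T_S
           + p_tr * (1 - p_s) * t_c)
    return num / den

gain = (phi_sl(0.5, 0.6, 2) - phi_dcf(0.5, 0.6)) / phi_dcf(0.5, 0.6)
```

With these toy numbers the gain is positive and grows with n, which is consistent with the trend stated in Remark 1.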

B. ACHIEVED THROUGHPUT WITH INFERENCE ERRORS
To characterize the impact of the inference error on the achieved throughput of SL-MAC, the definition of the inference error rate is introduced as follows.
Definition 1 (Inference Error Rate): The inference error rate introduced by the trained CNN model is defined as the ratio of the number of incorrect inferences to the total number of inferences, i.e., γ = N_error / N_total, with γ ∈ [0, 1]. Based on Definition 1, the throughput of SL-MAC is calculated as

φ_DM = (1 − γ) φ_DM^{w/e} + γ φ_DM^{e},   (7)

where φ_DM^{w/e} denotes the throughput of SL-MAC without inference errors, i.e., the φ_DM obtained in Section IV-A, and φ_DM^{e} denotes the throughput of SL-MAC under inference errors.
To calculate φ_DM^{e}, two typical cases are considered: 1) over-estimation and 2) under-estimation. If over-estimation happens, the inference results include not only the STAs involved in the collision but also other STAs that did not collide. Denoting the number of users inferred by the CNN as n̂, we have n̂ > n, and n̂ transmissions are scheduled, which wastes channel resources. If under-estimation occurs, the inference results may miss one or more users involved in the collision, i.e., n̂ < n. As a result, only a portion of the colliding users can be scheduled to transmit, which degrades per-device fairness.
Denote the achieved throughput in the two cases as ψ_over and ψ_under, respectively. Then we have

φ_DM^{e} = α ψ_over + (1 − α) ψ_under,   (8)

where α ∈ [0, 1] is the probability that over-estimation occurs and, accordingly, 1 − α is the probability that under-estimation occurs. In the following, we analyze the achieved throughput in the two cases.
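The two-level mixing described above can be expressed in a few lines. This sketch treats γ as the fraction of erroneous inferences weighting the error branch (a modeling assumption consistent with calling γ an error rate):

```python
def error_case_throughput(alpha, psi_over, psi_under):
    """Expected throughput given an inference error occurred:
    over-estimation with probability alpha, under-estimation otherwise."""
    return alpha * psi_over + (1 - alpha) * psi_under

def overall_throughput(gamma, phi_no_error, phi_error):
    """Mix error-free and erroneous inferences; gamma is assumed to be the
    fraction of erroneous inferences, weighting the error branch."""
    return (1 - gamma) * phi_no_error + gamma * phi_error

phi_e = error_case_throughput(0.4, 5.0, 6.0)  # toy throughput values
phi = overall_throughput(0.1, 7.0, phi_e)
```

The same two functions compose: the inner mix is the case analysis of this subsection, and the outer mix is the γ-weighting of the error-free throughput against it.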

1) CASE 1: OVER-ESTIMATION
In this case, the number of scheduled data transmission opportunities is larger than the true value, i.e., n_over > n, where n_over is the number of users inferred by the CNN under over-estimation. Following (2), the achieved throughput in Case 1 can be obtained as

ψ_over = (P_tr E[P] (P_s + (1 − P_s) n)) / ((1 − P_tr)σ + P_tr P_s T_s + P_tr(1 − P_s) T_c^{DM-over}),   (9)

where the denominator, denoted T_DM^{over}, is the average length of a slot time when over-estimation happens, and T_c^{DM-over} is the average time the channel is occupied by scheduled STA transmissions when a collision is detected under Case 1.
In this case, T_c^{DM-over} is calculated as

T_c^{DM-over} = t_DIFS + t_RTS + t_SIFS + θ + t_CTS + t_SIFS + t_TXOP^{over},   (10)

where t_TXOP^{over} is the TXOP period containing the n_over scheduled data transmissions, i.e., t_TXOP^{over} = n_over(t_DATA + t_SIFS + t_ACK).
Proposition 1: Compared to the perfect-inference case, in which inference errors are not considered, the performance loss introduced by over-estimation (captured by the retained-throughput ratio θ_over ∈ (0, 1), where a smaller θ_over means a larger loss) can be calculated as

θ_over = ψ_over / φ_DM^{w/e}.   (11)

2) CASE 2: UNDER-ESTIMATION
In this case, the number of scheduled data transmission opportunities is smaller than the true value, i.e., n_under < n, where n_under is the number of users inferred by the CNN under under-estimation. Following (2), the achieved throughput in Case 2 can be obtained as

ψ_under = (P_tr E[P] (P_s + (1 − P_s) n_under)) / ((1 − P_tr)σ + P_tr P_s T_s + P_tr(1 − P_s) T_c^{DM-under}),   (12)

where

T_c^{DM-under} = t_DIFS + t_RTS + t_SIFS + θ + t_CTS + t_SIFS + t_TXOP^{under},   (13)

and t_TXOP^{under} is the TXOP period containing the n_under scheduled data transmissions, i.e., t_TXOP^{under} = n_under(t_DATA + t_SIFS + t_ACK).
Proposition 2: Compared to the perfect-inference case, the performance loss introduced by under-estimation can be calculated as

θ_under = ψ_under / φ_DM^{w/e}.   (14)

Remark 2: On one hand, it is observed from (11) that, when over-estimation happens, worse inference results lead to a larger performance loss; that is, a larger n_over leads to a smaller value of θ_over. Therefore, over-estimation deteriorates the system throughput of the SL-MAC protocol. On the other hand, when under-estimation occurs, we can infer from (14) that the impact of inference errors on the system throughput of SL-MAC is not decisive, i.e., the total throughput may decrease or remain the same. However, since only a portion of the STAs suffering collisions can be scheduled to transmit within the TXOP, the remaining unscheduled STAs double their contention window size following the traditional CSMA/CA scheme. This makes it more difficult for them to access the channel, leading to a fairness problem.
Substituting (9) and (12) into (8) yields the achieved throughput with inference errors, given as (15), shown at the bottom of this page. Then, substituting (15) into (7) gives the system throughput as (16), shown at the bottom of this page, where γ denotes the inference error rate and α is the probability that over-estimation occurs.

V. CNN FRAMEWORK DESIGN

A. OVERVIEW OF THE CNN ARCHITECTURE
To identify collisions in the proposed MAC protocol, a CNN framework is proposed to predict the number and IDs of the STAs involved in collisions by offline training on a large labeled dataset. As illustrated in Fig. 4, the CNN framework includes a master-CNN model that infers the number of STAs and N − 2 slave-CNN models that identify the IDs of the STAs accordingly. Since CNNs are typically suitable for processing grid-like data, e.g., 1-D grid time-series data and 2-D grid image data [30], the collected RF traces can be fed into the convolutional layer after data preprocessing: the collected I and Q samples are reshaped into a 4-dimensional tensor suitable for a Keras convolutional layer.
The proposed CNN structure is illustrated in Fig. 4, where each convolutional layer is followed by a rectified linear unit layer (denoted ''Conv + ReLU''). For feature extraction, the master-CNN and slave-CNN models follow the patterns [Conv + ReLU] × K_1^M and [Conv + ReLU] × K_1^S, respectively. After this, a total of K_2^M and K_2^S fully-connected (FC) layers are used in the master-CNN and slave-CNN models, respectively, to process the flattened matrix and then classify the signals using a Softmax activation. The slave-CNN model infers the IDs of the STAs involved in the collision, and its output is a C_N^n × 1 vector indicating the probability of each class. Finally, the Adam optimizer [31] is used to optimize the CNN models, and the CNN-based predictor is trained offline until it can learn the features of the RF traces and make reasonable inferences from the overlapped signals. (It is worth noting that several methods have been proposed to scale up deep neural network training across graphics processing unit (GPU) clusters [33], which helps reduce the runtime of the offline training.) The CNN framework in the proposed MAC protocol consists of three aspects: data collection, offline training, and online inference, as illustrated in Fig. 5. It is worth noting that data collection and offline training are performed only once; after the CNN models are trained offline, online inference can be performed on any given I/Q dataset.
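A minimal Keras sketch of the master/slave structure is given below. The layer counts, filter counts, and kernel sizes are placeholder assumptions (the paper's K_1 and K_2 values are not specified here); only the Conv + ReLU / FC / Softmax pattern and the output sizes follow the text:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

w = 32  # window size (number of I/Q samples per example)

def make_cnn(num_classes, num_conv=2, num_fc=2):
    """[Conv + ReLU] x num_conv feature extractor, then num_fc FC layers
    ending in a Softmax classifier. All sizes here are assumptions."""
    inputs = keras.Input(shape=(2, w, 1))  # I/Q rows x w samples x 1 channel
    x = inputs
    for _ in range(num_conv):
        x = layers.Conv2D(16, (2, 3), padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    for _ in range(num_fc - 1):
        x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)

N = 4                                  # total STAs in a toy example
master = make_cnn(num_classes=N - 1)   # infers n in {2, ..., N}
slave_n2 = make_cnn(num_classes=6)     # C(4, 2) = 6 ID combinations for n = 2
```

Each model would be compiled with the Adam optimizer and trained on the labeled I/Q tensors; the slave model's Softmax width equals the number of candidate STA combinations for its value of n.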

B. DATA COLLECTION
During data collection, we collect RF traces at a constant SNR using our USRP2 testbed, which is wired to a host PC (e.g., a laptop) running GNU Radio [32], as shown in Fig. 6(a). Collecting data at a constant SNR is valid because closed-loop power control is generally used to obtain a constant received power at the AP. Specifically, on the device side, the laptop is mainly responsible for baseband processing, while the universal software radio peripheral (USRP2) handles up-conversion, digital-to-analog (D/A) conversion, and transmission over the radio. On the AP side, the USRP2 module first receives signals from the radio and performs A/D conversion and down-conversion. The laptop then receives the signals from the USRP2 via Ethernet and carries out the baseband processing. Finally, the I/Q sequences are stored as a file on the laptop. The training examples are formed from the I and Q samples after reshaping, each consisting of w time-series samples, where w is the window size. Besides, N_channels = 1, analogous to RGB values in imagery; Dimension 1 = 2, holding our I and Q channels; and Dimension 2 = w. In this paper, the window size w is set to 32, 128, and 512, respectively. With the collected data traces (i.e., I and Q samples), we train on 80% of the collected RF dataset (training set), which contains about 790 million I and Q samples, validate on 10% of the dataset (validation set), and test on the remaining 10% (testing set), each corresponding to about 100 million I and Q samples.
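The reshape described above can be made concrete with NumPy. The streams below are random stand-ins for the stored I/Q file; the shapes match the text's layout (Dimension 1 = 2 for I/Q, Dimension 2 = w, N_channels = 1):

```python
import numpy as np

w = 32                                       # window size
num_examples = 10
raw_i = np.random.randn(num_examples * w)    # toy in-phase sample stream
raw_q = np.random.randn(num_examples * w)    # toy quadrature sample stream

# Stack I over Q, split each stream into windows of w samples, and add the
# trailing channel axis so each example is (2, w, 1).
iq = np.stack([raw_i, raw_q])                # (2, num_examples * w)
x = iq.reshape(2, num_examples, w)           # (2, num_examples, w)
x = x.transpose(1, 0, 2)[..., None]          # (num_examples, 2, w, 1)
```

The resulting 4-D tensor `(examples, 2, w, 1)` is the layout expected by a Keras `Conv2D` input with `channels_last` data format.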

C. OFFLINE TRAINING AND ONLINE INFERENCE
The CNN-based predictor is trained offline based on the historical radio frequency (RF) traces. The pre-trained CNN predictor can then be deployed at the AP and used to identify the STAs from the overlapped signals, as highlighted in Fig. 5. Specifically, we first collect the RF traces with our USRP2 testbed within a relatively large window size. Suppose that the window size is w; since the dimension of each RF sample is 2 (i.e., the in-phase (I) and quadrature (Q) signals), the dimensionality of the input space equals 2w. Afterwards, the CNN predictor is trained and tested offline based on the collected historical RF traces. During the offline training, back-propagation is performed to train the CNN model. It is worth noting that we use a graphics processing unit (GPU) cluster (i.e., NVIDIA DGX-1) with TensorFlow installed to accelerate the offline training process [33], which only needs to be executed once, as highlighted in Fig. 6(b). Considering that the ground truth is known to us, the identification of STAs can be treated as a multi-class classification problem, which belongs to supervised learning. After the offline training, the online RF traces can be fed into the pre-trained CNN predictor deployed at the AP to identify the STAs from the overlapped signals in near-real-time.
Specifically, for the multi-class classification problem, the probability of each class is predicted using the Softmax function. The predicted probability for the i-th class is given as P(i) = e^{z_i} / Σ_{j=1}^{M} e^{z_j}, where M is the total number of classes and z is the output of the last fully connected layer. Then, we conventionally set the loss function of the classification (denoted as L) to the cross-entropy [34], which is given as L = −Σ_i y_i log f(x_i), where x_i is the i-th input data sample, y_i denotes the corresponding ground truth, and f(x_i) is the actual output of the neurons.
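The Softmax probability and cross-entropy loss above can be checked numerically with a short sketch (the logit values are arbitrary examples):

```python
import numpy as np

def softmax(z):
    """P(i) = exp(z_i) / sum_j exp(z_j), stabilized by subtracting max(z)."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(y_true, p_pred):
    """L = -sum_i y_i * log(f(x_i)) for a one-hot ground-truth vector."""
    return -np.sum(y_true * np.log(p_pred + 1e-12))

z = np.array([2.0, 1.0, 0.1])   # example logits from the last FC layer
p = softmax(z)                  # predicted class probabilities, sum to 1
loss = cross_entropy(np.array([1.0, 0.0, 0.0]), p)
```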
The two-step offline training and online inference are illustrated with an example of a total of four STAs in the network, as shown in Fig. 7. During the first-step training, the master-CNN model, which detects the total number of STAs involved in collisions, is trained offline based on the whole RF traces, and the inference result falls into one of three classes. Note that class-3 has a fixed set of STAs (i.e., all the STAs belong to class-3). Therefore, we only need to identify the STAs' IDs in the remaining two classes, i.e., class-1 and class-2. In the second-step training, two slave-CNN models are trained separately offline based on different RF traces. Compared to conventional CNN training that includes 16 classes, the accuracy of the proposed two-step CNN training can be improved due to the decrease in the number of classes.

D. IMPLEMENTATIONS
The implementation of the proposed CNN predictor, which includes one master-CNN model file and two slave-CNN model files, is illustrated in Fig. 8. In the CNN predictor, the AP first infers the number of STAs involved in collisions from the new RF traces via the pre-trained master-CNN model. After this, based on the preceding inference result (i.e., the number of STAs), the AP selects a slave-CNN model accordingly and then performs the inference to identify the STAs' IDs. For the implementation of the proposed CNN-based MAC, the complexity and generalizability are discussed below.
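The two-step dispatch above can be sketched as follows. The model objects are mocks standing in for the pre-trained model files, and the STA IDs are placeholders; a real deployment would load the master-CNN and slave-CNN files instead.

```python
def identify_stas(rf_window, master_model, slave_models, all_stas):
    """Step 1: the master-CNN infers the collision class (number of STAs).
    Step 2: the matching slave-CNN (if any) identifies the STAs' IDs;
    class-3 involves every STA, so no slave inference is needed there."""
    collision_class = master_model.predict(rf_window)
    if collision_class not in slave_models:
        return all_stas                      # class-3: all STAs collided
    return slave_models[collision_class].predict(rf_window)

class MockModel:
    """Placeholder standing in for a pre-trained CNN model file."""
    def __init__(self, output):
        self.output = output
    def predict(self, rf_window):
        return self.output

master = MockModel(1)   # pretend the master-CNN inferred class-1 (two STAs)
slaves = {1: MockModel(["STA-1", "STA-3"]),
          2: MockModel(["STA-1", "STA-2", "STA-4"])}
ids = identify_stas(rf_window=None, master_model=master, slave_models=slaves,
                    all_stas=["STA-1", "STA-2", "STA-3", "STA-4"])
```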

1) TIME AND SPACE COMPLEXITY
a: TIME COMPLEXITY
The CNN model selected to perform the inference in our proposed MAC has quadratic time complexity, i.e., O(M²L), where L is the number of layers and M is the number of neurons in a hidden layer, which indicates the scale of the neural network model. Specifically, when the window size (w) is set to 32, 128, and 512, the corresponding training time is 685 mins, 414 mins, and 243 mins, respectively. Besides, we only need to train the CNN model once, which can be performed offline on machines with strong computing and storage capabilities, e.g., GPU clusters.
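The O(M²L) scaling can be made concrete with a small sketch counting multiply-accumulate (MAC) operations for a stack of fully-connected layers; the widths used are illustrative, not the paper's configuration.

```python
def dense_stack_macs(M, L):
    """Multiply-accumulate count for L hidden layers of M neurons each:
    each layer costs M * M MACs, so the total grows as O(M^2 * L)."""
    return L * M * M

# Doubling the layer width quadruples the cost; doubling the depth doubles it.
cost = dense_stack_macs(M=128, L=4)
```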

b: SPACE COMPLEXITY
After the offline training, the total size of the pre-trained CNN models is less than 5 MB, which is far less than the storage (or even the memory) capacity of the AP. This indicates that the AP has enough space to store, and even cache in memory, all the pre-trained CNN model files to perform the online inference more efficiently.

2) GENERALIZABILITY
The generalizability of the proposed SL-MAC protocol is one of the key focuses of this paper. The wireless channel environment is complex and time-varying; as a result, the signals received over the wireless channel usually arrive with different SNR values. To learn the inherent significant features of the wireless channel, we collected a large amount of RF traces over a real wireless channel across a wide SNR range (from 0 to 20 dB), aiming to cover many different scenarios.
The proposed CNN-based MAC protocol can work well in environments where the wireless channel is complex and time-varying (e.g., when the STAs and pedestrians move) because the CNN model is trained using a wide range of parameter settings. Although the input parameters do not include every possible combination, they do cover a wide range of settings, and the well-known generalization property of machine learning models enables the trained CNN to produce accurate inferences even for parameter settings not included in the training samples. As shown by probably approximately correct (PAC) learning theory, generalization can be achieved by designing a proper CNN model with enough training data [35].
By training the deep learning models on our collected RF traces, the deep neural networks can extract the inherent features from overlapped signals with different SNR values (i.e., different channel conditions). This demonstrates that the nonstationary nature of the wireless environment has been considered in our experiment, and the proposed method can generalize well to different scenarios in which the received signals have different SNR values. Furthermore, we use the hold-out validation method to avoid over-fitting by dividing the dataset into training, validation, and testing data. Therefore, the proposed SL-MAC with the pre-trained CNN model can generalize well to a new environment or to different positions, because the nonstationary nature of the wireless environment has been learned well.

A. TESTING RESULTS OF CNN
The hyper-parameters of the pre-trained CNN model files are summarized as follows. The master-CNN model detecting the number of STAs and the slave-CNN model-1 each contain three convolution layers and one fully connected (FC) layer, i.e., K_1^M = K_1^S = 3 and K_2^M = K_2^S = 1. The slave-CNN model-2 contains four convolution layers and one FC layer, i.e., K_1^S = 4 and K_2^S = 1. The average inference accuracy is presented in Table 1, where four STAs are taken as an example and the window size is set to w = 128. It can be seen from Table 1 that all of the pre-trained CNN models achieve a relatively high inference accuracy (i.e., ≥ 90%). Therefore, the inference error of the pre-trained CNN is reasonably neglected in the following simulations.

B. PERFORMANCE EVALUATION
1) SIMULATION SETTINGS
The simulations are carried out using the ns-2 simulator [36] with the PHY and MAC layer parameters presented in Table 2. To integrate the experimental results of deep learning with the ns-2 network simulations, we consider the inference error rate (η) as an input to the ns-2 simulations. Specifically, we collect the RF data using our USRP2 testbeds and perform the testing via the pre-trained deep learning model. Since the deep learning experiments have been done in TensorFlow, we use a Log-Sigmoid function η = D / (1 + A·e^{(B−C·N)}) to characterize the relationship between η and N according to the inference results given by the CNN [19]. Therefore, given the total number of devices N, η can be obtained through this mapping and then fed into the ns-2 simulations. We consider a general star-topology network scenario with a total of N STAs uniformly distributed within the AP's coverage radius of 50 m. We assume that the inference time cost is negligible (i.e., θ = 0), and each STA is under saturation traffic with the same data payload size. To demonstrate the MAC-efficiency improvement brought by deep learning, we implemented and compared the performance of SL-MAC against the conventional IEEE 802.11 DCF protocol using a four-way handshake, i.e., RTS-CTS-DATA-ACK. The simulation results are averaged over 100 runs.
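The Log-Sigmoid mapping above can be sketched as a one-line function; the coefficient values used in the example call are illustrative assumptions, not the paper's fitted values.

```python
import math

def inference_error_rate(N, A, B, C, D):
    """Log-Sigmoid fit eta = D / (1 + A * exp(B - C * N)) mapping the number
    of devices N to the CNN inference error rate eta. A, B, C, D are fit
    coefficients; D is the asymptotic error rate as N grows large."""
    return D / (1.0 + A * math.exp(B - C * N))

# With C > 0, eta grows with N and saturates below D (illustrative values).
eta_small = inference_error_rate(N=10, A=1.0, B=5.0, C=0.1, D=0.2)
eta_large = inference_error_rate(N=100, A=1.0, B=5.0, C=0.1, D=0.2)
```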

2) PERFORMANCE EVALUATION
In Figs. 9-10, we compare the normalized throughput of the proposed SL-MAC protocol with that of IEEE 802.11 DCF to evaluate the upper bound of the performance gain brought by deep learning, where the inference errors of SL-MAC are ignored. Fig. 9 presents the normalized throughput against the number of devices (N). First, it is observed from Fig. 9 that the analysis of SL-MAC matches well with the simulation. The throughput of the proposed SL-MAC protocol increases and then remains stable as N increases, while the throughput of IEEE 802.11 DCF severely decreases. This is because severe collisions occur with CSMA/CA, whereas these collisions can be avoided with the proposed deep-learning-based SL-MAC. When the inference error is neglected, it can be seen from Fig. 9 that a larger N leads to a higher normalized throughput for SL-MAC. This demonstrates that the proposed SL-MAC protocol is suitable for dense WLAN scenarios.
Moreover, it is observed from Fig. 9 that when the data payload size is relatively large (e.g., 1024 bytes in Fig. 9(b)), the RTS/CTS handshake scheme helps to reduce the time spent during a collision with respect to the basic access mechanism. However, when the data payload size is relatively small (e.g., 32 bytes in Fig. 9(a)), the RTS/CTS handshake degrades throughput performance due to the control overhead. Fig. 9 also demonstrates that the normalized throughput gain brought by deep learning decreases as the data payload size increases, a phenomenon also verified by Fig. 10. The main reason is that when the data payload size is relatively large, the time ratio occupied by the control handshake is reduced, and thus the total time with RTS collisions resolved by deep learning decreases. Therefore, the advantage of SL-MAC declines for larger data payload sizes. Besides, given a data payload size, we find that the more devices exist in the network, the more improvement is achieved by the proposed SL-MAC, which verifies Remark 1. This is because when the number of devices increases, more collisions occur, which is exactly the situation the proposed SL-MAC exploits: only when a collision occurs can the AP get the opportunity to schedule the STAs involved in the collision to transmit their data packets within the TXOP. To better understand the benefit of SL-MAC even in the absence of inference error, we define the user density as the ratio between the total number of STAs and the AP's coverage area, i.e., ρ = N/χ, where χ = πr² and r denotes the radius of the AP's coverage area; ρ thus denotes the number of STAs deployed per m². Taking a radius of 50 m as an example, we evaluate the probability of RTS collision, as shown in Table 3. It is observed that the RTS collision probability increases significantly with the increase of user density.
In particular, when 25 STAs are deployed, i.e., the user density is only 3.18×10⁻³ STAs per m², the RTS collision probability already exceeds 50%. Furthermore, in scenarios with dense deployment, e.g., a stadium, train station, or conference room, the user density may reach up to 1 STA per m², i.e., at least one STA is deployed per 1 m² [37]. In this case, the RTS collision probability reaches almost 100%, which leads to the failure of the traditional 802.11 DCF scheme. Therefore, the proposed SL-MAC can deliver the promising advantages of deep learning in wireless networks, especially in dense deployment scenarios.
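The user-density definition ρ = N/χ can be checked numerically; the values below reproduce the 25-STA example from the text.

```python
import math

def user_density(N, r):
    """User density rho = N / chi, where chi = pi * r^2 is the AP's
    coverage area, giving the number of STAs per square meter."""
    return N / (math.pi * r ** 2)

rho = user_density(N=25, r=50)   # ~3.18e-3 STAs per m^2, as in the text
```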
Figs. 11-12 evaluate the impact of the inference error rate (γ) of a trained CNN model on the normalized throughput of the proposed SL-MAC protocol, where the over-estimation probability (α) is set to 0.5 and 0.1, respectively. It is observed from Fig. 11 and Fig. 12 that the normalized throughput of the SL-MAC protocol decreases as γ increases. This is because when γ becomes larger, more inference errors occur, which degrades the achieved throughput. Moreover, we can observe that the decline in normalized throughput with α = 0.5 is more significant than that with α = 0.1. The reason is that a larger value of α causes a higher probability of over-estimation, which leads to throughput degradation, as presented in Remark 2.
Figs. 13-14 evaluate the impact of the over-estimation probability (α) on the normalized throughput of the proposed SL-MAC protocol, where the inference error rate (γ) is set to 0.5 and 0.1, respectively. It can be observed that for a given γ, the normalized throughput of the SL-MAC protocol decreases as α increases because of the higher probability of over-estimation. Furthermore, compared to Fig. 13 with γ = 0.5, it is observed from Fig. 14 that the normalized throughput remains almost unchanged as α increases when γ is set to 0.1. This is because a larger γ means a larger inference error rate; in that case, as α increases, more over-estimations occur when γ = 0.5, which can seriously degrade the throughput performance of SL-MAC. Moreover, we can see from Figs. 11-14 that as the data payload size increases, the throughput gain (i.e., the normalized throughput gap) decreases, because inference errors introduce more damage to SL-MAC with larger payload sizes.

VII. CONCLUSIONS AND FUTURE WORK
In this paper, we propose a novel MAC protocol for future WLANs. Because of the severe collisions that occur under the traditional CSMA/CA scheme, a spectrum learning-powered MAC protocol (SL-MAC) is proposed to schedule the data transmissions of the STAs involved in RTS collisions. An essential feature of the proposed SL-MAC protocol is its backward compatibility with the conventional IEEE 802.11 DCF mechanism. Both the benefits and drawbacks brought by the CNN predictor are analyzed, which demonstrates the potential of applying deep learning to MAC design. Extensive simulations demonstrate the advantages of the proposed SL-MAC protocol. This paper is the first attempt to integrate fundamental MAC layer design with deep neural networks in a conventional IEEE 802.11 DCF setup without introducing additional hardware overhead, and it aims to lay a foundation for further related research.
Our potential future works are as follows. How to generate new RF training datasets in practical scenarios and retrain the CNN models more efficiently, known as the ''scalability'' issue, is one direction of our future work. For example, when new STAs join the WLAN, the inference results of the slave-CNN models (i.e., the IDs of the STAs involved in collisions) are affected. It would be difficult to collect enough labeled RF datasets to retrain the CNN models, since there exist too many combinations in real environments.
To mitigate this issue, we can take advantage of the historical labeled data and combine it with some unlabeled data. In this context, to achieve high inference accuracy of the CNN models, fine-tuning the slave-CNN models with the combined RF dataset (in which some of the RF traces are unlabeled) becomes promising. Since semi-supervised learning (SSL) makes use of unlabeled RF traces to facilitate the learning process and transfer learning (TL) can learn generalizable representations in a source domain [38], SSL and TL could be combined in our future work to yield significant inference improvements for spectrum learning.

ACKNOWLEDGMENT
The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Office of the Under Secretary of Defense for Research and Engineering (OUSD(R&E)) or the U.S. Government.