
RF Jamming Dataset: A Wireless Spectral Scan Approach for Malicious Interference Detection


Abstract:

The evolution of next-generation communication systems demands that wireless networks possess the attributes of awareness, adaptability, and intelligence. Wireless sensing techniques provide valuable information about the radio signals in the environment. However, hostile threats, such as jamming, eavesdropping, and manipulation, pose significant challenges to these networks. This article presents a comprehensive study of an innovative RF-jamming detection testbed designed to combat these threats. The testbed leverages the spectral scan capability of the wireless network interfaces and the jamming toolkit, JamRF, to accurately detect and mitigate jamming attacks. This study outlines the methodology used to develop the testbed, along with a detailed discussion on the rationales behind the design decisions. The accompanying RF jamming dataset, which comprises experimentally measured data, is expected to promote the development and evaluation of jamming detection and avoidance systems. As a proof-of-concept, we trained five different machine learning algorithms and achieved a jamming detection accuracy of over 90% for all algorithms. The proposed RF jamming dataset and testbed represent a significant advancement in the fight against malicious interference in wireless networks.

Published in: IEEE Communications Magazine ( Volume: 62, Issue: 11, November 2024)

Page(s): 114 - 120

Date of Publication: 22 January 2024


DOI: 10.1109/MCOM.003.2300483


Introduction

Ensuring the security of communication networks is of utmost importance. While wired networks have been targeted by various types of attacks, the widespread adoption of wireless networks in recent years has made them a prime target for malicious activities. However, advances in technology have made wireless networks more affordable and easier to deploy, making them a popular choice for many organizations. Despite their popularity, wireless networks are known to be more susceptible to security attacks compared to wired networks due to the nature of wireless links [1]. The openness of the wireless medium makes it susceptible to both intentional and unintentional interference, with interference from neighboring cells being a prevalent form of unintentional interference in a wireless communication system. On the other hand, intentional interference refers to malicious attacks on a victim receiver that is not equipped to defend itself [2]. One such attack is the jamming attack, which actively transmits high energy to disrupt reliable data transmission or reception and can severely impact system performance [3].

To mitigate the impact of jamming attacks, researchers in academia, industry, and government have dedicated significant effort to developing jamming detection and avoidance techniques. To facilitate these efforts, various datasets in different formats have been made available to the public to aid in the creation of jamming detection and avoidance systems. Puñal et al. [4] created a comprehensive dataset that includes multiple trace sets of 802.11p communications under different Radio Frequency (RF) jamming conditions. The RF jammer's operation patterns were analyzed, including constant, reactive, and pilot jamming. The observations were conducted in an anechoic chamber and in two key outdoor environments: an open area with a straight road and a densely populated building environment. Whelan et al. [5] collected another comprehensive WiFi trace dataset that comprises logs from a regular flight of an unmanned aerial vehicle (UAV) as well as one in which the UAV is subjected to global positioning system (GPS) spoofing and jamming. The experiment utilized a signal generator to precisely locate the UAV in Shanghai, China. GPS spoofing was achieved using the HackRF software-defined radio (SDR) and the GPS-SDR-SIM application to transmit the UAV's coordinates. GPS jamming was accomplished by broadcasting white Gaussian noise using the HackRF.

Although WiFi trace data is valuable for making inferences about network channel state, it does not provide a complete picture of the utilization and condition of the entire spectrum. Moreover, the traces obtained are in packet form, representing samples at the network layer, necessitating a packet sniffer for analysis. To address the limitation of WiFi trace data in providing a comprehensive view of spectrum utilization and conditions, various monitoring systems have been proposed and made accessible in the literature. Prominent among these are the Google Spectrum [6] for television white-space measurements, the IBM Horizon project [7] that presents a decentralized architecture for sharing Internet of Things (IoT) data, and Microsoft's Spectrum Observatory [8], which enables spectrum sensing through the use of high-end sensors. The focus of Google's Spectrum and IBM Horizon on specific use cases results in their limited scope, while the high cost of the necessary sensing nodes impedes widespread deployment of Microsoft's Spectrum Observatory. In [9], ElectroSense was proposed as a flexible and cost-effective testbed that leverages low-cost sensors to collect and analyze spectrum data through a crowd-sourcing paradigm. The primary goal of the initiative is to sense the full spectrum in diverse locations worldwide and provide processed spectrum data to users seeking a comprehensive understanding of spectrum utilization.


Despite the benefits of these spectrum sensing tools, they are either expensive or difficult to implement, motivating the need for more suitable spectral sensing tools. To that end, in this article, we present an experimental testbed for performing spectrum scanning using the built-in Wireless Local Area Network (WLAN) Network Interface Cards (NICs) of communication devices to collect data in different environments. This makes our testbed a more accessible and cost-effective solution for collecting RF jamming data. Furthermore, using the testbed, we employ JamRF, a jamming toolkit developed in [3], to synthesize different jamming scenarios and generate an RF jamming dataset. We outline the methodology used for developing the testbed and discuss the reasons for the choices made during its development to facilitate future improvements in the experimental exploration of jamming dataset production based on spectral scans. Our measurement data is labeled into categories, which can be utilized in RF jamming analysis. This dataset can assist researchers in wireless security to conduct experimental evaluations of existing and future jamming detection and avoidance systems. Additionally, we provide an example scenario that can be used to construct experiment-driven jamming and avoidance systems and suggest avenues for further study using this dataset.

The remainder of this article is structured as follows. The design of the proposed experimental setup is outlined in the next section, including the underlying principles and practical implementation of the testbed. The sample dataset obtained from the testbed is presented following that. Then an example application of the dataset is demonstrated. The article concludes in the final section.

Testbed Design and Implementation

In this section, we present the design and implementation of the testbed used for the measurement and analysis of the jamming signal generated by the JamRF toolkit [3]. JamRF is a software toolkit developed using GNU Radio to interface with the HackRF SDR. We provide a comprehensive discussion of the testbed design below, followed by details of its implementation, including the use of a Raspberry Pi Compute Module 4, a WiFi radio for spectral scan, and a HackRF jammer.

Testbed Design Based on JamRF

The proposed experimental design utilizes the JamRF toolkit [3] to conduct a jamming attack on all available 2.4/5 GHz channels. A constant jammer configuration with a Gaussian noise jamming signal was employed, with a bandwidth of 20 MHz and a sampling rate of 20 MHz. The experiments are performed in three environments: an RF isolation chamber, a laboratory, and an office. To avoid disrupting the transmission activities of other users, the jamming attack is carried out inside the RF isolation chamber. The attack is executed using a HackRF with JamRF, and the captured signals are recorded at the receiver side using a Compute Module 4 (CM4) with a mounted Qualcomm Atheros device (QCA9880) scanning in background mode. The receiver is positioned at distances of {20, 40, 60} cm from the jammer, and the jamming transmit power is varied over {0, 5, 10} dBm. For each distance and power combination, Fast Fourier Transform (FFT) samples are collected for approximately three seconds, and the process is repeated ten times with a 10-second pause between iterations. In three scenarios, no jamming attack is conducted. These scenarios are low activity in the laboratory, high activity in the office, and no activity in the RF isolation chamber. The collected measurement data is organized and labeled into categories for ease of RF jamming analysis.
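The measurement plan described above can be enumerated in a few lines. The following is a minimal sketch that only restates the numbers given in the text; the constant and function names are illustrative and do not come from the testbed scripts, and the wall-clock estimate assumes one pause after every capture.

```python
from itertools import product

DISTANCES_CM = (20, 40, 60)   # jammer-to-receiver distances from the text
TX_POWERS_DBM = (0, 5, 10)    # jamming transmit powers from the text
REPETITIONS = 10              # captures per distance/power combination
CAPTURE_S = 3                 # approximate FFT capture duration (seconds)
PAUSE_S = 10                  # pause between iterations (seconds)


def capture_plan():
    """Return one (distance_cm, power_dbm, run) tuple per capture."""
    return [
        (distance, power, run)
        for distance, power in product(DISTANCES_CM, TX_POWERS_DBM)
        for run in range(1, REPETITIONS + 1)
    ]


plan = capture_plan()
# Rough duration per jammed channel, assuming a pause after every capture.
total_s = len(plan) * (CAPTURE_S + PAUSE_S)
print(len(plan), "captures per jammed channel, roughly", total_s, "seconds")
```

Under these assumptions, each jammed channel yields 3 x 3 x 10 = 90 captures.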

Figure 1.

Overview of the Spectral Scan Testbed.


Implementation of the Testbed

The implementation process of the testbed is depicted in Fig. 1. This section presents a comprehensive overview of the experimental details and measurement methods involved in our implementation. In particular, we describe the types of WLAN interfaces considered in the experiments and specify the parameters of the spectral scan testbed. Furthermore, we discuss the hardware and software components utilized in constructing the testbed.

Raspberry Pi Compute Module 4: The CM4 is a Raspberry Pi 4 in a compact form factor, primarily designed for embedded applications. It features a quad-core ARM Cortex-A72 processor and dual video output, among other interfaces. For this experiment, we utilize the CM4 Input-Output (IO) board, which serves as a development system for the CM4 and an embedded board for end products. The IO board enables the construction of systems using off-the-shelf components such as HATs and PCIe cards, including those for NVMe, SATA, networking, or USB. The major user connectors are conveniently located on one side for ease of enclosure design.

Figure 2.

Overall Architecture of the Spectral Scan System.


WiFi Radio for Spectral Scan: In our experimental testbed, either of two commercially available wireless modules can be utilized, namely the Qualcomm Atheros QCA9880 and the Doodle Labs NM-DB-3U radio. The Doodle Labs NM-DB-3U is based on the Qualcomm AR958x chipset and supports IEEE 802.11n and 3x3 MIMO. It is an industrial-grade module that interfaces via mini PCIe and is supplied by Doodle Labs. The Qualcomm Atheros QCA9880, on the other hand, is a dual-band 3x3 MIMO 802.11ac/abgn chipset that is also interfaced via mini PCIe. Both of these modules are capable of conducting spectral scans, as they are supported by the ATH10k (drivers/net/wireless/ath/ath10k/spectral.c) and ATH9k (drivers/net/wireless/ath/ath9k/common-spectral.c) wireless drivers, respectively, which are based on the mac80211 softmac architecture.

HackRF Jammer: In our experimental setup, we utilized the HackRF One, a wideband half-duplex SDR transceiver developed and manufactured by Great Scott Gadgets [10]. With the ability to both receive and transmit signals, this device supports frequencies ranging from 1 MHz to 6 GHz, with a maximum output power of up to 15 dBm, depending on the band. The HackRF One includes a sub-miniature version A (SMA) antenna port, SMA clock input and output ports, and a USB 2.0 port, and it is compatible with popular software-defined radio applications such as GNU Radio and SDR#. As outlined above, we employed JamRF, a jamming toolkit that implements various types of jammers and jamming strategies using the HackRF One and GNU Radio [3].

Note that the ATH10k and ATH9k driver configurations do not enable spectral scan by default. This required explicitly enabling the CONFIG_ATH10K_SPECTRAL and CONFIG_ATH9K_COMMON_SPECTRAL options in the kernel configuration. To capture spectral data, an open-source tool (https://github.com/govindsi/utilities/blob/main/scripts/spectral_scan.sh) was utilized under various configurations (https://github.com/govindsi/utilities/tree/main/config/AP).

Data Set Organization and Characteristics

In this section, we provide a comprehensive overview of the data set accompanying this article. The dataset is made publicly available for researchers (https://ieee-dataport.org/documents/rf-jamming-dataset-using-cm4-and-jamrf-enabled-hackrf). Below we present the architecture of the Spectral Scan system used to generate the data set. Following that, we elaborate on the features obtained from the FFT data. Then we provide a visual representation of the Spectral Scan results, including an illustration of the impact of jamming and jammer configuration on the RF spectrum. Finally, we categorize the data set based on the type of measurement and the parameters used in each experiment.

Spectral Scan System Architecture

The architecture of the spectral scan system is presented in Fig. 2. This system integrates multiple communication layers to facilitate spectral scanning functionality. The wpa_supplicant and hostapd components are utilized to configure the User Media Access Control (UMAC) mode, which can be set to access point, mesh, station, or independent basic service set modes. The spectral scan classifier is employed to classify the spectrum conditions, while the FFT_eval block is based on an open-source spectral scan pre-processing tool (https://github.com/simonwunderlich/FFTeval). The tool's userspace program provides a graphical representation of the Fast Fourier Transform (FFT) samples collected from Atheros NICs, thereby facilitating the development of open-source spectrum analyzers for Qualcomm Atheros AR92xx and AR93xx-based chipsets. The ATH10k/ATH9k SPECTRAL_SCAN_CTL driver interface is used for spectral scan configuration, with the spectral data being captured via the DEBUGFS interface and transferred to the WiFi firmware through the Peripheral Component Interconnect (PCI) transport.
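The control and capture path described above can be illustrated with a minimal sketch. It assumes the debugfs layout commonly exposed by the ath9k and ath10k drivers (a `spectral_scan_ctl` control file and a `spectral_scan0` data file under `/sys/kernel/debug/ieee80211/<phy>/<driver>/`). The function names, the default `phy0`, and the three-second wait are illustrative choices, and this is not the capture script used by the authors.

```python
import time
from pathlib import Path

# Usual debugfs location; the exact path on a given system is an assumption.
DEBUGFS_ROOT = Path("/sys/kernel/debug/ieee80211")


def scan_paths(phy="phy0", driver="ath10k", root=DEBUGFS_ROOT):
    """Return the (control, data) debugfs files for one radio."""
    base = Path(root) / phy / driver
    return base / "spectral_scan_ctl", base / "spectral_scan0"


def capture(out_file, mode="background", duration_s=3.0,
            phy="phy0", driver="ath10k", root=DEBUGFS_ROOT):
    """Enable a spectral scan, dump the raw FFT samples, then disable it."""
    ctl, data = scan_paths(phy, driver, root)
    ctl.write_text(mode)           # "background" or "manual"; ath9k also has "chanscan"
    if mode != "chanscan":
        ctl.write_text("trigger")  # chanscan is instead driven by a normal channel scan
    try:
        time.sleep(duration_s)     # let the driver accumulate samples
        Path(out_file).write_bytes(data.read_bytes())
    finally:
        ctl.write_text("disable")  # always leave the radio in its normal state
```

On a real device this would typically be run as root, for example `capture("samples.bin", driver="ath9k")`, after the kernel has been built with the spectral scan options enabled.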

Spectral Scan Features

The Spectral Scan is a feature offered by some commercial off-the-shelf (COTS) wireless chipset products, which enables the collection of FFT data from the physical layer through software-controlled means. The Spectral Scan can be divided into two categories: high-latency and low-latency scans. The FFT data collected from the spectrum can be stored in a binary file format, which can then be post-processed to create an open-source spectrum analyzer or interference classifier. The binary data file contains eight primary features: frequency, noise, max magnitude, total gain in dB, base power in dB, relative power in dB, average power in dB, and received power in dBm [11]. These features can be extracted from the Spectral Scan datagram header. The received power in dBm feature is calculated using the received power equation specified in the Qualcomm Atheros AR92xx and AR93xx chipset documentation. Following the parsing process, the data is stored as a comma-separated values (CSV) file, which can then be utilized for training machine learning (ML) algorithms. The CSV file provides time-series data derived from the in-phase-quadrature (IQ) samples binary file. The binary and CSV data files have been preserved and made accessible for reference in conjunction with this article.
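The received power computation mentioned above is not spelled out in the text. As a hedged illustration, the sketch below follows the per-bin scaling used by the open-source FFT_eval tool for ath9k HT20 samples, where each bin's share of the total FFT energy is applied to the reported noise floor plus RSSI. The parameter names (`rssi_db`, `max_exp`) are taken from that sample format and are assumptions here; ath10k headers carry different fields, and this may differ from the equation the authors used.

```python
import math


def bin_power_dbm(magnitudes, noise_dbm, rssi_db, max_exp=0):
    """Estimate the received power (dBm) of each FFT bin in one sample.

    magnitudes -- integer bin magnitudes as reported by the hardware
    noise_dbm  -- noise floor reported with the sample
    rssi_db    -- RSSI reported with the sample, relative to the noise floor
    max_exp    -- exponent used by the hardware to compress the magnitudes
    """
    # Undo the magnitude compression; clamp to 1 to avoid log10(0).
    scaled = [max(m << max_exp, 1) for m in magnitudes]
    square_sum = sum(v * v for v in scaled)
    # Total power is noise + RSSI; each bin receives its fraction of the energy.
    offset = noise_dbm + rssi_db - 10 * math.log10(square_sum)
    return [offset + 20 * math.log10(v) for v in scaled]
```

A useful sanity check is that the per-bin powers, summed in linear units, return `noise_dbm + rssi_db`.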

Visualization of Spectral Scan Results

The impact of jamming on the RF spectrum is presented in Fig. 3. This graph illustrates the relationship between the received power (dBm) and frequency under various jamming configurations. For example, with jamming at 5200 MHz, a jamming power of 10 dBm, and a 20 cm distance between jammer and receiver, the received power peaks around 30 dBm at this frequency. Conversely, when jamming at 5280 MHz with the jammer 60 cm away and a jamming power of 0 dBm, the peak received power is approximately 0 dBm at 5280 MHz. These observations reveal that jamming effects are observable at surrounding frequencies and diminish with increasing distance from the jammed frequency. Notably, the generated dataset displays diverse spectral responses in different environmental settings, such as isolation chambers versus laboratories, each showcasing distinct signal power characteristics. Machine learning (ML) classifiers, with their capacity for advanced pattern recognition, emerge as essential tools for interpreting these subtle nuances.

Categorization of Dataset

This section presents the categorization of the dataset accompanying the article. Based on the type of measurement, such as device type, device bandwidth, and spectral scan method, experiments from approximately 30 different configurations have been selected and grouped into four categories as described earlier. The dataset comprises five sub-directories, each named according to three parameters in the format of spectral_scans_A_B_C, where A represents the device type (either QCA9880 or doodlelabs), B represents the scan bandwidth (either ht20, ht40, or vht80), and C represents the mode of scan (background, chanscan, or manual [11]).

Each sub-directory contains over a thousand samples, with filenames in the format of samples_A_B_C_D_E, where A represents the environment in which the data was collected (either chamber, lab, or office), B represents the jammed frequency, C represents the distance between the jammer and receiver (either 20 cm, 40 cm, or 60 cm), D represents the jammer transmit power (0 dBm, 5 dBm, or 10 dBm), and E represents the transmission number, starting from 1 and indicating the temporal order of the transmissions. Here, “A” in “samples_A_B_C_D_E” correlates with the interference classes in Table 1, with “chamber” indicating high interference, “lab” moderate interference, and “office” low interference. For example, the file “samples_chamber_2412MHz_40cm_5dbm_3.bin” indicates the third transmission with a jamming power of 5 dBm, a distance of 40 cm between the jammer and receiver, a jammed frequency of 2412 MHz, and data collected in an RF isolation chamber. For each configuration, ten transmissions were conducted.
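The file naming convention above lends itself to automatic labeling. The following is a minimal sketch of a parser for jamming captures named like the example in the text; the `SampleInfo` type and `parse_sample_name` function are hypothetical helpers, and files recorded without jamming may follow a different pattern that this sketch does not cover.

```python
import re
from typing import NamedTuple


class SampleInfo(NamedTuple):
    environment: str   # "chamber", "lab", or "office"
    freq_mhz: int      # jammed frequency
    distance_cm: int   # jammer-to-receiver distance
    power_dbm: int     # jammer transmit power
    run: int           # transmission number, starting from 1


# Pattern inferred from the example "samples_chamber_2412MHz_40cm_5dbm_3.bin".
_PATTERN = re.compile(
    r"samples_(?P<env>chamber|lab|office)_(?P<freq>\d+)MHz_"
    r"(?P<dist>\d+)cm_(?P<power>-?\d+)dbm_(?P<run>\d+)(?:\.\w+)?$",
    re.IGNORECASE,
)


def parse_sample_name(name):
    """Split a capture file name into its labeled parameters."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized sample file name: {name!r}")
    return SampleInfo(
        environment=match["env"].lower(),
        freq_mhz=int(match["freq"]),
        distance_cm=int(match["dist"]),
        power_dbm=int(match["power"]),
        run=int(match["run"]),
    )


print(parse_sample_name("samples_chamber_2412MHz_40cm_5dbm_3.bin"))
```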

Jamming Detection and Avoidance

In this section, we present a machine learning-based approach for determining the exposure of a transmitter and a receiver to RF jamming attacks. This jamming detection problem is framed as a binary classification task, with samples classified as either normal or jamming. Normal samples are acquired from laboratory, office, and isolation chamber environments without jamming, while jamming samples are collected from the isolation chamber with the JamRF toolkit turned on. To achieve high detection accuracy, it is crucial to carefully consider various aspects, such as the selection of appropriate input features, measurement and collection of data, generation of a large dataset, and application of efficient algorithms for training, validation, and testing of the model. In this article, we evaluate the performance of five different classifiers.

Figure 3.

Received power for two instances of jamming power/distance. The blue boxes represent jamming occurring at 5200 MHz, while the red boxes represent jamming occurring at 5280 MHz.


The five classifiers evaluated in this article were Multi-Layer Perceptron (MLP), Support Vector Machines (SVM), RAndom Forest (RAF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). MLP is a feed-forward neural network with at least three node layers, including an input layer, hidden layer, and output layer, and utilizes the supervised learning approach of backpropagation. SVM is a supervised learning model that classifies fresh samples into two categories through a non-probabilistic binary linear classifier and is capable of performing non-linear classification via the kernel trick. RAF is an ensemble learning approach that utilizes many decision trees for classification, regression, and other tasks and has improved performance but decreased interpretability compared to a single decision tree. XGBoost is an optimized gradient-boosted decision tree solution that prioritizes speed and performance and often outperforms a single decision tree in terms of accuracy while compromising interpretability. LightGBM is a scalable, distributed gradient boosting system that supports multiple algorithms, including RAF, and differs in tree construction compared to XGBoost.

Pre-Processing

Here we describe the steps to prepare the data before training the classifiers. First, the feature “freq” was dropped, as it is deemed irrelevant to the task of jamming detection on a particular (isolated) channel. The time series data for each channel was then transformed into a single row by applying seven descriptive statistics (minimum, maximum, mean, standard deviation, and the 75th, 50th, and 25th percentiles) to each of the seven remaining features, resulting in 49 features. Finally, the data were separated into two groups: one consisting of jamming data and the other consisting of data from isolated chambers (low interference), offices (moderate interference), and laboratories (high interference). These groups were used to train a binary classifier. However, the four independent classes were kept separate for the purpose of training a multiclass classifier.
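The feature construction described above (seven statistics over each of the seven retained features) can be sketched with the Python standard library. The column names in the usage example are placeholders rather than the actual CSV headers, and the choice of sample standard deviation and inclusive quartiles is an assumption, since the text does not specify either.

```python
import statistics


def summarize(series):
    """Seven descriptive statistics of one time series (needs >= 2 values)."""
    q25, q50, q75 = statistics.quantiles(series, n=4, method="inclusive")
    return {
        "min": min(series),
        "max": max(series),
        "mean": statistics.fmean(series),
        "std": statistics.stdev(series),  # sample standard deviation (assumed)
        "p25": q25,
        "p50": q50,
        "p75": q75,
    }


def make_feature_row(capture):
    """Collapse one capture into a single row of summary features.

    capture maps a feature name to its list of values over time.
    The "freq" column is dropped, as described in the text.
    """
    row = {}
    for name, series in capture.items():
        if name == "freq":
            continue
        for stat, value in summarize(series).items():
            row[f"{name}_{stat}"] = value
    return row


# Placeholder column names; the real CSV headers may differ.
columns = ["freq", "noise", "max_magnitude", "total_gain_db", "base_pwr_db",
           "relpwr_db", "avgpwr_db", "rx_power_dbm"]
example = {name: [1, 2, 3, 4, 5] for name in columns}
print(len(make_feature_row(example)), "features per capture")
```

With eight input columns, dropping the frequency leaves 7 x 7 = 49 features, which matches the count given above.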

Table 1. Jamming detection performance in binary classification (normal vs. jamming), where the normal class comprises low, moderate, and high interference samples. The results are in the format mean (± std.) obtained over 10 folds.


Training and Tuning

The ML-based classification algorithms are trained and evaluated using the measured dataset. After the pre-processing step, the training split of the processed dataset is utilized to train and fit the models. The performance of the models is then tested and presented using the testing dataset, which contains normal/interference and jamming data. To achieve high performance, a random search hyper-parameter tuning technique with 10-fold cross-validation is employed. During this tuning, a predefined range of values for each hyperparameter was explored to identify the optimal settings, effectively serving as the upper and lower bounds for the parameters.
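To make the tuning procedure concrete, the sketch below shows a generic random search over bounded hyperparameters, scored by k-fold cross-validation. It is written with the standard library only and is not the authors' tuning code: the caller supplies a hypothetical `score_fn` that fits a model on the training indices and scores it on the held-out fold, and uniform sampling within the bounds is an assumption.

```python
import random


def k_fold_indices(n_samples, k, rng):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def random_search(score_fn, space, n_samples, n_iter=20, k=10, seed=0):
    """Return (best_params, best_mean_score) over n_iter random draws.

    score_fn(params, train_idx, test_idx) -> float, higher is better.
    space maps each hyperparameter name to its (low, high) bounds.
    """
    rng = random.Random(seed)
    folds = k_fold_indices(n_samples, k, rng)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Draw one candidate uniformly within the predefined bounds.
        params = {name: rng.uniform(low, high)
                  for name, (low, high) in space.items()}
        scores = []
        for i, test_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i
                         for j in fold]
            scores.append(score_fn(params, train_idx, test_idx))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

In practice, a library routine such as scikit-learn's `RandomizedSearchCV` provides the same behavior with less code; the text does not state which implementation was used.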

The hyperparameters of the MLP classifier are optimized to obtain the optimum model. In the case of binary classification, the best hyperparameters include the Adam solver, an initial learning rate of 0.0001, a batch size of 128, an $l_{2}$ regularization factor of 0.001, a maximum of 300 iterations, and two hidden layers with 30 and 15 units, respectively. For multi-class classification, the solver, initial learning rate, and maximum number of iterations are similar to those for binary classification. However, the batch size is 128, the $l_{2}$ regularization factor is 0.0001, and there are two hidden layers with 30 and 15 units, respectively. In the SVM classifier, three major hyperparameters must be tuned for optimal performance: kernel, $C$, and $\gamma$. The optimal hyperparameters for binary classification were an RBF kernel, $C=20$, and $\gamma=0.0001$. For multi-class classification, the kernel is also RBF with $C=35$ and $\gamma=0.001$. For the random forest classifier, six hyperparameters were adjusted. The ideal hyperparameters for binary classification include 200 estimators, a minimum samples leaf of 1, a maximum depth of 5, a minimum samples split of 2, maximum features set to $\sqrt{n}$ where $n$ is the number of features, and bootstrap set to false. In the case of multi-class classification, the maximum features and bootstrap values are identical to those for binary classification. However, the number of estimators is 100, the minimum samples leaf is 2, the maximum depth is 30, and the minimum samples split is 4.

Deployment Phase

The ML-based classifiers are employed to detect and mitigate jamming in IoT networks, as illustrated in the deployment phase of Fig. 4. Initially, a spectral scan assesses a specific channel $(channel_{i})$ to determine its status. The classifiers then predict the jamming probability $p_{i}(jam)$. Using a threshold $c$ from the classifiers, which was optimized during training, a decision about the channel's status is made. If $p_{i}(jam)$ is below $c$, transmission starts, followed by an idle period of $T$ seconds. If $channel_{i}$ is jammed, the algorithm evaluates all channels in the band. It selects either the channel with the lowest jamming probability or explores other bands. The selection process prioritizes channels with the lowest jamming likelihood, iterating until data transmission completes.
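The decision step of the deployment phase can be sketched as follows. This is a simplified reading of the procedure above, not the authors' implementation: the function name and the returned flag are illustrative, the probabilities are assumed to come from one of the trained classifiers, and the transmit and idle steps are left to the caller.

```python
def choose_channel(current, jam_prob, threshold):
    """Decide which channel to transmit on after a spectral scan.

    current   -- the channel that was just assessed
    jam_prob  -- dict mapping each channel in the band to p_i(jam)
    threshold -- decision threshold c obtained during training
    Returns (channel, usable). usable is False when even the best channel
    in the band is at or above the threshold, in which case the caller
    may explore another band.
    """
    if jam_prob[current] < threshold:
        return current, True  # channel looks clean: keep transmitting here
    # Current channel is jammed: pick the least likely jammed channel.
    best = min(jam_prob, key=jam_prob.get)
    return best, jam_prob[best] < threshold


# Example with made-up probabilities for three 5 GHz channels.
band = {36: 0.90, 40: 0.20, 44: 0.05}
print(choose_channel(36, band, threshold=0.5))
```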

Results and Discussions

Table 1 compares the performance of different machine learning algorithms (MLP, SVM, RAF, XGBoost, and LGBM) for binary-class classification of jamming. The evaluation metrics used are precision, recall, F1-score, accuracy, and inference speed (kHz). To handle imbalanced datasets, it is important to calculate these metrics separately for each class. This allows for a more nuanced understanding of the performance of the algorithms, as the imbalance in the dataset can affect the overall accuracy and make it misleading. The precision, recall, and F1-score metrics provide a more complete picture of the performance of the algorithms, highlighting the trade-off between correctly classifying the samples of each class and the number of false positive or false negative predictions. For binary-class classification, the results showed that all algorithms performed well, with precision and recall ranging from 94% to 100% and F1-scores ranging from 95% to 100% for all classes. In terms of accuracy, RAF and SVM exhibit the highest performance, achieving an accuracy of 100%. On the other hand, Table 2 compares the performance of the same machine learning algorithms for multi-class classification of jamming and low, moderate, and high interference. The results showed that RAF performed best with a 96.33% F1-score and an accuracy of 99%. MLP showed a lower performance, with F1-scores ranging from 56% to 96% and an accuracy of 90.33%. This suggests that RAF is able to learn the features that distinguish jamming from normal operation and different types of interference more effectively than the other algorithms.
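The point about per-class metrics on imbalanced data can be illustrated with a small, self-contained sketch. The labels and predictions below are made up for illustration and are not taken from the dataset; the function name is hypothetical.

```python
def per_class_metrics(y_true, y_pred):
    """Return ({label: {"precision", "recall", "f1"}}, accuracy)."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return report, accuracy


# A classifier that always predicts "normal" on an imbalanced toy set.
y_true = ["normal"] * 8 + ["jamming"] * 2
y_pred = ["normal"] * 10
report, accuracy = per_class_metrics(y_true, y_pred)
print(accuracy, report["jamming"]["recall"])
```

In this toy case the overall accuracy is 0.8 even though no jamming sample is detected, which is why the per-class recall and F1-score are reported alongside accuracy.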

In addition to accuracy, the speed of inference has also been studied. To this end, over 2000 samples from the dataset were generated and the average inference speed per second was calculated. When the algorithms were run on a machine with 32 GB of RAM and a dual-core Intel(R) Core(TM) i7-9750H processor @ 2.60 GHz, the MLP classifier was the fastest among the five classifiers, with an average inference speed of 414.52 kHz and 382.63 kHz for the binary and multi-class classification scenarios, respectively. The inference speed in the multi-class scenario varied similarly to the binary-class results, with MLP being the fastest and RAF the slowest. In conclusion, all algorithms tested showed good performance in jamming detection for both binary and multi-class classifications, with RAF showing the best results in terms of accuracy and F1-score. It should be noted that, although MLP was the fastest algorithm, it had the lowest accuracy. This suggests that MLP is not as good at generalizing to new data as the other algorithms. Furthermore, the inference speed of the algorithms varied significantly, which suggests that the choice of an algorithm may be influenced by the specific requirements of the application, such as the need for real-time detection or the need to minimize the computational resources required. Overall, the LGBM offers the best trade-off between accuracy and speed for both binary and multi-class classification scenarios.
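A throughput figure of this kind can be obtained by timing batch predictions. The sketch below assumes that the reported kHz values mean thousands of samples classified per second, which the text does not state explicitly; `predict` stands for any trained classifier's prediction function, and the repeat count is arbitrary.

```python
import time


def inference_rate_khz(predict, samples, repeats=5):
    """Average number of samples classified per second, expressed in kHz."""
    start = time.perf_counter()
    for _ in range(repeats):
        predict(samples)
    elapsed = time.perf_counter() - start
    return len(samples) * repeats / elapsed / 1e3


# Stand-in predictor that labels every sample as class 0.
dummy_samples = [[0.0] * 49 for _ in range(2000)]
rate = inference_rate_khz(lambda batch: [0 for _ in batch], dummy_samples)
print(f"{rate:.2f} kHz")
```

Results from such a measurement depend heavily on the hardware, batch size, and library version, so they are only comparable when taken on the same machine.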

Figure 4.

Overview of the training and deployment phases of the proposed classifier-based anti-jamming approach.


Table 2. Performance comparison of jamming detection for multi-class classification. The results are presented in the format mean (± std.) obtained over 10 folds.

Conclusions

This study outlined the creation of a radio frequency jamming detection testbed using the Wireless Spectral Scan dataset, detailing various technical considerations. We evaluated five machine learning classifiers for jamming detection, with the random forest classifier showing high accuracy. These findings serve two main purposes: they encourage further research in anti-jamming techniques and offer insights for future jamming dataset creation and testbed development. Our future endeavors will delve into more dataset use cases, like deep anomaly detection and reinforcement learning, aiming to elevate jamming detection techniques. Additionally, we will explore the possibility of extending the testbed to other types of signals using different hardware.
