Low Area and Low Power FPGA Implementation of a DBSCAN-Based RF Modulation Classifier

This paper presents a new low-area and low-power Field Programmable Gate Array (FPGA) implementation of a Radio Frequency (RF) modulation classifier based on the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, known as DBCLASS. The proposed architecture demonstrates a novel approach to the efficient hardware realisation of the DBSCAN algorithm by exploiting parallelism, using a bespoke sorting algorithm, and eliminating memory access. The design achieves 100% classification accuracy on lab-captured RF data above a signal-to-noise ratio (SNR) of 8 dB. It also reduces latency by a factor of 7.5 compared with the next quickest design, total FPGA resource usage by a factor of 3.65 compared with the next smallest complete system, and power consumption by a factor of 4.75 compared with the next most efficient design. The proposed design is well suited to resource-constrained applications such as mobile cognitive radios and spectrum monitoring systems.


I. INTRODUCTION
The ever-increasing demand for wireless communication has led to the emergence of numerous communication standards and the need for efficient spectrum utilisation. Identifying and classifying the modulation schemes of radio signals is critical for dynamic spectrum access, cognitive radio systems, and the development of smart radio systems for 6G and beyond [1]. Machine learning algorithms have proven effective in tackling such classification tasks. Among these algorithms, systems based on Convolutional Neural Networks (CNN) [2] and Long Short-Term Memory (LSTM) networks [3] have emerged as the most popular supervised learning methods for detecting patterns in large datasets. While these models have shown strong performance [2], [3], their complex and generalised nature can be a limitation, particularly in mobile and low-power devices. Yingchun Wang et al. [5] detail the challenges of deploying deep learning systems in these scenarios. They conclude that, to overcome the high power consumption and chip area requirements of machine learning models, engineers should either reduce model complexity or offload processing to the cloud. Our work addresses this challenge by reducing complexity through a bespoke clustering algorithm, specifically designed for scenarios where CNNs and LSTMs fall short. The advantages of this approach are underscored by several factors:
• Neural networks such as CNNs and LSTMs can be resource-heavy, requiring significant memory and processing power. This can be a limiting factor, especially when deploying models to mobile devices [5]. In contrast, our algorithm is optimised for energy efficiency, making it well suited to battery-operated or low-power devices.
• The streamlined design of our clustering algorithm allows rapid data processing, resulting in lower latency than CNNs and LSTMs. This is particularly beneficial in applications requiring real-time data analysis, where the delay introduced by the computational complexity of CNNs and LSTMs can be prohibitive [5].
• Tailoring the algorithm to specific data scenarios not only enhances its efficiency but also reduces the computational overhead required for processing. This targeted approach allows the algorithm to bypass the extensive and often redundant calculations that CNNs and LSTMs perform, further contributing to lower power consumption and faster processing times [5].
In this work we propose a system called DBCLASS (Density-Based CLASSifier), based on Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [4].
DBSCAN has not previously been applied to the problem of RF modulation classification. A traditional hardware implementation of the algorithm would be computationally slow because of its inherently sequential processing. FPGA implementations have the potential to address these challenges by exploiting parallelism and customisation opportunities. This paper introduces a low-area, low-power custom FPGA implementation of DBSCAN that overcomes these hardware limitations of the traditional algorithm.
The key achievements of this system are:
• A low-power design that reduces the overall power consumption of the FPGA implementation by a factor of 4.75 compared with the next most efficient design [15].
• A highly optimised DBSCAN-based pre-processing system which, together with a minimally complex artificial intelligence (AI) model, reduces total FPGA resource usage by a factor of 3.65 compared with the next smallest complete system [19].
• A pipelined architecture designed to operate on real-time data streams, which reduces latency by a factor of 7.5 compared with the next quickest system [19].
• Competitive classification performance, matching the accuracy of more complex CNN architectures at SNRs above 8 dB.
The remainder of this paper is organised as follows: Section II reviews related work from the literature. Section III gives an overview of the DBSCAN algorithm and its application to RF modulation classification. Section IV describes the proposed FPGA-based architecture in detail. Sections V and VI present and discuss the experimental results, and Section VII concludes the paper with a summary of the contributions.

II. RELATED WORKS
In the literature, a number of approaches to modulation classification have proven effective in software. These can roughly be divided into three schools of thought: statistical feature extraction, automatic time-series classification, and constellation diagram classification.

A. STATISTICAL CLASSIFICATION
The first of these approaches takes samples from an incoming waveform and statistically determines features of the sample; examples of the features used can be found in the article by A.K. Nandi and E.E. Azzouz [6].
Notable features include the kurtosis, entropy, standard deviation, skewness, and symmetry of a wave. A.K. Nandi and E.E. Azzouz create a system using these features which, at 15 dB SNR, correctly identifies Amplitude Shift Keying (ASK) and Frequency Shift Keying (FSK) at least 97% of the time; at 20 dB the results are 100% accurate. Boutte et al. [7] apply the same approach to modern modulation schemes. Combined with a Support Vector Machine (SVM) classifier, this approach is shown to achieve close to 100% accurate classification of Quadrature Phase Shift Keying (QPSK) above an SNR of 6 dB, and Orthogonal Frequency Division Multiplexing (OFDM) BPSK is classified with 95% accuracy above an SNR of 15 dB. Both of these papers use a limited set of modulation types; a clearer picture of performance across a larger set of modulation schemes can be found in the work by D. Saharia et al. [8]. That paper presents a large set of results, including a confusion matrix of 11 different schemes. While some modulation types are classified with above 90% accuracy at an SNR of 16 dB, the technique clearly struggles with such a wide array of schemes. This is especially evident when differentiating between similar modulation types based on Quadrature Amplitude Modulation (QAM), such as 16QAM and 64QAM: these two waveforms are classified as each other at almost the same rate as they are classified correctly. Other modulation types are classified with an average accuracy of 70%. A further drawback of this approach is the intensive pre-processing required before classification can be performed, which delays the classification result and increases the size and complexity of any hardware implementation. Owing to its poor performance at low SNRs and its low throughput, this approach has been supplanted by more modern methods.

B. DIRECT WAVEFORM CLASSIFICATION
Rajendran et al. [9] use an LSTM to classify RF waveforms automatically, achieving two notable improvements over the statistical feature methods. Firstly, the model exhibits an enormous improvement in classification accuracy: across exactly the same modulation types as in [8], accuracy improves for all but one. The system classifies most schemes with an accuracy of at least 90% at 0 dB SNR. Some misclassification of similar waveforms such as 16QAM and 64QAM remains, but accuracy stays at 85% and above, a marked improvement over the 52% accuracy achieved on high-SNR data by the statistical feature classifiers. The second advantage of the LSTM is that the model operates directly on the incoming RF waveform, avoiding the need for pre-processing. However, these gains are offset by the need for a larger sample of data per classification and by the LSTM structure being larger than most other model structures. This paper also compares the classification accuracy of various model structures across a range of SNRs. The LSTM is shown to be vastly superior to most model structures, achieving an average accuracy of 90% above 0 dB SNR; only the CNN comes close, with an accuracy of 80%. Similar results are obtained by Ke et al. [10], where an LSTM model is shown to have the greatest average accuracy across all SNRs, at 90%. A confusion matrix over the same collection of modulation schemes shows strong differentiation between each type, although the reduced accuracy with similar waveforms remains. LSTM-based models have therefore shown strong robustness to noise, yet have not reached perfect classification accuracy of 100% at any SNR.

C. CONSTELLATION CLASSIFICATION
The final approach to modulation classification is to classify the data based on the appearance of the constellation diagram. There are a few ways of approaching this problem. The first is to create images of the constellation diagram and use an image-recognition CNN to classify constellations based on learned appearances. This method is demonstrated by Doan et al. [11]: at SNRs of 5 dB and above, the model correctly classifies all schemes with 100% accuracy, by far the best performance at such a low SNR. However, techniques such as this have drawbacks. Firstly, CNNs, and especially image-recognition CNNs, have a large implementation size on an FPGA. Secondly, not only is the CNN structure large, but an entire pre-processing system must be implemented to prepare the images, adding further complexity and resource utilisation. Finally, large batches of data are required to create the constellation image, and creating the image itself adds a significant delay to obtaining a classification result. This technique can therefore achieve 100% accuracy, but at the cost of more pre-processing and a large deep learning model.
Yu Wang et al. [12] use a CNN to perform convolution on constellation diagrams to calculate data densities, which are then used to train a second CNN model. This work again achieves 100% classification accuracy above an SNR of 5 dB and 90% accuracy at 0 dB. Its key idea is that, rather than treating the constellation as an image, the data is represented numerically and the densities of the data points are used for classification. However, the work requires multiple CNNs connected in series and parallel: the input CNN determines the broad modulation type, such as M-PSK or M-QAM, and the data then passes to a model trained to differentiate between modulation orders. This work shows that using data density for classification can give strong performance, yet the complexity of the system makes it unsuitable for low-area and low-power embedded systems.
While software models have shown greater accuracy at low SNRs than FPGA models, owing to their use of floating-point precision, FPGAs have the advantage of reduced delay and power consumption [2], [15], [16]. FPGA and application-specific integrated circuit (ASIC) solutions are therefore the preferred choice for low-power, low-area, and low-delay modulation classifiers in embedded systems.

D. HARDWARE COMPARISONS
The majority of hardware implementations found in the literature are based on the CNN. This is to be expected, as the CNN has shown the best accuracy in simulations [11], [12]. As in software, the approaches can broadly be characterised as either time-series or constellation classification. The papers which exhibit the highest classification accuracy are the ModNet system by Kumar et al. [22] and HistoSVM by Cardoso et al. [23]. Both achieve 100% classification accuracy above an SNR of 9 dB, at which point the accuracy of HistoSVM begins to decline, reaching 74% at 0 dB. ModNet maintains 100% classification accuracy down to 4 dB; below this SNR its performance degrades, reaching 86% accuracy at 0 dB. ModNet follows a similar approach to Doan et al. [11], creating images of constellations which a CNN classifies. HistoSVM introduces a wholly unique approach, creating histograms which are used in conjunction with an SVM classifier. The best-performing time-series hardware model is ResNet by J. O'Shea et al. [25]. In this work the authors use a modified CNN known as a residual neural network and achieve an overall accuracy of 96%, which is maintained down to 10 dB SNR; the authors also demonstrate that the classification accuracy for low-order modulations reaches 100%. It is worth noting that the trend of lower-order modulation schemes achieving higher classification accuracy is consistent across many papers [18], [22], [24], [25]. ResNet and ModNet are therefore the best-performing examples of waveform and constellation classification in hardware, respectively. Of these three best-performing systems, only HistoSVM provides data on the characteristics of its FPGA implementation, making a hardware comparison between the models difficult. The resource utilisation of various designs is listed in Table 3 in Section VI, and Fig. 11 in the same section compares accuracy against SNR. HistoSVM uses by far the fewest registers of any work; the majority of other designs are based on the CNN and use tens to hundreds of thousands of registers. Conversely, HistoSVM uses an enormous amount of BRAM, the largest of any design found in the literature, and its latency is also the largest. So while HistoSVM achieves 100% accuracy, it does so at a cost in memory usage and latency. RUNet [19], again by Kumar et al., uses a residual neural network similar to ResNet and achieves very similar accuracy. This model uses the fewest registers, Look-Up Tables (LUTs), Digital Signal Processors (DSPs), and RAM of any deep learning based model, apart from that of Zhao et al. [2], which requires fewer registers and LUTs. Additionally, RUNet has the lowest latency of any deep learning based system at 7.5 μs, narrowly beating S. Tridgell et al. [14], [16] by 0.5 μs.
Overall, RUNet [19] represents the state of the art in implementation size, delay, and accuracy. The lowest-power design found is that of Amad et al. [15], which uses 847 mW. Efficient preprocessing combined with a minimally complex machine learning classifier, similar to HistoSVM's approach, offers the possibility of a system that improves upon all existing work in terms of area, power consumption, and delay. The following sections discuss the methodology used to create this system.

III. PROPOSED METHOD
This work presents a new method of classification which minimises preprocessing and does not require a complex neural network model to achieve 100% accuracy. The idea is to exploit the characteristics of the constellation diagram, which is essentially a set of clusters of points in 2D space and is therefore well suited to a clustering algorithm. Many clustering algorithms, such as k-means, group points into a pre-specified number of clusters [26], whereas the problem in this work is the inverse: the clusters are well defined, and the task is to determine how many there are. If the number of clusters and their relative positions on the diagram can be determined, a minimally complex network can classify them based on this information, as each modulation type has a unique number and arrangement of constellation points. DBSCAN is suited to this problem because it forms an arbitrary number of clusters without the number of clusters being specified in advance. This work proposes a novel method of using DBSCAN to extract information about the clusters directly and to use this information to achieve classification.
Time-series RF waves are decomposed and represented as two waves known as the In-Phase (I) and Quadrature (Q) components, which are the parts of the signal in phase with, and 90° out of phase with, the carrier; together they encode the instantaneous amplitude and phase of the original wave. Each IQ pair can then be plotted in 2D space as a complex number Z. Modulation schemes which use changes in phase and amplitude exhibit distinct clusters of points across the 2D plane as the I and Q values change to represent different data symbols, forming a pattern known as the constellation diagram. A simple example of how this system operates is given by QPSK and 8PSK. A human can distinguish these two modulation schemes by recognising that the diagram with 4 constellation points must represent QPSK and, likewise, the diagram with 8 must represent 8PSK. The same process can be performed by a computer through clustering, which determines the number of constellation points and therefore differentiates between QPSK and 8PSK. Not all modulation types can be differentiated by the number of constellation points; for example, 16PSK and 16QAM both have 16 points, and it is their positioning which separates them. As a proxy for positioning, the absolute value and argument of each constellation point are calculated; an example is shown in Fig. 1. The calculated absolute values and arguments can then be clustered to sort them into groups. Once clustering is finished, the result is the number of distinct arguments and absolute values of the constellation points, from which the modulation scheme can be determined by a machine learning classifier trained on similar data. In addition to allowing stronger differentiation between similar constellations, the one-dimensional nature of the argument and absolute value data allows a unidimensional DBSCAN to be executed on each set of data, which facilitates further efficiency gains outlined in Section IV-B.
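To make this concrete, the short Python sketch below counts the distinct absolute values and arguments of ideal, noise-free constellations. The helper names (`psk`, `square_qam`, `distinct_abs_arg`) are hypothetical and the points are unscaled; on the FPGA these quantities come from noisy samples and are counted by DBSCAN rather than by exact rounding. Even so, it illustrates why the two counts separate schemes that have the same number of points.

```python
import cmath
import math


def psk(m):
    """Ideal M-PSK constellation on the unit circle."""
    return [cmath.exp(2j * math.pi * k / m) for k in range(m)]


def square_qam(m):
    """Ideal square M-QAM constellation with odd-integer levels (e.g. -3, -1, 1, 3)."""
    side = math.isqrt(m)
    levels = [2 * k - (side - 1) for k in range(side)]
    return [complex(i, q) for i in levels for q in levels]


def distinct_abs_arg(points, decimals=6):
    """Number of distinct absolute values and arguments (rounded to absorb float error)."""
    mags = {round(abs(z), decimals) for z in points}
    args = {round(math.degrees(cmath.phase(z)), decimals) for z in points}
    return len(mags), len(args)


for name, pts in [("BPSK", psk(2)), ("QPSK", psk(4)), ("8PSK", psk(8)),
                  ("16PSK", psk(16)), ("16QAM", square_qam(16))]:
    print(name, distinct_abs_arg(pts))
```

This prints (1, 2), (1, 4), (1, 8), (1, 16), and (3, 12) for BPSK, QPSK, 8PSK, 16PSK, and 16QAM respectively: 16PSK and 16QAM have the same number of points but very different (absolute value, argument) counts.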

A. DBSCAN
A diagram of the operation of DBSCAN can be found in Fig. 2. Two parameters are required to achieve accurate clustering with DBSCAN: the minimum number of spatially near points required to constitute a cluster (minPts), and the maximum distance ε between two points for them to be considered part of the same cluster. DBSCAN has a worst-case computational complexity of O(n²) because the distance from every point in the dataset to every other point must be checked. When working with one-dimensional data, as in this case, it is advantageous to sort the data and apply a modified algorithm. An example of unsorted and sorted data can be seen in Figs. 3 and 4, respectively. Once the data is sorted, only the distance to the next point in the array needs to be calculated to determine whether that point belongs to the same cluster. This reduces the complexity of the clustering pass itself from O(n²) to O(n), so the overall cost is dominated by the sort. A graph of the resulting speedup in software can be found in Fig. 5.
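The following Python sketch contrasts a textbook O(n²) DBSCAN on one-dimensional data with the sort-then-scan idea described above; both functions (hypothetical names) return only the number of clusters. The sorted version is a gap-based simplification: it splits the sorted data wherever consecutive points are more than ε apart and keeps runs of at least minPts points. It matches full DBSCAN when clusters are compact and well separated, as constellation clusters are intended to be, but it is not identical in every corner case (for example, a long sparse chain of points containing no core point).

```python
def dbscan_count_naive(points, eps, min_pts):
    """Textbook DBSCAN on 1-D data with an O(n^2) neighbourhood search; returns the cluster count."""
    n = len(points)
    # Every point is compared with every other point: the O(n^2) step.
    neighbours = [[j for j in range(n) if abs(points[i] - points[j]) <= eps]
                  for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbours]
    label = [None] * n
    clusters = 0
    for i in range(n):
        if not is_core[i] or label[i] is not None:
            continue
        # Grow a new cluster outward from an unvisited core point.
        label[i] = clusters
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbours[p]:
                if label[q] is None:
                    label[q] = clusters
                    if is_core[q]:
                        stack.append(q)
        clusters += 1
    return clusters


def dbscan_count_sorted(points, eps, min_pts):
    """Sort, then one linear pass over neighbouring differences (gap-based 1-D clustering)."""
    xs = sorted(points)  # O(n log n)
    clusters = 0
    run = 0
    prev = None
    for x in xs:  # O(n): only the next point in the array is examined
        if prev is not None and x - prev <= eps:
            run += 1
        else:
            if run >= min_pts:
                clusters += 1
            run = 1
        prev = x
    if run >= min_pts:
        clusters += 1
    return clusters


data = [52, 3, 100, 0, 51, 4, 1, 53, 2, 50]
print(dbscan_count_naive(data, 5, 3), dbscan_count_sorted(data, 5, 3))
```

Both functions find the two groups around 0 to 4 and 50 to 53 and treat the isolated point at 100 as noise; only the sorted version avoids the all-pairs distance computation.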

B. CLASSIFIER
The machine learning classifier was trained on the number of absolute value and argument clusters output by DBSCAN. Suitable model structures were tested using MATLAB R2021b. The data showed good separation, so a small Multilayer Perceptron (MLP) with a 4-node hidden layer and a 4-node output layer achieved strong performance; more complex models such as a CNN or RNN would lead to an unnecessary increase in FPGA utilisation and power consumption. Its structure can be found in Fig. 6. Training was performed with data obtained by applying DBSCAN to the arguments and absolute values of RF data, scaled to ±127 to mimic the 8-bit data in the implementation scenario; 5-fold cross-validation and regularisation were employed to reduce overfitting.

C. DATA
The data capture setup can be seen in Figs. 7 and 8, which show a photograph of the laboratory setup and its corresponding block diagram. All data used for testing the system and training the MLP classifier was generated using the Rohde & Schwarz SMW100A [20] and captured with a Keysight N9030B PXA signal analyser [21]. Waves modulated with BPSK, QPSK, 8PSK, and 16QAM were created at SNRs ranging from 30 dB down to 3 dB. The signal analyser was configured to the same carrier frequency as the signal source but was not in carrier phase lock. Additional Gaussian noise was added to the 3 dB signals to generate 0 dB and −5 dB datasets. RF samples were captured at two carrier frequencies, 73 GHz and 28 GHz; in both cases the symbol rate was 50 Msymbols/s. The signal analyser sampled data at 200 Msamples/s, with a 160 MHz intermediate frequency bandwidth and a 100 µs capture duration. The 73 GHz horn antennas were Eravant SAZ-2410-12-S1 with a gain of 24 dBi, and the 28 GHz horn antennas were Quasar QWH21SB-URB-K-F-20 with a gain of 20 dBi; the horn antennas are represented by triangles in Fig. 8. Data was radiated with the horn antennas 6 cm apart. Our data can be downloaded from GitHub at https://github.com/billjgavin/28_and_75GHz_Capture_Files.
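Producing the 0 dB and −5 dB sets from 3 dB captures follows from the fact that independent noise powers add. With signal power P, a capture at SNR s0 (in dB) already contains noise power P·10^(−s0/10), so reaching a lower SNR s1 requires adding P·(10^(−s1/10) − 10^(−s0/10)). The Python sketch below is a minimal illustration with hypothetical helper names, using only the standard library: it computes that extra power and adds circularly symmetric complex Gaussian noise to a list of IQ samples. It assumes P is known or estimated separately; how this was done for the lab captures is not stated. For scale, each capture holds 200 Msamples/s × 100 µs = 20,000 complex samples.

```python
import math
import random


def extra_noise_power(signal_power, snr_in_db, snr_out_db):
    """Noise power to add so a capture at snr_in_db ends up at snr_out_db.

    Assumes the added noise is independent of the noise already present,
    so the two noise powers simply add.
    """
    if snr_out_db > snr_in_db:
        raise ValueError("adding noise can only lower the SNR")
    existing = signal_power * 10 ** (-snr_in_db / 10)
    target = signal_power * 10 ** (-snr_out_db / 10)
    return target - existing


def add_complex_awgn(iq, noise_power, rng=None):
    """Add circularly symmetric complex Gaussian noise with the given total power."""
    rng = rng or random.Random()
    sigma = math.sqrt(noise_power / 2)  # power split equally between I and Q
    return [z + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for z in iq]


# Example: take a unit-power 3 dB capture down to 0 dB.
p_add = extra_noise_power(1.0, 3.0, 0.0)  # 1 - 10**-0.3, roughly 0.499
```

Halving the SNR from 3 dB to 0 dB therefore needs roughly as much added noise power as was already present.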

IV. HARDWARE IMPLEMENTATION
Following confirmation of the system's performance in software simulation, the algorithm was implemented in hardware. The primary focus of the hardware implementation was to create a system capable of classifying real-time streams of RF data while maintaining the performance achieved in software simulation. The implementation is fully pipelined and designed so that each module can operate continuously. A system diagram of the full algorithm can be seen in Fig. 9.
The algorithm is split into four constituent blocks. The absolute value and argument LUTs calculate the absolute values and arguments of the complex IQ pairs which represent the RF message. These values are split into two datapaths which operate simultaneously and perform identical operations. The first stage of each datapath is a custom sorting module which sorts data in real time as it enters the system. The sorted data then flows byte by byte into a custom DBSCAN module; both modules are explained further in Section IV-B. The final block recombines both datapaths in an MLP classifier, which outputs the predicted modulation scheme. The design was written in Verilog, and placement and routing were handled by the Vivado 2021.2 tools. The implementation strategy was set with performance_explore to find the implementation with the strongest performance; all other settings remained at their defaults.

A. ABSOLUTE AND ARGUMENT BLOCKS
The absolute value and argument of a complex IQ pair (i, q) can be found with (1a) and (1b):

$$|Z| = \sqrt{i^{2} + q^{2}} \qquad (1a)$$

$$\arg Z = \tan^{-1}\left(\frac{q}{i}\right) \qquad (1b)$$

Each of these equations requires operations which are computationally slow to perform in hardware: finding the argument requires a division and an arctangent, and the absolute value requires multiplications and a square root. Because the goal of this design is to handle a real-time data stream, performing these operations directly would require too many clock cycles. Instead, the outputs for every combination of 8-bit I and Q inputs are precomputed. This requires two large LUTs of 65536 entries each, which use a significant proportion of the available LUT slices on the FPGA. Despite this, computing the values in this way reduces the complex operations to a single clock cycle, enabling the rest of the design to function in real time. Additionally, normalisation was folded into the LUT outputs, eliminating a separate processing step and saving both time and resources. Data then passes from the LUTs into the sorting block.
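The precomputation can be sketched in Python as below. It builds two 65536-entry tables addressed by the concatenated 8-bit two's complement I and Q values. The normalisation is an assumption, since the exact mapping is not given: absolute values are scaled so that the largest possible magnitude, at (i, q) = (−128, −128), maps to 127, and arguments are scaled so that ±π maps to ±127. The sketch also uses the four-quadrant `math.atan2` rather than a plain tan⁻¹(q/i), a choice made here to keep the full ±π range; whether the hardware tables do the same is not stated. The names `build_luts` and `lut_address` are hypothetical.

```python
import math

SCALE = 127                      # outputs normalised to at most 127 in magnitude
MAX_ABS = math.hypot(128, 128)   # largest |Z| for signed 8-bit I and Q


def signed8(byte):
    """Interpret an unsigned byte as an 8-bit two's complement value."""
    return byte - 256 if byte >= 128 else byte


def lut_address(i, q):
    """16-bit LUT address formed by concatenating the 8-bit I and Q codes."""
    return ((i & 0xFF) << 8) | (q & 0xFF)


def build_luts():
    """Precompute scaled |Z| (0..127) and arg Z (-127..127) for every 8-bit IQ pair."""
    abs_lut = [0] * 65536
    arg_lut = [0] * 65536
    for addr in range(65536):
        i = signed8(addr >> 8)
        q = signed8(addr & 0xFF)
        abs_lut[addr] = round(math.hypot(i, q) * SCALE / MAX_ABS)
        arg_lut[addr] = round(math.atan2(q, i) * SCALE / math.pi)
    return abs_lut, arg_lut
```

At run time a lookup such as `abs_lut[lut_address(i, q)]` replaces the square root, division, and arctangent with a single memory read, which is what allows one result per clock cycle.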

B. SORTING AND DBSCAN
In this work, a custom DBSCAN algorithm is employed which exploits the one-dimensional nature of the absolute value and argument data. This is achieved by sorting the data before the DBSCAN algorithm is applied. After sorting, the distance from each point to the next largest point can be found simply by taking the difference between points N and N+1 in the data array and comparing it with ε, rather than taking the difference between point N and all other unclustered points. The overall algorithmic complexity is therefore reduced from the O(n²) of traditional DBSCAN to the complexity of the sorting algorithm.
Further gains in calculation speed are made by sorting data as it enters the system. As shown in Fig. 10, an array of comparators lies between the input and an array of shift registers. An incoming datum X is compared in parallel with all values currently held in the shift register array; all stored values smaller than the new datum are shifted down by one register, and the new datum is placed into the vacated register, between the values immediately larger and smaller than it. This method of sorting adds effectively zero sorting time: by the time the final point of the sample for the DBSCAN operation enters the system, the data is already sorted and can move on to the DBSCAN block, and the sorting system can then begin sorting the next set of incoming data.
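A cycle-level Python model of this insertion sorter is sketched below, assuming (consistent with the description) that the register array holds values in descending order with empty registers at the bottom. Each call to `clock` models one cycle: every register's comparator fires in parallel when its register is empty or holds a value smaller than X, and each register then holds its value, loads X, or loads the value from the register above. The function names are hypothetical, and `None` stands in for an empty register.

```python
def clock(regs, x):
    """One clock cycle of the parallel insertion sorter.

    regs holds values in descending order, with None marking empty
    registers at the bottom of the array.
    """
    # All comparators evaluate in parallel: True where the register is
    # empty or holds a value smaller than the incoming datum.
    fire = [r is None or r < x for r in regs]
    nxt = []
    for j, r in enumerate(regs):
        if j > 0 and fire[j - 1]:
            nxt.append(regs[j - 1])  # shift down from the register above
        elif fire[j]:
            nxt.append(x)            # first register whose comparator fires captures X
        else:
            nxt.append(r)            # larger values hold their position
    return nxt


def stream_sort(samples, depth):
    """Clock a window of samples into an empty register array of the given depth."""
    if len(samples) > depth:
        raise ValueError("window longer than the register array")
    regs = [None] * depth
    for x in samples:  # one sample per clock cycle
        regs = clock(regs, x)
    return regs


print(stream_sort([5, 3, 9, 1, 7], 5))
```

Because each register only needs its own comparator output and that of the register above, the whole array updates in a single cycle, and the window is fully sorted on the cycle its last sample arrives.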
A major consideration for the DBSCAN algorithm is the choice of the ε and minPts hyperparameters. Optimal ε values vary between datasets and can have a large impact on classification performance. For instance, an ε which is too high can allow outlying noise points to 'bridge' the gap between two constellation clusters, causing the algorithm to merge the two clusters into one. Conversely, an ε which is too low can cause a single cluster to be counted as several, or not at all. This becomes an issue because different SNRs introduce different spacing between points, as well as between the constellation points themselves, meaning that an optimal ε for 20 dB data will not be optimal for 5 dB data. To counter this, the outputs of the absolute value and argument LUTs were scaled to between ±127 for all input values; this normalisation allowed an ε of 5 to work optimally for all SNRs.
Similarly, the optimal minPts value can depend on the number of samples used per classification, the number of constellation points expected in a modulation scheme, the ratio between these two values, and the SNR of the signal. Choosing too high a minPts value can lead to clusters not being found, while too low a value can lead to randomly occurring noise clusters being treated as constellation points. In testing, this hyperparameter was found to be less important than ε. Because a small sample size of 50 data points was used to reduce latency and implementation size, noisy points were very unlikely to be classified as an extra constellation point, and minPts could be kept to small values such as 2 or 3.
DBSCAN is implemented as in Fig. 11, and an algorithmic representation can be seen in Algorithm 1. Data is input serially from the sorting block, and each incoming point is subtracted from the previous point. The difference is compared with ε: if it is smaller than ε, the point counter increments; if not, the point counter resets. When the point counter resets, its value is first compared with minPts, and if the number of points in the cluster is greater than minPts, the cluster count increments; otherwise it remains the same. The system output is the cluster count after 50 operations. This combination of real-time sorting and modified DBSCAN achieves an algorithmic complexity of O(n) and allows complete pipelining of the preprocessing system. As Algorithm 1 shows, the DBSCAN algorithm has been reduced to 50 loop iterations, or 50 clock cycles. As soon as sorting completes, the data is output serially and the vacated registers are filled with a new set of data. The time from the first datum entering the system to a DBSCAN result being available is 2N clock cycles, where N is the number of data points chosen for the DBSCAN calculation. This also gives a significant reduction in implementation size and power consumption, as the algorithm is reduced to a subtraction, two comparisons, and two counters.
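The counter-based datapath can be modelled in a few lines of Python. This sketch follows the description literally, with a strict "smaller than ε" test on each difference and a strict "greater than minPts" test on each run. Two details are assumptions not spelled out in the text: the point counter is taken to count the points in the current run, starting at 1, and the final run is checked once more after the last sample so that a cluster at the end of the sorted window is not missed. The function name `counter_dbscan` is hypothetical.

```python
def counter_dbscan(sorted_points, eps, min_pts):
    """Model of the subtract/compare/count DBSCAN datapath, one point per clock cycle."""
    cluster_count = 0
    point_count = 0
    prev = None
    for x in sorted_points:
        if prev is not None and abs(prev - x) < eps:
            point_count += 1  # still inside the current cluster
        else:
            # Gap found (or first point): close the previous run.
            if point_count > min_pts:
                cluster_count += 1
            point_count = 1
        prev = x
    if point_count > min_pts:  # assumed end-of-window check for the last run
        cluster_count += 1
    return cluster_count


window = [100, 99, 98, 50, 49, 48, 47, 0]  # already sorted, descending
print(counter_dbscan(window, eps=5, min_pts=2))
```

With minPts = 2 both runs (3 and 4 points) count as clusters and the lone point at 0 does not. Filling the sorter takes N cycles and this pass takes another N, in line with the 2N-cycle latency stated above.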

C. MLP
The outputs of the DBSCAN algorithm enter the final stage of the system, the MLP classifier, which takes the numbers of distinct argument and absolute value clusters and predicts the modulation scheme. The MLP has 2 nodes in the input layer, 4 in the single hidden layer, and 4 in the output layer, one for each of the 4 modulation schemes used in training; each output node is followed by a logistic output function calculated with a LUT. The output node with the largest value is taken as the classification result. The MLP was trained off the FPGA in software using MATLAB, and the weights and biases were exported from MATLAB and stored on the FPGA in ROM. The MLP uses a 64-bit two's complement fixed-point datapath with the binary point after the 32nd bit. A datapath of at least this size was found to be necessary to maintain the expected performance: it eliminated overflow issues and, more importantly, kept the precision of the weights and intermediate values as close as possible to that of the software simulation. Weights and biases were stored to 16-bit precision. The output takes the 8 most significant bits of the 64-bit datapath results. The model was trained in software using 30000 data points per modulation scheme per SNR value, totalling 720000 data points split into samples of 50, and therefore 14400 samples overall.
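A bit-accurate software model helps check that a fixed-point datapath reproduces the floating-point classifier. The Python sketch below models a 2-4-4 MLP on a Q32.32 datapath (64-bit two's complement, binary point after bit 32). Several details are assumptions for illustration only: the 16-bit weights are taken to be Q4.12, the hidden-layer activation (not stated in the text) is taken to be ReLU, the logistic output LUT is a hypothetical 256-entry table over pre-activations in [−8, 8), and the weights are hand-picked rather than the trained MATLAB values. The inputs are unscaled cluster counts, (1, 2), (1, 4), (1, 8), and (3, 12), which are the ideal (absolute value, argument) counts for BPSK, QPSK, 8PSK, and 16QAM; the ±127 input scaling used in training is not modelled.

```python
import math

FRAC = 32    # Q32.32: binary point after bit 32 of the 64-bit datapath
W_FRAC = 12  # hypothetical: 16-bit weights and biases stored as Q4.12
CLASSES = ["BPSK", "QPSK", "8PSK", "16QAM"]


def wrap64(x):
    """Emulate 64-bit two's complement overflow."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >= (1 << 63) else x


def q412(w):
    """Quantise a weight to a saturated signed 16-bit Q4.12 integer."""
    return max(-(1 << 15), min((1 << 15) - 1, round(w * (1 << W_FRAC))))


def dense(x_fix, weights, biases):
    """Fully connected layer: Q32.32 inputs, Q4.12 weights, Q32.32 outputs."""
    out = []
    for row, b in zip(weights, biases):
        acc = q412(b) << (FRAC - W_FRAC)  # align the bias to Q32.32
        for x, w in zip(x_fix, row):
            acc = wrap64(acc + ((x * q412(w)) >> W_FRAC))
        out.append(acc)
    return out


# Hypothetical 256-entry logistic LUT covering pre-activations in [-8, 8).
SIGMOID_LUT = [round(255 / (1 + math.exp(-(k - 128) / 16))) for k in range(256)]


def sigmoid_lut(acc):
    index = (acc >> (FRAC - 4)) + 128  # steps of 1/16, offset so 0 maps to 128
    return SIGMOID_LUT[max(0, min(255, index))]


# Hand-picked illustrative weights (not trained values).
W1 = [[0, 1], [0, 1], [1, 0], [0, 0]]
B1 = [-3, -6, -2, 1]
W2 = [[-3, 0, 0, 2], [1, -3, 0, 1], [1, 0, -6, -3], [0, 0, 4, -2]]
B2 = [0, 0, 0, 0]


def classify(n_abs, n_arg):
    x = [n_abs << FRAC, n_arg << FRAC]
    hidden = [max(0, h) for h in dense(x, W1, B1)]  # ReLU assumed for the hidden layer
    scores = [sigmoid_lut(o) for o in dense(hidden, W2, B2)]
    return CLASSES[scores.index(max(scores))]
```

Since the logistic function is monotonic, it does not change which output is largest; its LUT matters only when the output values themselves are used, as in the confidence of a prediction.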

V. RESULTS
This section presents the accuracy and FPGA implementation characteristics of the system. Section V-A provides the accuracy of the system across a range of SNRs, and Section V-B gives an overview of the hardware.

A. ACCURACY
The FPGA implementation of the proposed RF classifier was developed and evaluated using a Xilinx ZedBoard. Fig. 12 presents the classification accuracy of the implemented RF classifier as a function of SNR. The classifier achieves 100% accuracy for all SNRs above 8 dB. Below 8 dB, the classification accuracy for the 8PSK and 16QAM modulation schemes degrades severely. This is primarily because noise increasingly causes the constellation points to overlap; the majority of the misclassified 8PSK and 16QAM signals were predicted to be QPSK. QPSK classification accuracy begins to decrease at 5 dB, and BPSK performance likewise degrades below 0 dB. At −5 dB the accuracy for QPSK, 8PSK, and 16QAM is no better than a random guess, while BPSK classification accuracy drops to 75%.

Fig. 13 displays the classification accuracy against SNR for QAM orders from 4 to 256. These results were obtained using MATLAB-generated waveforms and are included to illustrate how the performance of the system degrades as modulation complexity increases. Classification accuracy decreases as modulation order increases. The 4QAM, 8QAM, and 16QAM curves are similar to, but slightly less accurate than, the results for QPSK, 8PSK, and 16QAM in Fig. 12. This is attributed to the ε and minPts hyperparameters being changed to 3 and 2, respectively, for this test. This change was required to tune the system for the higher-order modulated data, but came at the cost of slightly worse performance on the low-order modulated data. The 32QAM curve shows that the system can recognise and classify this modulation scheme, with accuracy starting at 96% at 30 dB SNR. As the SNR decreases, the 32QAM curve follows a trend similar to that of the lower-order modulations, but reaches 14% accuracy at 5 dB rather than at −5 dB as for 4QAM, 8QAM, and 16QAM; for these tests, 14% is taken as no better than a random guess between 7 classes (1/7 ≈ 14%). 64QAM, 128QAM, and 256QAM begin with strong classification accuracy at 30 dB SNR, but performance degrades quickly as SNR decreases. Beyond this, no particular trend can be observed in the curves of the three highest-order modulations: the lines overlap and the strongest performer varies across SNRs. The weak performance shown by these curves is explained by the clustering system's inability to handle the densely spaced constellation diagrams of these modulation schemes: even at 30 dB there is overlap between constellation points, and at 20 dB and below the overlap is so severe that accurate clustering becomes difficult.

Figs. 15 and 16 show the classification accuracy of each modulation scheme used in this work at 8 dB and −5 dB SNR, respectively. 8 dB is the lowest SNR at which the classifier achieves 100% accuracy, and Fig. 15 shows that every sample is correctly classified. Fig. 16 shows the classification accuracy at −5 dB: as the number of blue and red matrix elements indicates, the system correctly classifies only 25% of the QPSK, 8PSK, and 16QAM samples, which is equal to a random guess between the 4 classes. The system therefore effectively ceases to function at this SNR, except for BPSK, which still maintains 82% accuracy.
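To illustrate what the ε and minPts hyperparameters control, the following is a minimal software sketch of the DBSCAN cluster count on one-dimensional values, such as the argument or absolute value of each sample in a window. It is not the hardware architecture or bespoke sorting algorithm of this work. It assumes minPts counts the point itself (the common DBSCAN convention), uses an ordinary sort with binary search, and treats ε in whatever units the input values are scaled to, which is also an assumption here. The function name `count_clusters_1d` is hypothetical.

```python
from bisect import bisect_left, bisect_right

def count_clusters_1d(values, eps, min_pts):
    """Count DBSCAN clusters in 1-D data using sorted order.

    A point is a core point if at least min_pts points (itself included)
    lie within eps of it. In 1-D, two core points are density-connected
    exactly when every gap between consecutive core points separating them
    is <= eps, so clusters are maximal runs of such core points. Border
    and noise points never create or merge clusters, so they do not
    change the count.
    """
    xs = sorted(values)
    # In sorted data, a point's eps-neighbourhood is a contiguous slice,
    # so its size is a difference of two binary-search positions.
    core = [x for x in xs
            if bisect_right(xs, x + eps) - bisect_left(xs, x - eps) >= min_pts]
    clusters = 0
    prev = None
    for x in core:
        if prev is None or x - prev > eps:
            clusters += 1   # gap wider than eps: a new cluster starts here
        prev = x
    return clusters

# {0, 1, 2} and {10, 11} form two clusters; 30 is noise.
print(count_clusters_1d([0, 1, 2, 10, 11, 30], eps=3, min_pts=2))
```

Raising minPts or lowering ε makes it harder for points to become core points, which suppresses noise-induced clusters but can also fragment or lose genuine constellation clusters; this is the trade-off behind retuning the two hyperparameters for the higher-order QAM test.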
This work has also been shown to be resistant to carrier frequency offset (CFO): the lab-recorded datasets featured significant CFO, and no reduction in performance was detected in comparison to the MATLAB-generated data. This is primarily because the system operates on small batches of data; as long as the CFO is not large enough to cause distortion within a 50-sample window, its effect is negligible.
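The window argument can be made concrete with a short calculation. A CFO of Δf rotates the n-th sample by 2πΔf·n/f_s. A constant rotation only shifts the phase clusters and leaves their number unchanged (ignoring wrap-around at ±π), so what matters is the extra phase accumulated across one window of N = 50 samples, 2πΔf(N − 1)/f_s. The sketch below computes this drift and the largest CFO that keeps it under a chosen tolerance. The 1 MS/s sample rate and the π/8 tolerance (half the angular spacing of adjacent 8PSK points) are hypothetical values for illustration, not parameters of this work, and the function names are likewise hypothetical.

```python
import math

def cfo_phase_drift(cfo_hz, sample_rate_hz, window=50):
    """Phase (rad) a CFO adds between the first and last sample of a window.

    Sample n is rotated by 2*pi*cfo_hz*n/sample_rate_hz, and a window of
    `window` samples spans window - 1 sample intervals.
    """
    return 2 * math.pi * cfo_hz * (window - 1) / sample_rate_hz

def max_tolerable_cfo(sample_rate_hz, max_drift_rad, window=50):
    """Largest CFO whose within-window phase drift stays at max_drift_rad."""
    return max_drift_rad * sample_rate_hz / (2 * math.pi * (window - 1))

# Hypothetical numbers: 1 MS/s sampling, tolerance of pi/8 rad.
fs = 1e6
limit = max_tolerable_cfo(fs, math.pi / 8)
print(f"CFO limit for a 50-sample window: {limit:.0f} Hz")
```

Because the tolerable CFO scales with f_s/(N − 1), the short 50-sample window is what keeps the drift small; a longer window would tighten the limit proportionally.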

B. HARDWARE PERFORMANCE
This section presents the results of the FPGA implementation of the machine learning classifier on a ZedBoard with a Zynq-7000 SoC (XC7Z020-CLG484-1). The system requires no BRAM. The detailed resource utilization is summarized in Table 1.
The power consumption of the various components of the implemented classifier is summarized in Table 2. The total power consumption of the implemented classifier is 1,704 mW, dominated by the processor, which consumes 1,526 mW. The remaining components (clocks, signals, logic, and DSP) account for 39 mW of dynamic power, and the static power consumption is 139 mW.
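As a quick arithmetic check of the figures quoted from Table 2, the short sketch below confirms that the component values sum to the stated 1,704 mW total and computes each component's share. Everything other than the processor amounts to 39 + 139 = 178 mW, so the processor accounts for roughly 90% of the total. The grouping into three components simply follows the text above.

```python
# Power figures from Table 2 as quoted in the text (mW).
PROCESSOR_MW = 1526
OTHER_DYNAMIC_MW = 39   # clocks, signals, logic, DSP
STATIC_MW = 139

def power_breakdown(processor, other_dynamic, static):
    """Return the total power and each component's fraction of it."""
    total = processor + other_dynamic + static
    shares = {
        "processor": processor / total,
        "other dynamic": other_dynamic / total,
        "static": static / total,
    }
    return total, shares

total, shares = power_breakdown(PROCESSOR_MW, OTHER_DYNAMIC_MW, STATIC_MW)
print(f"total = {total} mW")
for name, share in shares.items():
    print(f"{name}: {100 * share:.1f}%")
```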

VI. RESULTS COMPARISON
In this section, the results from testing are compared with state-of-the-art designs from the literature. Section VI-A begins by contextualising the hardware utilization, and Section VI-B compares the accuracy of the system.

A. HARDWARE COMPARISON
Table 3 compares state-of-the-art RF classifier implementations. In terms of total FPGA resources, no system is of comparable size and efficiency to DBCLASS: column 5 shows that this work achieves a 3.65 times reduction in total resources compared to the next smallest. Furthermore, this system uses the second-fewest registers (although the design with the fewest registers has a non-traditional structure that mainly uses RAM [23]). Against traditional CNN designs, this work requires 6.9 times fewer registers than the next smallest. Similarly, it requires 2.6 times fewer LUTs than RUNet [19]. The lack … maintain perfect accuracy for the more densely spaced 8PSK and 16QAM modulation schemes. Although the DBSCAN system follows the trends seen in both ResNet and RUNet, it consistently achieves greater classification accuracy at all SNRs. At 0 dB the autoencoder accuracy is greater than that of the DBSCAN model in this work, but the performance remains comparable. Although this work is shown to have greater accuracy than [19] and [25], those two papers also include tests on high-order modulation schemes.
A more apt comparison may be with the tests of this work on generated data, which include higher-order modulated signals. However, this comparison is still not one-to-one, as 64QAM, 128QAM, and 256QAM constitute 42% of the classes averaged in this work but only 21% of those averaged in [19] and [25]. We chose not to include the amplitude- and frequency-modulated schemes featured in [19] and [25], as DBCLASS was designed to work on QAM and PSK only, since these are the modulation types predominantly used by modern communication systems [1]. Across all SNRs, the accuracy of this work on the generated data is lower than that of other work in the literature, primarily because of the degradation in performance at very high modulation orders. The similar curves for the modulation orders common to both tests in Figs. 12 and 13 show that performance on low-order modulated data remains comparable whether the data is recorded or simulated. Therefore, DBCLASS remains competitive in accuracy on low-order modulated data, but its performance decreases at higher orders.

C. COMPARISON SUMMARY
The hardware comparisons in Section VI-A show that, among the implementations compared, this work is the smallest, quickest, and most efficient system for automatic modulation classification. The accuracy comparisons of Section VI-B demonstrate that for M-PSK and M-QAM modulation schemes with M less than or equal to 16, the DBSCAN system of this work is competitive in classification accuracy, whereas for M above 16 its performance deteriorates. It can therefore be concluded that this DBSCAN-based modulation classifier is the optimal choice for a low-power, low-area, low-latency design working on low-order modulated data.

VII. CONCLUSION
The paper presents a novel FPGA-based implementation of a machine learning classifier for RF modulation classification. An introduction to, and comparison of, the state of the art is presented, clustering is proposed as an improved method of achieving classification, and DBSCAN is identified as the ideal algorithm. Additional optimisations to the DBSCAN algorithm lead to large improvements in the delay, size, and power consumption of the system. The latency was found to be 7.5 times lower than that of the next fastest work [19]. Similarly, the design consumed 4.75 times less power than the most efficient system in the literature [15]. This work also required the second-fewest registers (after [23]), the second-fewest LUTs (after [2], and 2.6 times fewer than RUNet [19]), the second-fewest DSP slices (after [19]), and no RAM. DBCLASS was found to have the smallest implementation overall, 3.65 times smaller in aggregate than the next smallest work in the literature. Thus, to the best of the authors' knowledge, this design is the smallest, fastest, and most efficient, and it is 100% accurate above 8 dB for modulation schemes of order up to 16. DBCLASS is therefore the optimal choice for engineers working with low-power devices on real-time data streams at SNRs above 8 dB.

FIGURE 2. Diagram of the operation of traditional DBSCAN.

FIGURE 5. Speed-up comparison of sorted 1D DBSCAN and traditional DBSCAN in MATLAB.

FIGURE 6. Diagram of the MLP structure.

FIGURE 7. Photograph of lab setup for data capture.

FIGURE 8. Block diagram of lab setup for data capture.

FIGURE 12. Graph of classification accuracy against SNR of recorded signal in dB.

FIGURE 13. Graph of classification accuracy against SNR of software-generated signal in dB.

FIGURE 14. Graph of comparison of classification accuracy against SNR in dB of this work and the state-of-the-art using recorded data.