Minority Resampling Boosted Unsupervised Learning With Hyperdimensional Computing for Threat Detection at the Edge of Internet of Things

The Internet of Things (IoT) has rapidly transformed digital environments across a multitude of domains through increased connectivity and pervasive virtualization. The distributed computing paradigm of Edge Computing has been postulated to address concerns of response time, bandwidth, energy consumption, and cybersecurity. Of these concerns, cybersecurity has received comparatively little attention, mainly due to the inherent complexity of threat detection at the Edge. However, the widespread adoption of IoT applications in economic, social, and political contexts is a strong indication of the significant impact of cyber-attacks. This paper addresses this challenge by presenting an effective and efficient machine learning approach for threat detection at the Edge of IoT. The novel contributions of this approach are: a new Enhanced Geometric Synthetic Minority Oversampling Technique (EG-SMOTE) algorithm to resolve the imbalanced distribution of data streams at the IoT Edge, and an extension to the Growing Self Organizing Map (GSOM) algorithm based on Hyperdimensional Computing for energy-efficient machine learning from unlabeled data streams. The proposed EG-SMOTE + GSOM approach has been tested using four open-access datasets: three benchmark datasets, KDD99 (F-Score = 0.9360), NSL-KDD (F-Score = 0.9647), and CICIDS2017 (F-Score = 0.9999), and one industry-focused botnet IoT traffic dataset, BoT-IoT (F-Score = 0.9445). EG-SMOTE has outperformed the SMOTE and G-SMOTE approaches across an extensive set of experiments with different classifiers. The results of these experiments confirm the novelty, efficiency and effectiveness of this approach for cybersecurity at the IoT Edge.


I. INTRODUCTION
Edge Computing is primed to address the challenges of Internet of Things applications that are being developed and deployed in complex real-world settings [1]. The Internet of Things Edge (IoT Edge) has enabled computation and storage in end-user proximity, decreased transmission latency, and reduced network bandwidth requirements, leading to efficiencies in response time, resource utilization and end-user outcomes [2], [3]. This has been particularly significant for real-time IoT Edge applications in energy management, smart factories, and digital healthcare [4], [5]. Despite these advances, there has been limited research conducted on effective, efficient and secure machine learning at the Edge of IoT [6]. Furthermore, a recent taxonomic analysis highlighted the importance of a trust ecosystem for cybersecurity in order to improve the uptake and proliferation of IoT Edge applications [7].
The IoT is a key enabling technology in Industry 4.0. Cybersecurity threats and attacks on IoT Edge applications have been categorized into three layers: perception, transportation and application [7], [8], and further studied in terms of classes of attacks: key-related, denial of service, replay and privacy attacks [9]. A cybersecurity attack on an IoT Edge application is a breach of the integrity of its structure, function, and operations, impacting both cyber and physical elements [10]. The types of cybersecurity attacks are diverse in terms of exploits, targets, methodologies, and technical mechanisms. These attacks aim to prevent the legitimate use of a service, compromise a user's security and privacy, interrupt system security and data integrity, gain unauthorized permissions, and engineer malicious activities using DDoS attacks, side-channel attacks, malware injection attacks, authentication and authorization attacks, man-in-the-middle attacks, and bad-data injection attacks [11]. Encryption, key management and multi-factor authentication can be used to mitigate these attacks [12], [13]. Each attack is usually intangible and can remain undetected for months, deteriorating the critical components of the IoT Edge. IoT-related vulnerabilities, if successfully exploited, can affect not only the device itself, but also the application field in which the IoT device operates [14]. Low computational capacities, protocol heterogeneities and coarse-grained access control [11], along with hardware and social engineering vulnerabilities [15] within an IoT Edge setting, introduce further challenges for the detection of cyber threats and attacks. Intrusion Detection Systems (IDS) are generally used to detect cyber-attacks in most application settings. IDS fall into two broad categories: signature-based and behaviour-based. Signature-based intrusion detection relies on pattern matching techniques to efficiently determine a known attack.
Signature-based models require frequent updates with a new signature [15]. Behaviour-based intrusion detection, also known as anomaly detection, compares operational behavior profiles to detect attacks, based on deviations from profiles of normality. In IoT Edge, anomaly detection approaches are more effective than signature-based methods as most cyber physical attacks employ obfuscation techniques such as inserting no-ops, code re-ordering, register renaming, expanding and shrinking code, and the insertion of garbage code to bypass signature checks at databases [16], [17].
Current literature reports three types of approaches for anomaly detection: knowledge-based, statistical and machine learning approaches. Knowledge-based and statistical approaches are limited by the difficulty of capturing, profiling and updating IoT Edge configurations at the operational level in a dynamic computing environment, and by the exposure of system vulnerabilities during behaviour profiling. Machine learning addresses these limitations by managing the adaptive disposition and dynamic behavior of IoT Edge operations with high detection rates, low false positives and pragmatic computation and communication costs [18], [19]. More specifically, unsupervised machine learning methods are technically suited for the detection of behaviour-based cyber threats and attacks on the IoT Edge as they can learn from unlabeled data [20]. In settings where machine learning is based on imbalanced datasets, general learning algorithms struggle to classify anomalies because they bias towards the majority class samples. This limitation is more pronounced at the IoT Edge, where failing to account for minority data samples is more consequential than the removal of such data due to underrepresentation.
Drawing on this context, we propose an effective, efficient and secure method for machine learning at the IoT Edge, specifically for cybersecurity threat detection. This method is effective as it addresses the challenge of high volume, high velocity unlabeled data streams generated at the IoT Edge. It is efficient as it conducts unsupervised machine learning using the Growing Self Organizing Map (GSOM) algorithm based on hyperdimensional computing, resulting in an energy-efficient computation and storage footprint. It is secure as it is boosted by minority resampling of imbalanced data generated by cybersecurity threats and attacks at the IoT Edge.
The following research contributions are reported in this paper.
• The development of a novel EG-SMOTE algorithm for resampling that addresses the limitations of synthesizing noisy minority samples, overfitting due to extreme synthesis of minority samples, and improper synthesis along the borderlines, specifically from imbalanced data streams in cybersecurity settings.
• A machine learning method that advances EG-SMOTE for unsupervised machine learning from unlabeled data in the IoT Edge. The unsupervised machine learning capability is based on the Growing Self Organizing Map (GSOM) algorithm that also utilizes Hyperdimensional Computing for energy-efficient machine learning.
• Empirical evaluation of the proposed approach using three benchmark datasets, KDD99, NSL-KDD, and CICIDS2017, and an industry-focused botnet IoT traffic dataset, BoT-IoT, which confirms the security, efficiency and effectiveness of the proposed machine learning approach at the IoT Edge.
The rest of this paper is organized as follows: Section II presents related work on threat detection in the IoT Edge, sampling imbalanced data, machine learning for threat detection, and hyperdimensional computing for low-energy computation implementations. Section III delineates the proposed approach, focusing on the development of the new EG-SMOTE algorithm and its incorporation into the GSOM algorithm for unsupervised classification. Section IV reports on the empirical evaluation that confirms the validity and effectiveness of the proposed approach. The paper concludes with Section V.

II. RELATED WORK
Threat detection in the IoT Edge can be generalized as deviations from standard behaviour of processes and functions within an IoT application. The original formulation of such deviations can be traced back to anomalies as defined by Hawkins, 'an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism' [21]. Anomaly detection can be used to detect cyber threats on the IoT Edge such as side-channel attacks, denial of service attacks, malware injection attacks and authentication attacks. The main goal in anomaly detection is to define a precise boundary between normal and anomaly data [22]. Numerous machine learning and traditional statistical approaches have been proposed for anomaly detection, however, only a few of these have been adopted for cyber threat detection in IoT applications [23]. An architecture for edge-based security, building firewalls, intrusion detection systems, authentication protocols and privacy preserving methods have been proposed recently [24], [25].
When the number of training samples is insufficient, the few-shot learning (FSL) approach can be used, which reduces task-specific training by exploiting prior knowledge. This learning paradigm aims to address the shortage of training data by allowing models to identify novel categories from only a few sample data points. The main drawback is that FSL requires a balanced dataset to detect anomalies for intrusion detection [26]. IoT anomaly datasets are usually imbalanced: one class is represented by a large number of cases while the other is represented by only a few. Granular Computing (GrC) is an important technique for identifying the optimal granularity under an imbalanced dataset, and has risen to prominence as a multi-disciplinary paradigm in artificial intelligence. Xu et al. [27] state that GrC can serve as an efficient data-preprocessing step in many machine learning approaches. Many applications of GrC in the field of machine learning have recently been described by Ye et al. [28], and several outlier detection techniques based on GrC have also been proposed. GrC can be combined with deep learning techniques to identify minority patterns from imbalanced data for service planning.
Blockchain-based systems can help prevent the counterfeiting of data by ensuring that IoT systems have not been tampered with. These systems address serious security concerns in manufacturing and product lifecycle management in Industry 4.0, including blockchain-empowered sustainable manufacturing and product lifecycle management, blockchain-secured smart manufacturing, and the combination of permissioned blockchain with a holistic optimization model as bi-level intelligence for smart manufacturing [29]-[31]. Features of blockchain technology can also be leveraged to provide an anomaly detection service; NOKIA Bell Labs proposed the first such solution, Blockchain Anomaly Detection (BAD), for detecting anomalies in blockchain-based systems. Blockchain-enabled sustainable manufacturing in Industry 4.0 has been studied from technical, commercial, organizational, and operational standpoints [32].
Most of the anomaly detection methods need human interactions [6]. Drawing on this limitation, the following subsections explore recent work related to the machine learning approach proposed in this paper.

A. RESAMPLING IMBALANCED DATA
Oversampling is the process of replicating the minority class, and undersampling is the deletion of repeating samples of the majority class [33]. Extreme oversampling leads to overfitting despite the preservation of useful information and features, while undersampling leads to underfitting and poor generalization. SMOTE is an oversampling technique which synthesizes new minority data along the line segments joining randomly chosen minority samples [34]. By generating examples similar to existing minority points, SMOTE creates larger and less specific decision boundaries, which improve the generalization ability of classifiers and thus increase performance. Han et al. [35] suggested that synthetic samples should be created from samples close to the class boundary; their Borderline-SMOTE algorithm is built on this sample selection strategy. Borderline-SMOTE categorizes the minority instances into noise, safe, and danger sets. The data points in danger sets are considered the borderline instances, and they are oversampled as in SMOTE. Douzas and Bacao [36] proposed G-SMOTE for generating synthetic samples in a geometric region of the input space around each selected minority instance. The basic configuration of this geometric region can be a hyper-sphere or a hyper-spheroid.
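SMOTE's interpolation step described above can be sketched as follows; this is a minimal illustration of the idea, not the reference implementation, and the sample values are invented for demonstration.

```python
import numpy as np

def smote_sample(x, minority_neighbors, rng):
    """Synthesize one minority point on the line segment between x and a
    randomly chosen minority neighbor, as in SMOTE's interpolation step."""
    nn = minority_neighbors[rng.integers(len(minority_neighbors))]
    gap = rng.random()              # uniform in [0, 1)
    return x + gap * (nn - x)       # a point on the segment x -> nn

rng = np.random.default_rng(0)
x = np.array([1.0, 1.0])
neighbors = np.array([[2.0, 2.0], [0.0, 3.0], [1.5, 0.5]])
synthetic = smote_sample(x, neighbors, rng)
```

Because the synthetic point lies between existing minority samples, the minority region is filled in rather than merely duplicated, which is what widens the decision boundary.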

B. MACHINE LEARNING FOR ANOMALY DETECTION
Machine learning has proven to be far more effective than knowledge-based and statistical techniques for anomaly detection [37]. Existing literature reports all three types of machine learning for anomaly detection: supervised, unsupervised, and semi-supervised [20]. Supervised learning for anomaly detection is limited by the need for pre-labelled data of normal and anomalous behaviors. This is specifically challenging in an IoT Edge setting, where the data is inherently unlabeled, and the volume of anomalies can be large and not easily accessible or available from vendors or other end-users due to concerns of exposing further vulnerabilities. Given this significant limitation, it is pertinent to focus specifically on unsupervised learning techniques for threat detection. Eskin et al. [38] evaluated clustering, k-NN and a one-class SVM using the KDD-Cup99 dataset.
Techniques like autoencoders are trained on normal data and can be used to detect anomalies [39]. In contrast to these unsupervised learning approaches, the GSOM algorithm transforms high-dimensional data into low-dimensional data while preserving the underlying topology representation of the data [40]. The GSOM algorithm has also been used for clustering, classification and visualization based on this property of dimensionality reduction. GSOM has been successfully adapted for DoS attack detection [41], and activity detection [42].

C. HYPERDIMENSIONAL COMPUTING
Hyperdimensional (HD) computing is a bio-inspired computational approach for representing and manipulating concepts and their meanings in a high-dimensional space with a low computational overhead [43]. High-dimensional binary vectors of fixed length are the basis for representing information in this type of computing, and the information in an HD vector is evenly distributed across the vector's positions; therefore, hyperdimensional computing operates with distributed representations [44]. These distributed representations contrast with localist representations as they can be used to perform low-resource computations on digital acceleration hardware such as the FPGA units available on IoT and Edge devices. Recent work [45]-[47] successfully demonstrated the effectiveness of adapting the GSOM algorithm based on HD computing for unsupervised learning from unlabeled data in low-energy devices and settings. As delineated in the following section, the proposed machine learning method expands on this success.
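The core HD operations on binary hypervectors can be sketched as follows; this is a generic illustration of binding, bundling and similarity on dense binary vectors, under the common XOR/majority-vote conventions, and is not the specific encoding used in [45]-[47].

```python
import numpy as np

D = 10000                                   # hypervector dimensionality
rng = np.random.default_rng(42)

def random_hv():
    """A random dense binary hypervector in {0, 1}^D."""
    return rng.integers(0, 2, D, dtype=np.int8)

def bind(a, b):
    """Binding (element-wise XOR): the result is dissimilar to both inputs."""
    return a ^ b

def bundle(hvs):
    """Bundling (element-wise majority vote): the result stays similar to
    every input, which is how a class or cluster prototype is accumulated."""
    return (np.sum(hvs, axis=0) * 2 > len(hvs)).astype(np.int8)

def similarity(a, b):
    """1 - normalized Hamming distance: ~0.5 for unrelated vectors."""
    return 1.0 - float(np.mean(a != b))

a, b, c = random_hv(), random_hv(), random_hv()
proto = bundle([a, b, c])
```

All three operations are bitwise and branch-free, which is what makes HD computing attractive for FPGA-class hardware at the Edge.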

III. THE PROPOSED APPROACH
We have designed the proposed machine learning approach in the context of the architecture for cloud-edge orchestration of IoT applications [48]- [50]. As illustrated in Fig. 1, it is positioned between the Cloud layer and the Edge layer as it receives data streams from IoT devices situated in both the Edge and Mobile Edge layers. This cooperative architecture across Cloud, Edge and Mobile Edge layers enables real-time responses for the detection of cyber threats and attacks. The proposed machine learning approach begins with the EG-SMOTE algorithm for resampling that generates balanced IoT data streams. Next, the adaptation of the GSOM algorithm based on HD computing performs unsupervised learning from unlabeled data, within the bounds of the computational constraints of the Edge layer. Finally, a classification module is used to identify anomalies and push these across as alerts and notifications to the Edge and Cloud layers.

A. THE DEVELOPMENT OF EG-SMOTE ALGORITHM
G-SMOTE extends the linear interpolation procedure by generating samples based on a geometric region [36]. This algorithm defines a geometric region around the specific sample inside which the new samples are synthesized.
3. Cluster the minority samples into an optimum number of sub-clusters (n).
4. Find the maximum number (n_i) of synthetic samples that can be created by each sub-cluster.

n_i = γ × (Number of minority points in sub-cluster / Total number of minority points) × N   (2)
5. Find k-nearest points ∀ x (x ∈ S_min):
   L_nearest - list of k-nearest neighbors of x
   L_near_min - list of k-nearest minority neighbors of x
   P_maj - nearest majority point of x
6. Execute the following steps until N synthetic points are generated:
   • Shuffle the minority points
   • Randomly choose a minority point S_i ∈ S_min
   • Modify α_trunc
   • Select a minority point P_min (P_min ∈ L_near_min); S_surface = P_min; point_generation(S_center, S_surface)
   • Else: randomly select a minority point y (y ∈ L_near_min); S_surface = argmin over {P_min, P_maj} of (||S_center - P_min||, ||S_center - P_maj||)
Step 1 deals with the initialization process of the algorithm, defining and assigning values for the known parameters. These parameter inputs are given in the algorithm and will be elaborated in the following steps.
Step 2 defines the number of samples to be synthesized.
Steps 3 and 4 are novel steps introduced into the algorithm to segregate the sub-clusters to which the resampling can be re-applied.
Steps 5 and 6 are the critical part of the algorithm, where the minority points are chosen at random and subjected to resampling based on the defined category. The above-mentioned steps are elaborated below.
Step 1: The truncation factor (α_trunc) and deformation factor (α_def), which were introduced in G-SMOTE [36], and the sampling rate (β) are initialized. The truncation factor corresponds to the transformation of the hyper-sphere into a hyper-spheroid. Similar to truncation, the deformation transformation further modifies the initially uniform probability distribution. EG-SMOTE restricts the number of samples to be generated by the sampling ratio β, in contrast to G-SMOTE, which synthesizes new samples until the ratio of majority to minority becomes 1:1. Under this 1:1 oversampling in G-SMOTE, the generation of excessive amounts of synthetic data leads models to learn unrealistic synthetic patterns that do not exist in the datasets. Therefore, EG-SMOTE restricts the rate of oversampling to reduce excessive synthesis of minority samples. α_trunc and α_def are initialized with an initial value and later modified depending on the category of the selected minority sample (further explained in Step 6). In contrast, G-SMOTE operates with a static value for both factors.
Step 2: As mentioned above, the number of samples to be generated (N) is calculated using (1) based on the sampling ratio (β). Conventionally, N is calculated as the difference between the number of minority and majority samples, which causes minority points to be synthesized in abundance until the minority-to-majority ratio reaches 1:1. This abundant creation of synthetic samples may lead to overfitting. Therefore, the sampling ratio (β) is introduced to limit the number of points to be generated.
Steps 3 and 4: Minority samples are clustered into an optimum number of sub-clusters (n) to generalize the generation of new samples across all regions, which reduces the effect of overfitting. Since resampling is performed per sub-cluster, the sub-cluster resampling contributing rate γ (0 < γ < 1) is given a value such that the contribution from each sub-cluster to synthesizing new samples is constrained by an upper limit, as presented in (2).
The k-means clustering algorithm is used to cluster the minority samples after obtaining the optimal value for k from the 'Elbow Test' [51]. The number n_i limits the contribution to N from each sub-cluster to prevent overfitting and to allow the synthesis of minority points from every sub-cluster.
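One plausible reading of the sampling budget (eq. (1)) and per-sub-cluster limits (eq. (2)) can be sketched as follows; the exact functional forms are defined in the paper's equations, so this numeric example and the helper names are illustrative assumptions only.

```python
def sampling_budget(n_maj, n_min, beta):
    """Total number of synthetic samples N, one plausible reading of eq. (1):
    a fraction beta of the majority/minority gap rather than G-SMOTE's
    full 1:1 rebalancing."""
    return int(beta * (n_maj - n_min))

def cluster_quotas(cluster_sizes, n_total, gamma):
    """Per-sub-cluster upper limits n_i, one plausible reading of eq. (2):
    proportional to each sub-cluster's share of the minority points and
    scaled by the contributing rate gamma (0 < gamma < 1)."""
    total_min = sum(cluster_sizes)
    return [int(gamma * n_total * size / total_min) for size in cluster_sizes]

N = sampling_budget(n_maj=9000, n_min=1000, beta=0.5)      # 4000
quotas = cluster_quotas([600, 300, 100], N, gamma=0.5)     # [1200, 600, 200]
```

Note how no sub-cluster can consume the whole budget: even the largest sub-cluster is capped well below N, which is the over-fitting safeguard the text describes.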
Step 5: k-nearest points are identified for each minority sample in three categories: k-nearest points from minority samples, k-nearest points from both minority and majority samples and a nearest majority point.
Step 6: EG-SMOTE categorizes the selected minority point based on the k-nearest points and chooses the surface point based on the category, which G-SMOTE fails to do. Minority samples are categorized based on the ratio of majority to minority samples among the k-nearest neighbors. Let m be the number of majority samples among the k-nearest neighbors.
• If m = k, it is an absolute noisy sample. • If m >= 3k/4, it is a noisy sample. • Both cases are treated as noisy by the algorithm. Borderline-SMOTE considered only the first case as noisy, leaving the second as a borderline sample [35]. However, since a miniature noisy cluster with one or two minority points is possible, EG-SMOTE refrains from treating such samples as borderline and instead treats them as noisy. In this way, EG-SMOTE prevents the creation of more noisy samples.
• If m = 0, it is an absolute safe sample. • If m <= k/4, it is a safe sample. • EG-SMOTE reduces the threshold for safeness compared to Borderline-SMOTE [35]. Borderline-SMOTE never synthesizes new samples for safe-zone data, which introduces an intra-cluster imbalance. The EG-SMOTE algorithm addresses this intra-cluster imbalance by synthesizing new samples for safe-zone samples, but shrinks the threshold for safe samples since extensive synthesis of minority data leads to overfitting [52].
• EG-SMOTE identifies a sample as borderline only when k/4 < m < 3k/4.
There is a need to address the inherent nature of minority samples. Hence, they are categorized accordingly and provided with a hyper-sphere selection phase and a point generation phase specific to each category. The point generation phase differs from one category to another by assigning different values for α_trunc and α_def. The hyper-sphere is pruned according to the category, and a new minority sample is synthesized. The method synthesis_sample for generating points follows similar steps to G-SMOTE. Point generation for each of the above categories is elaborated as follows.
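The categorization rules above translate directly into a small decision function; a minimal sketch using the thresholds as stated in the bullets (category names are illustrative labels):

```python
def categorize(m, k):
    """Categorize a minority point by m, the count of majority samples
    among its k nearest neighbors, using EG-SMOTE's thresholds."""
    if m == k:
        return "absolute noisy"
    if m >= 3 * k / 4:
        return "noisy"
    if m == 0:
        return "absolute safe"
    if m <= k / 4:
        return "safe"
    return "borderline"          # k/4 < m < 3k/4
```

For example, with k = 8 neighbors, a point with 6 majority neighbors is noisy, one with 2 is safe, and one with 4 is borderline.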

1) NOISY SAMPLES
A minority sample is noisy when all the k-nearest neighbors are majority samples, or when majority samples dominate (m >= 3k/4, i.e., at least 75%) among the k-nearest neighbors. EG-SMOTE prohibits synthesizing new samples from noisy samples. G-SMOTE identified this problem correctly; however, the algorithm does not prevent the incorporation of new samples from noisy minority samples. As a result, G-SMOTE can end up synthesizing new instances from noisy samples, as depicted in Fig.2. Consider a scenario where all the k-nearest points belong to the majority: G-SMOTE selects the nearest majority sample as the surface point and tries to synthesize new minority instances in the hyper-sphere. Hence, the G-SMOTE algorithm is first enhanced to prevent the integration of further noisy minority samples.

2) BORDERLINE SAMPLES
Borderline minority samples occur when the share of majority samples among the k-nearest neighbors is above 25% but below 75% (k/4 < m < 3k/4), as discussed above. These borderline samples are often located in overlapping regions of the minority and majority classes, or placed close to the complex decision boundaries between them. It is essential to define a safe zone for point generation on the borderline of the minority and majority data clusters. G-SMOTE correctly identifies and handles this issue. However, G-SMOTE has static truncation and deformation factors. The deformation factor controls the plane of the synthesized point, while truncation prunes the sphere to define a safe zone for point generation [36]. Negative values of the truncation factor prune the same side as the selected surface point, and vice versa. Consider a situation similar to Fig.3, where the surface point is a minority sample. With its truncation factor (−1 <= α_trunc <= 1) being a static value (say 1.0), G-SMOTE prunes the same side of the selected surface point in both instances. This approach succeeds in some instances, such as when the surface point is a majority sample, but leads to synthesizing noisy samples when a minority point is selected as the surface point. Based on the empirical evaluation, it was decided to define the truncation factor based on the class of the point chosen as the surface point (either majority or minority), to reduce the impact of synthesizing minority data within the majority cluster space. Fig.3 shows the generation of new instances on the borderline between the two clusters. In EG-SMOTE, for a minority point, the truncation factor (α_trunc) is assigned a value less than zero when the selected surface point belongs to the majority class, such that pruning is done on the opposite side.
If the surface point is a majority point, α_trunc is assigned a negative value between −1 and 0. Similarly, a positive value is assigned when a minority sample is selected as the surface point, so that the opposite side is pruned.
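This sign rule can be sketched as a one-line decision; the magnitude parameter is an assumed tunable, not a value prescribed by the paper.

```python
def truncation_factor(surface_is_majority, magnitude=1.0):
    """alpha_trunc in [-1, 0) when the surface point is a majority sample
    and (0, 1] when it is a minority sample, so the hyper-sphere is pruned
    away from the majority region. The magnitude is an assumed tunable."""
    return -magnitude if surface_is_majority else magnitude
```

The point is simply that the sign, and hence the pruned half of the hyper-sphere, flips with the class of the surface point, where G-SMOTE keeps it fixed.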
3) SAFE SAMPLES
A safe-zone sample is one where almost all the k-nearest neighbors are minority samples. Borderline-SMOTE [35] claims that sampling minority data leads to overfitting and discourages sub-cluster-wise oversampling. G-SMOTE advocates synthesizing data across widely spread areas, that is, synthesizing data from different minority samples throughout the input space. This method is efficient in addressing the issues of imbalanced binary classification; as Krawczyk argued in [53], each minority sample should be considered in this interpretation. However, G-SMOTE has not considered the effect of overfitting (Fig.4). G-SMOTE arbitrarily allows minority points to be synthesized in abundance until the minority-to-majority ratio reaches 1:1. For a larger dataset, the number of synthetic points is greater, leading to overfitting. Hence, in the EG-SMOTE algorithm, an upper limit was set for sampling every minority sub-cluster, as expressed in (2). The algorithm sets a maximum number of synthetic samples per sub-cluster, where the number of sub-clusters is decided by elbow testing. In cluster analysis, the elbow method is a heuristic for determining the number of clusters in a dataset: the explained variation is plotted as a function of the number of clusters, and the elbow of the curve is picked as the optimal number of clusters. For safe samples, when applying EG-SMOTE to prevent intra-cluster imbalance, if resampling is applied without identifying the optimal number of sub-clusters, one sub-cluster may be given less representation.
This would again introduce an intra-cluster imbalance rather than address it. Fig.5 presents experiments conducted with the BoT-IoT dataset; the results show that all classifiers perform best at their optimal cluster number rather than at any manually chosen cluster number.
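The elbow heuristic described above can be sketched as follows; this is a generic, self-contained illustration with a basic k-means and an automatic bend detector standing in for reading the WCSS plot by eye, not the paper's experimental pipeline.

```python
import numpy as np

def kmeans_wcss(X, k, iters=20):
    """Within-cluster sum of squares after a basic Lloyd's k-means run,
    seeded with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return float(((X - centers[labels]) ** 2).sum())

def elbow_k(X, k_max=6):
    """Pick k at the sharpest bend of the WCSS curve (the largest second
    difference), a simple automatic stand-in for inspecting the plot."""
    wcss = [kmeans_wcss(X, k) for k in range(1, k_max + 1)]
    return int(np.argmax(np.diff(wcss, n=2)) + 2)

# Three well-separated blobs: the elbow should land at k = 3.
rng = np.random.default_rng(1)
blobs = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.66)]
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(30, 2)) for c in blobs])
```

Once the optimal number of sub-clusters is found this way, the per-sub-cluster limit in (2) can be applied to each of them.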

Algorithm 2
Parameters: Nodemap is the Dictionary of assigned labels to nodes.

B. GSOM ALGORITHM BASED ON HD COMPUTING
As noted earlier, the GSOM algorithm based on HD computing has been demonstrated to be effective in low energy settings for unsupervised learning from unlabeled data. The topological mapping of the GSOM algorithm encapsulates both original and synthesized samples into a structure that can be utilized for threat detection.
The workings of the GSOM algorithm are as follows. It consists of two phases: the growing phase, in which the unsupervised learning process grows new nodes and adjusts the neuronal weights to accurately reflect the input space; and the smoothing phase, in which the weights are finely adjusted and calibrated for generalized learning across the input space.
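The growing phase is driven by a growth threshold; in the original GSOM formulation by Alahakoon et al. this is GT = -D × ln(SF), where D is the input dimensionality and SF the spread factor. The paper does not restate the formula here, so the following is background material sketched under that standard formulation:

```python
import math

def growth_threshold(dim, spread_factor):
    """GT = -dim * ln(SF) from the original GSOM formulation: a lower
    spread factor gives a higher threshold and hence a smaller map."""
    return -dim * math.log(spread_factor)

def should_grow(accumulated_error, dim, spread_factor):
    """During the growing phase, a boundary node spawns new neighbors once
    its accumulated quantization error exceeds the growth threshold."""
    return accumulated_error > growth_threshold(dim, spread_factor)
```

The spread factor thus gives a dimensionality-independent knob for how finely the map grows, which matters under the Edge layer's computational constraints.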
Algorithm 2 is proposed as a post-processing step to GSOM algorithm [40] where a majority-voting label is assigned to each node, unlike the learning phase where multiple labels are associated with each input sample x(t).
Two important steps are defined by the algorithm following the execution of GSOM.
Step 1: This step assigns each node the mode of its associated labels, shown as the transition from subfigure A to B (Fig.6). The imbalance of anomalies in the input samples is mitigated by incorporating EG-SMOTE, which synthesizes more anomalous samples, thus balancing the data. This ensures that the nodes representing anomalies will not be ignored.
Step 2: Let X contain all the input samples whose labels are associated with a particular node. The second step breaks ties by assigning the label associated with the input x (x ∈ X) which has the minimal distance to that neuron, as calculated using (3) and (4). This is shown by the transition from subfigure B to C (Fig.6), assuming that the labels associated with the closest inputs for nodes B and D are 1 and 0, respectively.
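The two steps of this node-labeling procedure can be sketched for a single node as follows; the (label, distance) pair representation is an illustrative simplification of the node-to-sample mapping.

```python
from collections import Counter

def label_node(samples):
    """Majority-vote label with a distance tie-break (Algorithm 2's two steps).

    samples -- list of (label, distance_to_node) pairs for the inputs
               mapped to this node
    """
    counts = Counter(label for label, _ in samples)
    ranked = counts.most_common()
    top_count = ranked[0][1]
    tied = {label for label, c in ranked if c == top_count}
    if len(tied) == 1:                     # Step 1: clear majority label
        return ranked[0][0]
    # Step 2: tie-break with the label of the closest mapped input
    return min((s for s in samples if s[0] in tied), key=lambda s: s[1])[0]
```

For instance, a node mapped to two label-1 samples and one label-0 sample gets label 1 outright, while a node with one of each takes the label of whichever sample lies closer.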

Algorithm 3
Parameters: W -Weights of all nodes in node map w i -Weights of ith node Start

C. CLASSIFICATION
After the labels of all nodes are finalized, classification is carried out for each new unknown input x1(t), as depicted in Algorithm 3. When new data is to be classified as normal or anomalous, the distance is calculated between the input vector and the weights of each node. The Best Matching Node (BMN) is determined as the node with the minimum distance using equations (5) and (6). The label associated with that particular node is then given as the prediction for that input sample. For instance, consider subfigure C of Fig.6: if node A is the one with the minimum distance to the new unseen input, then the predicted value is 1. This process is repeated until predictions are made for the entire test dataset.
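The BMN prediction step can be sketched as follows; the weight values and the 0/1 label encoding are invented for illustration.

```python
import numpy as np

def classify(x, node_weights, node_labels):
    """Predict with the Best Matching Node (BMN): the node whose weight
    vector is at minimal Euclidean distance from the input."""
    distances = np.linalg.norm(node_weights - x, axis=1)
    return node_labels[int(np.argmin(distances))]

node_weights = np.array([[0.1, 0.1], [0.9, 0.8], [0.5, 0.5]])
node_labels = [0, 1, 0]      # 1 = anomaly, 0 = normal (assumed encoding)
pred = classify(np.array([0.85, 0.75]), node_weights, node_labels)
```

At inference time this reduces each prediction to one distance computation per node, which keeps the classifier within the Edge layer's resource budget.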

IV. EXPERIMENTS
This section provides the results of the experiments conducted to evaluate EG-SMOTE in handling imbalanced data. We compared the performance of EG-SMOTE with SMOTE and G-SMOTE across all datasets for the following classifiers: Logistic Regression (LR), Gradient Boosting Classifier (GBC), K-Nearest Neighbours (KNN), Decision Tree (DT), XGBoost, and the GSOM classifier. Hyper-parameters were selected using grid search with k-fold cross-validation for result comparison and optimization.
We evaluated the performance of the classifiers and the oversampling techniques using k-fold cross-validation with k = 5. To address the data imbalance in the training set, we applied the oversampling techniques only to the k−1 training folds of the cross-validation procedure, generating synthetic data to obtain a balanced training set. Models trained on this data were validated on the remaining fold and their performance evaluated. We tried a number of different hyper-parameters for the oversamplers and the classifiers. For SMOTE, we used k ∈ {5, 3} for the nearest-neighbors parameter; for G-SMOTE, we used nearest neighbors k ∈ {5, 3}, the deformation factor α def ∈
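Applying the oversampler only inside the training folds, never to the held-out fold, is the key detail of this procedure: balancing before splitting would leak synthetic copies of validation points into training. A minimal sketch of this fold-wise scheme, with hypothetical `oversample`, `fit`, and `score` callables standing in for the actual oversamplers and classifiers:

```python
def kfold_splits(n, k):
    """Contiguous index folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds

def cross_validate(X, y, oversample, fit, score, k=5):
    """k-fold CV in which the oversampler is applied only to the k-1
    training folds, so no synthetic samples leak into validation."""
    folds = kfold_splits(len(X), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        X_tr = [X[j] for j in train_idx]
        y_tr = [y[j] for j in train_idx]
        X_tr, y_tr = oversample(X_tr, y_tr)  # balance the training folds only
        model = fit(X_tr, y_tr)
        X_te = [X[j] for j in test_idx]
        y_te = [y[j] for j in test_idx]
        scores.append(score(model, X_te, y_te))
    return sum(scores) / k  # mean cross-validation score
```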

A. DATASETS
The proposed approach was empirically evaluated using four open-access datasets: three benchmark datasets, KDD99 [54], NSL-KDD [55], and CICIDS2017 [56], and the BoT-IoT dataset [57]. All four datasets exhibit the challenges of imbalanced or skewed data sampling as well as unlabeled data streams in an IoT Edge setting. Datasets with more than two classes were modified to represent binary classes, and the datasets were pruned to reduce dimensionality after feature ranking. Table 1 shows the details of each dataset (IR represents the imbalance ratio). Various performance metrics can be used to evaluate a model: F-Score, g-mean, and Area Under the ROC Curve (AUC).
• F-Score: the harmonic mean of precision and recall; it therefore balances a model in terms of precision and recall.
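In terms of true positives (TP), false positives (FP), and false negatives (FN), the standard definitions are:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F\text{-}Score = \frac{2\,P\,R}{P + R}
```

Because a classifier that ignores the minority (anomaly) class scores poorly on either P or R, the F-Score is a more informative metric than accuracy on imbalanced data.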

B. RESULTS
TABLE 2, TABLE 3, and TABLE 4 present the mean cross-validation scores for each combination of oversampler, evaluation metric, and classifier for the NSL-KDD, KDD99, and CICIDS2017 datasets, respectively.
These experiments were conducted to demonstrate the performance of the EG-SMOTE sampling approach. The results show that EG-SMOTE significantly improves the prediction of anomalous samples in imbalanced datasets compared with other resampling techniques such as SMOTE and G-SMOTE.
The F-Score reflects the harmonic mean of precision and recall and is considered a reliable metric for imbalanced classification tasks. The authors of G-SMOTE claim that it outperforms random oversampling, SMOTE, and Borderline-SMOTE [35]. The results presented in Table 2, Table 3, and Table 4 suggest that the proposed approach achieves a higher F-Score than the other oversamplers for most of the classifiers. The classifier based on GSOM performs considerably well compared to existing classifiers, which confirms its utility in IoT Edge applications. The experiment conducted with the CICIDS2017 dataset suggests that the EG-SMOTE algorithm outperformed all the compared oversampling methods. In addition, EG-SMOTE performs equally well with the new GSOM classifier. Results from the BoT-IoT dataset are presented in Table 5; here again, the proposed machine learning approach performs better than the other techniques.

V. CONCLUSION
In this paper, we proposed a novel machine learning method for effective, efficient and secure cyber threat detection at the IoT Edge. The method was empirically evaluated using three benchmark datasets, KDD99, NSL-KDD, and CICIDS2017, and an industry-focused botnet IoT traffic dataset, BoT-IoT. Its effectiveness is demonstrated in addressing the challenge of high-volume, high-velocity unlabeled data streams generated at the IoT Edge. Its efficiency is based on the GSOM algorithm, which utilizes HD computing for sparse distributed feature representation and learning from unlabeled data in low-energy settings such as Edge layers. It is secure as it is boosted by minority resampling of imbalanced data generated by cybersecurity threats and attacks at the IoT Edge. Furthermore, the EG-SMOTE algorithm addresses the challenges of synthesizing noisy minority samples, overfitting due to extreme synthesis of minority samples, and improper synthesis along the borderlines due to class imbalance. The GSOM algorithm transforms high-dimensional data into low-dimensional data while preserving the underlying topology of the minority-resampling-boosted datasets generated by the EG-SMOTE algorithm. The latent representation generated by the GSOM algorithm is effective in detecting cyber-physical attacks of varying origins. As future work, we intend, first, to evaluate the proposed approach on a large-scale IoT Edge application, and second, to explore multi-label classification and a safe zone for point generation based on the k-nearest neighbors rather than relying on the category, to improve the efficiency of cyber threat detection at the IoT Edge.

She joined the teaching faculty of the university upon graduation and was appointed the Head of the Department of Computer Science and Engineering, in 2005, and served in that capacity for six years.
During her sabbatical leave, in December 2011, she worked as the Dean of the Faculty of Electrical and Information Technology, Northshore College of Business and Technology. She was then appointed as the Deputy Project Director of the Higher Education for the Twenty First Century (HETC) Project of the Ministry of Higher Education of Sri Lanka, from 2013 to 2015. She is currently a Senior Lecturer. She is also the Director of the Centre for Open and Distance Learning, University of Moratuwa. She has been instrumental in setting up DataSEARCH, a multi-disciplinary research center engaged in research in data science, engineering, and analytics at the University of Moratuwa, and in setting up the new data science and engineering stream at the Department of Computer Science and Engineering.
Mrs. Nanayakkara serves as the Board Director of the Women's Chamber for Digital Sri Lanka and LIRNEasia. In 2016, she was awarded the ''Female ICT Leader of the Year'' by the Computer Society of Sri Lanka.
DAMMINDA ALAHAKOON (Member, IEEE) received the Ph.D. degree in artificial intelligence from Monash University, Australia. He is currently a Full Professor and the Founding Director of the Centre for Data Analytics and Cognition, La Trobe University, Australia. He has made significant contributions with international impact toward the advancement of artificial intelligence through academic research, applied research, research supervision, industry engagement, curriculum development, and teaching. He has published over 100 research articles; theoretical research in self-structuring AI, human-centric AI, cognitive computing, deep learning, optimization; and applied AI research in industrial informatics, smart cities, robotics, intelligent transport, digital health, energy, sport science, and education.