
An ever-increasing number of computing devices interconnected through wireless networks within cyber-physical-social systems, together with the significant amount of sensitive network data transmitted among them, has raised security and privacy concerns. An intrusion detection system (IDS) is known to be an effective defence mechanism, and machine learning (ML) methods have recently been used for its development. However, Internet of Things (IoT) devices often have limited computational resources, such as a limited energy source, computational power and memory; thus, traditional ML-based IDSs that require extensive computational resources are not suitable for running on such devices. This study therefore aims to design and develop a lightweight ML-based IDS tailored to resource-constrained devices. Specifically, the study proposes a lightweight ML-based IDS model named IMPACT (IMPersonation Attack deteCTion using deep auto-encoder and feature abstraction). It is based on deep feature learning with a gradient-based linear Support Vector Machine (SVM) and can be deployed and run on resource-constrained devices because the number of features is reduced through feature extraction and selection using a stacked autoencoder (SAE), mutual information (MI) and a C4.8 wrapper. IMPACT is trained on the Aegean Wi-Fi Intrusion Dataset (AWID) to detect impersonation attacks. Numerical results show that the proposed IMPACT achieved 98.22% accuracy with a 97.64% detection rate and a 1.20% false alarm rate and outperformed existing state-of-the-art benchmark models. Another key contribution of this study is the investigation of the features in the AWID dataset with respect to their usability for further development of IDS.

INDEX TERMS IoT security, intrusion detection, feature engineering, mutual information, machine learning, edge computing.


I. INTRODUCTION
The role of edge devices has been elevated by the recent development of cloud and IoT technologies, which support the need for intelligence, computing power and advanced services at the network edge. This new concept allows decentralised processing across interconnected devices. The rapid growth of interconnected smart and mobile devices has posed significant dangers to the security and privacy of individuals, societies, nations and, in the extreme, the globe as a whole [1]. The impact of massive data breaches and security threats is increasing with more advanced emerging applications such as healthcare, smart homes and cities, and autonomous vehicles. All these domains deal with sensitive and confidential data, deeply mined from private activities on a daily basis, and the nature and scale of the interconnection of the devices mean that an attack can seriously harm not only a single device or operator but all connected objects and the humans involved on a large scale.
Such devices, however, face unique security challenges [2], among which are their limited computational resources, such as a limited energy source (e.g. battery power) and limited computational power (e.g. processors and memory) [3] [4]. The requirement for real-time processing also adds complexity to the development and deployment of both existing and new security measures.
IDS has been effective as a next line of defence for such computing devices and networks and has been studied extensively since the seminal work by Denning [5]. IDSs can be classified into two major categories: signature-based and anomaly-based. An anomaly-based IDS is designed to detect unknown attacks that deviate from the profile of normal network activities. A signature-based system, on the other hand, can only detect known attacks that match the patterns or signatures stored in a database. As cyber-attacks are evolving, the flexibility and adaptability of signature-based IDS need to be further developed.
The concepts of machine learning (ML) and its subfield deep learning (DL) seem to be, by their inherent approaches, the right candidates for designing an adaptable IDS [6] [7]. However, the high-dimensional nature of ever-increasing data and the iterative training process of models require extensive computational resources; thus, traditional ML-based IDSs are not suitable for training and inference on resource-constrained devices.
Due to the high demand for computational resources for training and inference, the current approach is to transfer collected data to central nodes (e.g. data centres) that have powerful resources. However, the distance between the devices and the remote central nodes causes latency, which can be a bottleneck for modern time-critical systems and applications that often require real-time processing of such big data. Besides, this centralised approach implies a single point of failure. For example, a malfunction or shutdown of one part of the system can lead to the failure of the entire system. The centralised approach also has other issues, including storage capacity, availability, scalability and privacy.
To mitigate the aforementioned problems, a new paradigm called edge computing [8] has emerged. Its principle relies on the ability to perform computational tasks, such as data processing and analysis, locally at the edge of the network, near or at the data sources, rather than at the central nodes. This paradigm benefits from the proximity between the data sources and computing nodes and can also solve the problem of poor or absent connectivity and bandwidth, which are always required in cloud-based systems. It is not surprising that cloud to the edge is one of the top strategic technologies for 2018 and 2020 according to reports by Gartner [9] [10].
To compute efficiently and effectively closer to or at the edge of the network, the utilisation of ML approaches that enable dimensionality reduction of data and efficient detection is critical. This study investigates potential ML methods to design and develop an efficient and effective ML-based IDS for resource-constrained edge devices, which involves processing a large amount of data and training models. The key contributions of this study are:
• to determine the feasibility of a lightweight machine-learning IDS to be designed and deployed on resource-constrained devices,
• to demonstrate, building upon earlier work [11] [12], the effectiveness of abstract features extracted using a deep SAE, along with mutual information theoretic feature selection that outperforms other state-of-the-art models,
• to propose an architecture of gradient-based SVM for the proposed IDS model,
• to analyse the temporal features within the AWID dataset and their usability for the further development of IDS, and
• finally, to provide a new benchmark result on the AWID dataset without using temporal features.
The remainder of this paper is organised as follows. Section II introduces the proposed IMPACT algorithm, outlining its three novel concepts. Section III analyses and evaluates the performance of IMPACT and existing benchmark models, along with investigations of the features of the AWID dataset, and Section IV concludes with recommendations for further research.

A. Data
To train, test and evaluate the proposed model, the AWID dataset [13] was used due to its unique features in comparison with other existing datasets. While it contains new attack types, the AWID dataset is simulated using a real-world wireless network, which is a critical feature for modern IoT environments.
The dataset is divided by the types of attack classes. The "ATK" set contains 16 attack classes and the "CLS" set has 4 classes, in which 15 attacks are categorised by attack methodology: impersonation, flooding and injection. For this study, the "CLS" dataset is used and only the impersonation attack is considered. The impersonation attacks included in the dataset are the Caffe Latte, Evil Twin and Hirte attacks. According to their attack purpose, Caffe Latte and Hirte are keystream retrieving attacks and Evil Twin is a man-in-the-middle attack. The tools used to implement the attacks include the Aircrack-ng suite, the MDK3 tool, the Metasploit framework and custom tools made by the authors using the C language and the Lorcon2 library. Attackers mostly use the Airbase tool contained within the Aircrack-ng suite to launch Evil Twin attacks.
To gather the data, the authors created a realistic resource-constrained environment of a small office/home office (SOHO) wireless network infrastructure that consisted of a number of mobile and static clients, such as smartphones, tablets, a smart TV and laptops, and a single mobile attacker node to launch the attacks. A single Access Point (AP) was set up with WEP encryption.
The dataset in its original form is imbalanced: the normal class is significantly larger than the attack class, with a ratio of 10:1 for the training set and 11:1 for the test set. Since this configuration could result in a bias during the model training phase, the dataset is balanced through pre-processing so that the ratio between the two classes is 1:1 for both the training and test sets [12].
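The balancing step can be illustrated with a short sketch. The text does not state which balancing technique was used in [12], so random undersampling of the majority class is assumed here purely for illustration; the function name `undersample_to_balance` and the toy arrays are hypothetical and are not taken from AWID.

```python
import numpy as np

def undersample_to_balance(X, y, seed=0):
    """Randomly drop rows of the larger class until all classes have equal size.

    Assumption: random undersampling; the source does not name the method used.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()  # size of the smallest class
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
        for c in classes
    ])
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]

# Toy data with a 10:1 normal-to-attack ratio (0 = normal, 1 = attack)
X_toy = np.arange(110 * 3).reshape(110, 3)
y_toy = np.array([0] * 100 + [1] * 10)
X_bal, y_bal = undersample_to_balance(X_toy, y_toy)
```

After the call, both classes in `y_bal` have the same number of rows, which corresponds to the 1:1 ratio described above.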

B. IMPACT
IMPACT has three main components: (i) feature extraction, (ii) feature selection and (iii) classification. Through feature extraction and selection, the dimensionality of the data required for training and testing the model is reduced, which lowers the computational cost of deploying the model on resource-constrained devices. A stacked autoencoder (SAE), a type of deep neural network, was used for feature extraction, and mutual information (MI) and a C4.8 wrapper were used for feature selection. For the detection task, an SVM with gradient descent optimisation was adopted because, according to the experimental results, it was more effective in terms of detection performance than the other models.
To build the model, the reduced AWID training and test datasets with 154 features were fed to the SAE. Through the SAE, a set of 50 new features was extracted and appended to both the original training and test sets, producing a larger dataset with 204 features in total. This dataset was the input for feature selection, which finds the reduced optimal feature subset. The reduced training and test sets with the final 5 selected features were then used for training and testing the ML classifier that produces the best classification result.
As shown in Fig. 1, an autoencoder (AE) [14] is a type of unsupervised neural network algorithm that learns from unlabelled data using backpropagation. It sets the output values to be the same as the input values, trying to learn the hypothesis function

$$h_{W,b}(x) \approx x. \tag{1}$$

The AE consists of an encoder and a decoder: the encoder compresses the input data into a low-dimensional representation and the decoder reconstructs the input data from that representation. In other words, the input data is replicated at the output layer. During the process of encoding, the input feature vectors are converted to an abstract feature vector and the dimensionality of the input data space can be reduced.
To achieve this, several constraints should be put on the network. For instance, when the number of hidden neurons is set to be less than the number of input features, some meaningful representations of the data can be discovered while the network attempts to reconstruct the input with the limited number of hidden neurons. Consequently, if some correlations exist between the features, the algorithm would be able to find them.
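The constraint (2) is imposed on the hidden neurons in the encoders to compress the representation of the input data and extract features, where $\hat{\rho}_j$ (3) is the average activation of hidden neuron $j$ over the $m$ training instances and $a_j^{(2)}(x)$ is the activation of hidden neuron $j$ for input $x$. If the activation of the neuron is 1, the neuron is active, and if the activation is 0 (or −1 if $\tanh$ is used as the activation function instead of the sigmoid), the neuron is inactive. The variable $\rho$ denotes the sparsity parameter and is set to a value near zero to force the neurons to be inactive most of the time.

$$\hat{\rho}_j = \rho \tag{2}$$

$$\hat{\rho}_j = \frac{1}{m}\sum_{i=1}^{m} a_j^{(2)}\big(x^{(i)}\big) \tag{3}$$

The cost function of the AE is specified by the mean squared error (MSE) function (4), given $m$ training instances and the cost function for a single instance, where $h_{W,b}(x^{(i)})$ is the reconstruction of input $x^{(i)}$:

$$E_{MSE} = \frac{1}{m}\sum_{i=1}^{m} \frac{1}{2}\left\lVert h_{W,b}\big(x^{(i)}\big) - x^{(i)} \right\rVert^{2} \tag{4}$$

L2 regularisation (5), also called the weight decay term, is added to the cost function to prevent overfitting by reducing the magnitude of the weights $w_{ij}^{(l)}$ between neuron $i$ in layer $l$ and neuron $j$ in layer $l+1$:

$$\Omega_{w} = \frac{1}{2}\sum_{l=1}^{L-1}\sum_{i=1}^{n}\sum_{j=1}^{k}\big(w_{ij}^{(l)}\big)^{2} \tag{5}$$

where $L$ is the total number of layers in the network and $n$ and $k$ are the number of neurons in layer $l$ and layer $l+1$ respectively.
In addition, a penalty term called sparsity regularisation (6) is added to the cost function to penalise $\hat{\rho}_j$ that diverges from $\rho$, using the Kullback-Leibler (KL) divergence [15]. KL divergence is a measure of the difference between two distributions. The term is zero if (2) is satisfied and grows as $\hat{\rho}_j$ diverges from $\rho$. Hence, minimising this term encourages $\hat{\rho}_j$ to be close to $\rho$:

$$\Omega_{s} = \sum_{j=1}^{s_2} \mathrm{KL}\big(\rho \,\Vert\, \hat{\rho}_j\big) = \sum_{j=1}^{s_2}\left[\rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}\right] \tag{6}$$

where $s_2$ is the number of hidden neurons within the encoder.
The overall cost function is then the sum of the MSE, the L2 regularisation term and the sparsity regularisation term,

$$E = E_{MSE} + \lambda\,\Omega_{w} + \beta\,\Omega_{s} \tag{7}$$

where $\lambda$ and $\beta$ control the strength of the L2 regularisation and the sparsity regularisation respectively.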
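To make the three terms of the cost function concrete, the following is a minimal NumPy sketch for a single autoencoder with one hidden layer. It assumes sigmoid activations in both the encoder and the decoder and inputs scaled to [0, 1]; the function names, the parameter names (`rho`, `lam`, `beta`), their default values and the toy data are illustrative assumptions, not settings reported in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kl_bernoulli(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * np.log(rho / rho_hat)
            + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def sparse_ae_cost(X, W1, b1, W2, b2, rho=0.05, lam=1e-3, beta=1.0):
    """Return (total cost, average hidden activations) for one sparse AE."""
    A = sigmoid(X @ W1 + b1)        # hidden activations, shape (m, hidden)
    X_hat = sigmoid(A @ W2 + b2)    # reconstruction, shape (m, n_inputs)
    mse = np.mean(0.5 * np.sum((X_hat - X) ** 2, axis=1))  # reconstruction error
    l2 = 0.5 * (np.sum(W1 ** 2) + np.sum(W2 ** 2))         # weight decay term
    rho_hat = A.mean(axis=0)                               # average activation per neuron
    sparsity = np.sum(kl_bernoulli(rho, rho_hat))          # sparsity penalty
    return mse + lam * l2 + beta * sparsity, rho_hat

# Toy example: 8 instances, 6 input features, 3 hidden neurons (all hypothetical)
rng = np.random.default_rng(0)
X = rng.random((8, 6))
W1, b1 = rng.normal(scale=0.1, size=(6, 3)), np.zeros(3)
W2, b2 = rng.normal(scale=0.1, size=(3, 6)), np.zeros(6)
cost, rho_hat = sparse_ae_cost(X, W1, b1, W2, b2)
```

The sparsity penalty vanishes only when every average activation equals `rho`, which is why minimising it pushes the hidden neurons towards being inactive most of the time.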
A stacked (or deep) autoencoder (SAE) consists of multiple AEs connected from one layer to the subsequent layer. The output of the previous encoder is the input of the next encoder, and from this structure, higher-level representations, i.e. features, of the input data can be found. The SAE was chosen because a single AE behaves too greedily, so important information for accurate classification of the target class could be discarded.
The SAE prevents such behaviour by gradually refining the neurons in the hidden layers. In other words, the SAE learns a better representation of the input data than a single AE. However, as more encoders need to be trained, the training time and the complexity of the model increase. For the two encoder layers, 100 and 50 hidden neurons were chosen respectively, which were found to be optimal for the AWID impersonation dataset according to Aminanto et al. [11].
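The data flow through the two stacked encoders can be sketched as follows. The weights here are random placeholders used only to show the shapes (154 → 100 → 50) and how the 50 extracted features are appended to the 154 original ones; in practice each encoder would first be trained as an autoencoder. Sigmoid activations and all variable names are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(X, W, b):
    """Apply one encoder layer."""
    return sigmoid(X @ W + b)

rng = np.random.default_rng(0)
n_samples, n_original = 4, 154
X = rng.random((n_samples, n_original))   # stand-in for pre-processed AWID rows

# Untrained placeholder weights; a trained SAE would supply these
W1, b1 = rng.normal(scale=0.1, size=(154, 100)), np.zeros(100)
W2, b2 = rng.normal(scale=0.1, size=(100, 50)), np.zeros(50)

H1 = encode(X, W1, b1)            # first encoder: 154 -> 100
H2 = encode(H1, W2, b2)           # second encoder: 100 -> 50 abstract features
X_augmented = np.hstack([X, H2])  # original + extracted = 204 features
```

`X_augmented` has the 204 columns that are passed on to feature selection.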
Following feature extraction using the SAE, IMPACT performs feature selection to find the optimal feature subset from the whole feature set, which comprises the original features and the extracted features produced by the feature extraction stage. This process finds the most relevant features and removes irrelevant ones, so that it reduces the complexity and computational cost of the model and also improves the detection performance. Hence, feature selection can make the model both efficient and effective, achieving the aim of this study. Among a variety of available methods, this study utilises mutual information (MI) and the C4.8 wrapper.
MI is a quantity that measures the mutual dependence between two random variables, that is, how much information one random variable carries about another. In other words, it indicates the reduction in uncertainty of one random variable given knowledge of another. MI is related to the concept of entropy $H$ (8), which is the expected information content in a random variable $X$:

$$H(X) = -\sum_{i} P(x_i)\log P(x_i). \tag{8}$$

Herein $P(x_i)$ denotes the probability that the event with index $i$ occurs. The conditional entropy (9) of two random variables $X$ and $Y$ with values $x_i$ and $y_j$ can be defined as

$$H(X \mid Y) = -\sum_{j} P(y_j)\sum_{i} P(x_i \mid y_j)\log P(x_i \mid y_j), \tag{9}$$

where $P(x_i \mid y_j)$ is the conditional probability of $x_i$ given $y_j$. MI is then

$$I(X;Y) = H(X) - H(X \mid Y) = H(X) + H(Y) - H(X,Y), \tag{10}$$

where $H(X,Y)$ is the joint entropy. The higher the MI value, the lower the uncertainty in one variable given the other, and vice versa. Zero MI means the variables are independent.
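As a small worked example of these definitions, the sketch below estimates entropy and MI for discrete variables from empirical frequencies, using the identity I(X;Y) = H(X) + H(Y) − H(X,Y). Base-2 logarithms (bits) are an arbitrary choice here, and the helper names and toy sequences are hypothetical; continuous features would need discretisation or a different estimator.

```python
import numpy as np
from collections import Counter

def entropy(values):
    """Empirical Shannon entropy (in bits) of a sequence of discrete values."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    joint = list(zip(x, y))
    return entropy(x) + entropy(y) - entropy(joint)

feature = [0, 0, 1, 1]
label_same = [0, 0, 1, 1]    # fully determined by the feature
label_indep = [0, 1, 0, 1]   # carries no information about the feature

mi_same = mutual_information(feature, label_same)
mi_indep = mutual_information(feature, label_indep)
```

For these toy sequences, `mi_same` equals the entropy of the feature (one bit) and `mi_indep` is zero, matching the statement that zero MI means independence.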
The C4.8 wrapper [16] is a decision tree-based algorithm extended from ID3. It uses pruning strategies to avoid overfitting. During the learning process of the C4.8 algorithm, a decision tree is first built from the given training set using ID3, and the learnt tree is then converted into a set of rules, each of which corresponds to a path from the root to a leaf node. Each rule is pruned by removing preconditions whose removal improves the estimated accuracy. The pruned rules are then sorted by accuracy and considered in that order when subsequent instances are classified. A feature is useful for generalisation if it is present as a node or as part of the rules; in contrast, the removed features are not important because they do not improve the accuracy. C4.8 utilises the measure of information gain (IG), which is exactly the MI, to select features, and these features are then used as a subset for ML classifiers.
Finally, IMPACT classifies network data into two classes, normal and attack, using a support vector classifier. For this task, a linear support vector machine (SVM) with gradient descent as the optimiser is utilised. Linear SVM is a supervised machine learning algorithm used to deal with binary classification problems. Many possible boundaries or hyperplanes that can separate the classes exist, so a method to find the best one is required. SVM aims to find the optimal decision boundary (or maximum-margin hyperplane) such that the margin between the boundary and the nearest data instances of the classes is maximised, as shown in Fig. 3. The nearest data instances that define the maximum margin (or hyperplane) are called support vectors.
Given training data of $n$ instances $(x_1, y_1), \ldots, (x_n, y_n)$, where $y_i$ is the true class of input $x_i$ $(i = 1, \ldots, n)$ and is either 1 or −1, the decision boundary is defined as

$$w^{T}x + b = 0,$$

where $w$ is the weight vector and $b$ is the bias.
To prevent the data instances from lying on the incorrect side, the following constraints are added for each $i$:

$$y_i = 1:\; w^{T}x_i + b \geq 1, \qquad y_i = -1:\; w^{T}x_i + b \leq -1,$$

and these can be combined into

$$y_i\,(w^{T}x_i + b) \geq 1.$$

SVM can solve non-linearly separable problems by utilising the method called the kernel trick, which maps the original data into a higher-dimensional space to make the data linearly separable. A potential limitation is that SVM may require extensive training time. Though SVM produces high performance results, its training times are often too high in comparison with other classifiers. However, in this study, by using a linear form of SVM, the training time was reduced while achieving comparable results.
SVM uses the hinge loss as its loss function for optimisation. In linear SVM, for an output $y_i = \pm 1$ and decision function $f(x_i) = w^{T}x_i + b$, the hinge loss can be defined as

$$\max\big(0,\; 1 - y_i f(x_i)\big).$$

If $f(x_i)$ predicts the correct class with a margin of at least 1, that is, $y_i$ and $f(x_i)$ have the same sign and $y_i f(x_i) \geq 1$, the loss is zero. If $y_i f(x_i) < 1$, which includes every case where $y_i$ and $f(x_i)$ have opposite signs, the loss increases linearly as $y_i f(x_i)$ decreases. The hinge loss therefore also penalises correct classifications with $y_i f(x_i) < 1$, which corresponds to the margin in SVM.
The objective function $J(w)$ (18) consists of two terms: a regularisation term and the hinge loss. As the hinge loss function is convex, convex optimisers can be used, and the objective function is minimised with respect to $w$ and $b$. Gradient descent iteratively updates the parameters by taking steps in the direction opposite to the gradient. To run gradient descent, derivatives with respect to $w$ and $b$ are required. However, the hinge loss is not differentiable at $y_i f(x_i) = 1$; thus, a subgradient should be used. For the hinge loss of a single instance, a subgradient with respect to $w$ is

$$\frac{\partial}{\partial w}\max\big(0,\; 1 - y_i f(x_i)\big) = \begin{cases} -y_i x_i, & y_i f(x_i) < 1, \\ 0, & \text{otherwise.} \end{cases}$$
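The optimisation described above can be sketched in a few lines of NumPy. The exact form of the objective function (18) is not recoverable from the text, so this sketch assumes one common form, J(w, b) = (lam/2)·||w||² + (1/n)·Σ max(0, 1 − y_i(w·x_i + b)), optimised by full-batch subgradient descent. The function names, the hyperparameter values and the toy data are hypothetical and are not the settings used for IMPACT.

```python
import numpy as np

def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    """Full-batch subgradient descent on an L2-regularised hinge loss.

    Assumed objective: (lam / 2) * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))).
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1   # instances with non-zero hinge loss
        # Subgradient: regulariser plus -y_i * x_i for each active instance
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Classify as +1 or -1 according to the side of the hyperplane."""
    return np.where(X @ w + b >= 0, 1, -1)

# Toy linearly separable data (hypothetical, two features)
X_toy = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
                  [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
y_toy = np.array([1, 1, 1, -1, -1, -1])
w, b = train_linear_svm(X_toy, y_toy)
y_pred = predict(X_toy, w, b)
```

Only instances inside the margin contribute to the update, which is what keeps each iteration cheap for a linear model with a handful of features.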

III. EVALUATION AND ANALYSIS
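The confusion matrix is commonly used to evaluate the performance of an ML model, particularly for binary classification, which is the case in this study. Based on the confusion matrix, the evaluation measures below are intended to give information on the effectiveness and efficiency of the proposed algorithm. The evaluation measures used are accuracy (Acc), detection rate (DR), false alarm rate (FAR), F-measure (F1), Matthews correlation coefficient (Mcc) and Time To Build (TTB). Acc, DR, FAR, F1 and Mcc can be calculated from the numbers of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) using the equations below, while TTB is the time required to build the model.

$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$

$$DR = \frac{TP}{TP + FN}$$

$$FAR = \frac{FP}{FP + TN}$$

$$F_1 = \frac{2\,TP}{2\,TP + FP + FN}$$

$$Mcc = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$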
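The measures can be computed directly from the four confusion-matrix counts, as in the sketch below, which uses the standard definitions of these metrics. The function name and the example counts are hypothetical and do not correspond to any result reported in this paper.

```python
import math

def ids_metrics(tp, tn, fp, fn):
    """Compute Acc, DR, FAR, Precision, F1 and Mcc from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    dr = tp / (tp + fn)           # detection rate (recall)
    far = fp / (fp + tn)          # false alarm rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * dr / (precision + dr)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den
    return {"acc": acc, "dr": dr, "far": far,
            "precision": precision, "f1": f1, "mcc": mcc}

# Hypothetical counts for a balanced test set of 200 instances
m = ids_metrics(tp=90, tn=95, fp=5, fn=10)
```

Note that DR and FAR have different denominators (attack instances and normal instances respectively), so a model can improve one while worsening the other.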

A. Theoretic Feature Selection using Mutual Information
After the feature extraction process, MI values were calculated for all 204 features, consisting of the 154 original and 50 extracted features. The features were then ranked from the highest to the lowest MI value. Among the 204 features, 83 were found to have MI values greater than 0 and the remaining 121 had the value 0, which means that they had no relevance to the attack class. All 50 extracted features were among the aforementioned 83 features, whereas 33 were original features, of which only 4 were within the top 20 features. This suggests that the SAE was able to extract features that are relevant to the attack class with meaningful representations. In turn, it demonstrates the effectiveness of the SAE as a feature extraction method for building a lightweight IDS by discovering relatively more meaningful features and reducing the dimensionality of the data and the complexity of the model. The most relevant features based on the MI values included original features 4, 7, 8, 9, 38 and 82; however, there was some redundancy: features 4 and 7 had exactly the same data instances, resulting in the same MI values, and so did features 8 and 9. Therefore, features 7 and 9 were removed from the datasets for training the model. The top 20 features based on MI values are 8, 82, 4, 38, 157, 162, 168, 160, 188, 161, 199, 176, 159, 191, 182, 186, 195, 156, 158 and 165 [12].
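The redundancy check described above, in which two features hold exactly the same data instances, can be sketched as a search for identical columns. This is only one possible way of detecting such duplicates and is not claimed to be the procedure used in [12]; the function name and the toy matrix are hypothetical.

```python
import numpy as np

def duplicate_columns(X):
    """Return (kept, dropped) index pairs where column `dropped` repeats column `kept`."""
    pairs = []
    seen = {}
    for j in range(X.shape[1]):
        key = X[:, j].tobytes()   # byte-level fingerprint of the column values
        if key in seen:
            pairs.append((seen[key], j))
        else:
            seen[key] = j
    return pairs

# Toy matrix: column 1 repeats column 0, and column 3 repeats column 2
X_toy = np.array([[1.0, 1.0, 5.0, 5.0, 0.0],
                  [2.0, 2.0, 6.0, 6.0, 1.0],
                  [3.0, 3.0, 7.0, 7.0, 0.0]])
pairs = duplicate_columns(X_toy)
X_reduced = np.delete(X_toy, [j for _, j in pairs], axis=1)
```

Identical columns necessarily yield identical MI values with the class label, so dropping the later copy of each pair loses no information.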
To find the optimal subset from the top 20 features, Parker et al. [6] experimented with five wrapper algorithms to select features and evaluated them in terms of the number of features, Acc and training time, with the aim of minimising the computational cost for resource-constrained devices. C4.8 took the least time compared with the other algorithms. In terms of the number of features, C4.8 selected only one or two more features than RF, MLP and RBF, which were significantly slower than C4.8 even though they resulted in a smaller number of features. Though logistic regression was the second fastest algorithm, it selected at least twice as many features as all the other algorithms, significantly increasing the complexity and computational cost of the model. The selected feature subset consists of five features: three original features (4, 8 and 82) and two extracted features (156 and 157).

B. Gradient-based Optimisation
The weights of the SVM are found using the gradient descent algorithm. A learning rate of 0.00001 achieved the highest DR and the lowest FNR; however, it showed the worst performance in Acc, Precision, FAR, F1 and Mcc. There is a trade-off between DR and FAR: as the learning rate increases, DR tends to fall whereas FAR improves. The overall performance improves slightly between 0.00001 and 0.1 and rapidly between 0.1 and 0.5. Acc, FAR, Precision, F1 and Mcc improve gradually until 0.1 and then rapidly until 0.5. Therefore, learning rates around 0.5, such as 0.51 and 0.52, were also investigated. A learning rate of 0.5 has the highest Acc, F1 and Mcc, together with the second highest Precision and the second lowest FAR and DR; thus, 0.5 was chosen. A learning rate of 0.55 has the highest Precision and the lowest FAR; however, it is worse than 0.5 in the other metrics. The final results using a learning rate of 0.5 are provided in Table I.

C. Comparisons between baselines and IMPACT
The most recent studies on impersonation attack detection using the AWID datasets were performed by Kolias et al. [13], Aminanto et al. [17], D-FES Corr [11], and DEMISe-RBFC and DETEReD [12]. As shown in Table I, IMPACT achieved the highest F1 and Mcc, while its Acc is the second highest and its FAR is the second lowest. Kolias et al. [13] has the lowest Mcc and the highest FAR. This is considered to be due to the imbalanced dataset used and the feature selection method: Kolias et al. [13] utilised only expert knowledge without any ML, data-driven or statistical methods. Compared with D-FES Corr [11], though IMPACT has a higher FAR by 0.16% and a lower Acc by 0.004%, it achieved a higher DR by 1.73%, a higher F1 by 2.04% and a higher Mcc by 1.4%. Even though DETEReD and DEMISe-RBFC achieved the highest DR and excelled in Acc, F1 and Mcc, DETEReD has the highest FAR and DEMISe-RBFC the second highest. Both have a much higher FAR, more than double, than either IMPACT or D-FES Corr [11]. Considering the throughput of network data in the era of big data, this number of false alarms cannot be ignored because it would impose a much higher cost on network administrators than IMPACT would. Within the context of IDS, minimising FAR is crucial. In comparison with DETEReD and DEMISe-RBFC, IMPACT has a lower DR; however, its DR is still higher than those of the three other models (D-FES Corr, Kolias et al. and Aminanto et al.) and its FAR is less than half of the results of DETEReD and DEMISe-RBFC. DETEReD had the better DR because it had more TPs than IMPACT, whereas IMPACT had a higher sum of TP and TN, and hence a higher Acc, than DETEReD. The values of the denominators for both DR and Acc were the same in the two models. For Mcc, IMPACT had a higher ratio of numerator to denominator than DETEReD.
IMPACT performed better with the optimised subset selected, using C4.8, from the top 20 features rather than from the top 10 features. This is in contrast to the result produced by DEMISe, in which the authors' logistic regression classifier showed better performance with the optimised subset from the top 10 features.
The training time of the model is also an important measure of the computational time efficiency of the model. The TTB reported for IMPACT comprises the SAE and classifier training times but excludes the time required for the C4.8 wrapper, as the feature subset was provided by the authors of the earlier work, DEMISe, and there was no need to rerun C4.8. Kolias et al. [13] and Aminanto et al. [17] do not provide the exact model build time. All the models were run on different hardware setups; thus, the models cannot be compared fairly in terms of training time. However, in terms of the number of features, which could be a measure of the memory efficiency of the model, IMPACT utilises the smallest number of features, significantly fewer than the three other benchmark models (D-FES, Kolias et al. and Aminanto et al.), while outperforming them. Overall, IMPACT achieved performance that mitigates the drawbacks of DETEReD and D-FES Corr, as its FAR is significantly lower than that of DETEReD, its DR is better than that of D-FES Corr, and its F1 and Mcc are the best among all the models. The comparison of performance results demonstrates the effectiveness of the SAE, MI and C4.8 wrapper methods for the dimensionality reduction of the dataset for a lightweight IDS, reducing computational cost in terms of time and space.

D. AWID Feature Analysis
Each feature within the AWID dataset has been investigated in order to verify whether any of them contains temporal information.
Temporal features (a.k.a. time-domain features) are simple to extract and have an easy physical interpretation. However, if the presence of information within the temporal
In impersonation attacks such as Evil Twin and Caffe Latte, it is found that the number of beacon frames in the victim's network is almost doubled and that about half of these frames contain intrusive characteristics, that is, the impersonation attacks occurred during these durations [13].
Unfortunately, we found that Kolias et al. [13] set up an attacker and injected attacks at particular times in their experiments, and these times were recorded in some of the features. The features used in the AWID dataset were derived from Wireshark, and the full list can be found on the official AWID dataset website [18] and the Wireshark display filter reference page [19]. Among the selected features in Table II, the top-ranked raw feature 4 (frame.time_epoch) is the epoch time when the frame was captured, as shown in Fig. 4, and the redundant feature 7 (frame.time_relative) has the same characteristics as feature 4; therefore, it had the same MI value as feature 4, as mentioned in Section III-A. Additionally, feature 38 (radiotap.mactime) is the MAC timestamp, another temporal feature, defined in Radiotap [20] as "Value in microseconds of the MAC's 64-bit 802.11 Time Synchronization Function timer when the first bit of the MPDU arrived at the MAC. For received frames only." [20]. We found that the benchmark models DEMISe and DETEReD utilised temporal feature 4, while Aminanto et al. and D-FES utilised all three temporal features 4, 7 and 38. As the final selected feature set of IMPACT contains only one temporal feature (frame.time_epoch), the model was also trained without it, and the results showed that the model without the feature performed worse than the model with it. This shows that the temporal feature contributed significantly to the performance of the model and that the feature selection method was effective; however, this feature is in fact not valid for the development of IDS.

IV. CONCLUDING REMARKS
This paper presents the development of a machine learning-based IDS that can be deployed and run directly on resource-constrained devices. This was achieved through a strategy aimed at reducing the complexity of the model, which consists of two main steps. The first step was to reduce the number of features through feature extraction and selection using the SAE and MI and to evaluate their effectiveness in terms of both efficiency and performance. The results showed that the extracted abstract features were selected as top features among the whole set of original and extracted features. MI values of the features could be utilised to select the most relevant features and remove irrelevant ones, reducing the complexity of the model without decreasing performance and, in fact, outperforming other models.
The second step consisted of training and testing the linear SVM using gradient descent. In comparison with other models using different classifiers or SVM, which require longer training times on the AWID impersonation dataset, IMPACT demonstrated better performance, including a much lower FAR compared with the DEMISe models. With the investigation of the temporal features existing in the AWID dataset, IMPACT provided new benchmark results without using any temporal features in the AWID dataset, showing that it is the only ML-based IDS tailored for resource-constrained devices that is independent of such features, in contrast to the competing DEMISe, DETEReD and D-FES algorithms.
Based on these findings, directions for further development can be proposed. Firstly, the successful use of an SAE opens perspectives for the use of other deep neural networks to extract abstract features. Secondly, this study focuses only on the impersonation attack; however, there are two other types of attack in the AWID dataset, flooding and injection. IMPACT has not yet been tested against these, nor against newer attack types found in wireless IoT networks. Finally, IMPACT needs to be trained and tested on additional datasets from IDS research, each providing its own features, in order to prove its usefulness and effectiveness. Today, most wireless sensor networks used as automatic data acquisition and transmission systems in monitoring applications are based on 802.15.4. However, the dataset in [7] is built on 802.11. To improve the usefulness and coverage of the proposed algorithm, in our future work it will be tested on a new benchmark dataset created on 802.15.4.

Fig. 1. Autoencoder (AE), where m and n indicate the number of neurons in the layer, x is an input feature, b is a bias, a is an activation and y is an output.

Fig. 3. Linear Support Vector Machine.

Fig. 5. Performance comparisons between IMPACT and the state-of-the-art models in terms of FAR (Kolias et al. is out of range).

Fig. 4. Performance comparisons between IMPACT and the state-of-the-art models in terms of Acc, DR, F1 and Mcc (Kolias et al.'s Acc and Mcc and Aminanto et al.'s DR, F1 and Mcc are out of range).

TABLE I. COMPARISONS BETWEEN IMPACT AND THE STATE-OF-THE-ART MODELS
The performance of IMPACT is measured at a learning rate of 0.5 on the feature subset of three original features (4, 8 and 82) and two abstract features (156 and 157) only.
* The time to build (TTB) for the models includes the 293 s required for SAE.
** Includes TTB required for both SAE and C4.8 wrapper.
NRA = No results available.