Effective Attack Detection in Internet of Medical Things Smart Environment Using a Deep Belief Neural Network

The Internet of Things (IoT) has recently become a key innovation for building smart environments. Security and privacy are regarded as major concerns in any technology that depends on the IoT model. Privacy and security issues arise from the various attacks that intruders can launch. Thus, there is an essential need for an intrusion detection system (IDS) that identifies attacks and anomalies in IoT systems. In this work, we propose a deep learning-based Deep Belief Network (DBN) model for intrusion detection. For attack and anomaly detection, the CICIDS 2017 dataset is used to analyse the performance of the proposed IDS model. The proposed method produced better results on all the evaluation parameters: accuracy, recall, precision, F1-score, and detection rate. It achieved an accuracy of 99.37% for the Normal class, 97.93% for the Botnet class, 97.71% for the Brute Force class, 96.67% for the DoS/DDoS class, 96.37% for the Infiltration class, 97.71% for the PortScan class, and 98.37% for the Web Attack class, and these results were compared with those of various other classifiers.


I. INTRODUCTION
The IoT is a type of network that connects any object to the Internet through data-sensing devices according to specified protocols, enabling data sharing and exchange and allowing intelligent identification, tracking, positioning, management, and monitoring. The IoT is commonly defined as a network of physical objects. The Internet is no longer just a network of computers; it has evolved into a network of devices of all types and sizes, including home appliances, smartphones, vehicles, toys, cameras, medical instruments, modern systems, people, animals, and structures, all of which are connected and share and communicate data according to specified protocols [1].
The IoT is an internet of three kinds of relations: (1) human to human, (2) human to machine/things, and (3) things/machine to things/machine, all communicating over the Internet [2]. The objective of the IoT is to allow things to be connected anytime, anywhere, with anything and anyone, ideally using any path/network and any service [1]. The IoT has many applications; commonly known ones include smart health services, smart transportation, and smart grids and structures [3]. The four-layer architecture of the IoT is shown in Fig. 1.
The associate editor coordinating the review of this manuscript and approving it for publication was Wei Wei.

II. INTRUSION DETECTION
Intrusion detection is considered an essential security mechanism for handling attacks on networks and recognizing malicious activity in computer network traffic. It plays an important role in overall data security and helps in discovering, determining, and detecting the unauthorized use, duplication, modification, and destruction of data and information systems [4]. There are two common security frameworks, network security systems and host security systems, which protect the underlying network and systems from unauthorized access, misuse, destruction, and modification. These frameworks may comprise various coordinated security mechanisms, such as firewalls, antivirus software, and Intrusion Detection Systems (IDS), which monitor a network or system and raise an alert when malicious activity occurs [5].
Broadly, IDSs are classified into three techniques: misuse detection, anomaly detection, and hybrid detection. Misuse detection methods use predefined signatures of malicious activity to detect intrusions, so they are used to identify known attacks. Anomaly detection methods model normal patterns and detect malicious activity from its deviation from those patterns, so anomaly-based techniques are able to identify zero-day attacks [6]. Hybrid methods combine misuse and anomaly detection techniques. They aim to increase the detection rate of known intrusions while reducing false positives for unknown attacks [7].
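The following toy sketch makes the three families concrete. Everything in it is hypothetical: the two signatures, the `payload` and `pkt_rate` fields, the baseline rates, and the 3-sigma threshold. Combining the two detectors with a logical OR is only one simple way to build a hybrid detector.

```python
import statistics

# Hypothetical signature database: substrings known to appear in attack payloads.
SIGNATURES = ("' OR '1'='1", "<script>")

def misuse_detect(event):
    """Misuse (signature) detection: flag events that match a known attack pattern."""
    return any(sig in event["payload"] for sig in SIGNATURES)

def fit_baseline(normal_rates):
    """Learn a simple model of normal behaviour: mean and sample standard deviation."""
    return statistics.mean(normal_rates), statistics.stdev(normal_rates)

def anomaly_detect(event, baseline, k=3.0):
    """Anomaly detection: flag events whose packet rate deviates more than k sigma from normal."""
    mean, std = baseline
    return abs(event["pkt_rate"] - mean) > k * std

def hybrid_detect(event, baseline):
    """Hybrid detection (one simple variant): alert if either detector fires."""
    return misuse_detect(event) or anomaly_detect(event, baseline)

baseline = fit_baseline([98, 102, 100, 97, 103, 99, 101])
events = [
    {"payload": "GET /index.html", "pkt_rate": 100},   # normal traffic
    {"payload": "id=' OR '1'='1", "pkt_rate": 101},    # known attack at a normal rate
    {"payload": "GET /index.html", "pkt_rate": 5000},  # unknown, flood-like behaviour
]
for e in events:
    print(misuse_detect(e), anomaly_detect(e, baseline), hybrid_detect(e, baseline))
```

The signature check catches the known SQL-injection payload but misses the flood. The anomaly check catches the flood but misses the payload. The hybrid detector catches both.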
Intrusion detection is an important step in securing IoT networks. It is one of many mechanisms for handling security breaches, which can occur in any of the four architectural layers of the IoT shown in Fig. 1. The network layer not only serves as the backbone that links diverse IoT devices; it also hosts network-based security defence systems such as network intrusion detection systems (NIDS). There are numerous IDS techniques, for example, those based on statistical analysis, cluster analysis, artificial neural networks (ANN), or deep learning. Among these, deep learning-based intrusion detection has been reported to perform better than the other techniques, because deep learning has a strong capacity for self-learning, self-adaptation, generalization, and the identification of unknown attack activity [8].

III. ANOMALY DETECTION
Today's widespread IoT produces vast amounts of data, and anomalies are an inherent part of every system. Anomalies may indicate resource depletion in an industrial system, a critical situation on an aviation platform that must be handled to avoid unexpected problems, or abnormal behaviour of medical instruments, among other things. Consequently, the ability to identify anomalies can greatly affect the overall performance of any monitored system. The key difficulty in detecting anomalies is defining the exact boundary between normal and abnormal activity, because the abnormal observations available for training models are usually insufficient. In practice, abnormal behaviour patterns are rare compared with normal behaviour [9]. Fig. 2 shows a flow chart of anomaly detection.
In the anomaly detection framework (Fig. 2), the first step is to understand the nature of the collected data stream, which may be binary, discrete, or continuous, as well as the relationships among the data, that is, whether they are time-series, spatial, or graph data. Identifying the kind of relationship supports the selection of the right method for anomaly detection, analysis, or prediction. The next step is to determine the type of anomaly from a predetermined set (for example, point anomaly, collective anomaly, or contextual anomaly).
The next step is to determine whether training data are available for designing the anomaly detection framework. Depending on the availability of the data and their labels, the problem can be framed as supervised, semi-supervised, or unsupervised, which helps developers select suitable anomaly detection strategies. In supervised learning, data with class labels are available and are used to learn to identify the abnormal behaviour of the system. In unsupervised learning, data are available but have no known output (for example, a class label). In semi-supervised learning, only a limited number of samples carry class labels, while the remaining data are unlabelled [9].
VOLUME 8, 2020

A. TYPES OF ANOMALIES
A significant aspect of an anomaly detection technique is the nature of the desired anomaly. Anomalies can be classified into the following three categories.
Point Anomalies: If an individual data instance can be considered anomalous with respect to the rest of the data, the instance is termed a point anomaly. This is the simplest type of anomaly and the focus of most research on anomaly detection. As a practical example, consider credit card fraud detection, where the dataset corresponds to an individual's credit card transactions. For simplicity, assume that the data are defined using only one feature: the amount spent. A transaction in which the amount spent is much higher than the person's normal level of spending is a point anomaly.
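The credit card example can be sketched minimally, assuming that the single feature is the amount spent per transaction. The sketch uses the modified z-score (median and median absolute deviation). This robust statistic fits the problem because a single large outlier inflates an ordinary mean and standard deviation. The 3.5 cut-off is a common rule of thumb, and the amounts are invented for illustration.

```python
import statistics

def point_anomalies(amounts, threshold=3.5):
    """Return indices of point anomalies using the modified z-score (median / MAD)."""
    med = statistics.median(amounts)
    mad = statistics.median(abs(x - med) for x in amounts)
    if mad == 0:
        return []  # no spread at all: nothing stands out
    return [i for i, x in enumerate(amounts)
            if 0.6745 * abs(x - med) / mad > threshold]

# Hypothetical amounts spent by one card holder; one transaction is far above the usual level.
spending = [42.0, 38.5, 51.0, 45.2, 39.9, 47.3, 1250.0, 44.1]
print(point_anomalies(spending))  # [6]
```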
Contextual Anomalies: If a data instance is anomalous in a specific context (but not in another), it is termed a contextual anomaly (also referred to as a conditional anomaly). The notion of a context is induced by the structure of the dataset and must be specified as part of the problem formulation. Each data instance is defined using the following two sets of attributes.
Contextual Attributes: These are used to determine the context (or neighbourhood) of an instance. For example, in spatial datasets, the longitude and latitude of a location are the contextual attributes. In time-series data, time is the contextual attribute that determines the position of an instance in the entire sequence.
Behavioural Attributes: These define the non-contextual characteristics of an instance. For example, in a spatial dataset describing the average rainfall of the entire world, the amount of rainfall at any location is a behavioural attribute. Anomalous behaviour is determined using the values of the behavioural attributes within a specific context. A data instance may be a contextual anomaly in a given context, while an identical data instance (in terms of behavioural attributes) could be considered normal in a different context. This property is key to identifying the contextual and behavioural attributes for a contextual anomaly detection technique. The decision to apply a contextual anomaly detection technique depends on how meaningful contextual anomalies are in the target application domain.
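The rainfall example can be sketched as follows. The location (here a hypothetical region label) is the contextual attribute, and monthly rainfall is the behavioural attribute. Baselines are learned per context from historical records, and a new value is judged only against its own context. The data and the 3-sigma rule are assumptions made for illustration.

```python
import statistics
from collections import defaultdict

def fit_context_baselines(history):
    """history: iterable of (context, value) pairs. Returns {context: (mean, stdev)}."""
    groups = defaultdict(list)
    for context, value in history:
        groups[context].append(value)
    return {c: (statistics.mean(v), statistics.stdev(v)) for c, v in groups.items()}

def is_contextual_anomaly(context, value, baselines, k=3.0):
    """Judge the behavioural value only against the baseline of its own context."""
    mean, std = baselines[context]
    return abs(value - mean) > k * std

# Hypothetical monthly rainfall in mm: context = region, behavioural attribute = rainfall.
history = ([("monsoon", v) for v in (210, 190, 205, 220, 195)]
           + [("desert", v) for v in (5, 8, 3, 6, 4)])
baselines = fit_context_baselines(history)

print(is_contextual_anomaly("monsoon", 200, baselines))  # False: normal in this context
print(is_contextual_anomaly("desert", 200, baselines))   # True: same value, anomalous context
```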
Collective Anomalies: If a collection of related data instances is anomalous with respect to the entire dataset, it is termed a collective anomaly. The individual data instances in a collective anomaly may not be anomalous by themselves, but their occurrence together as a collection is anomalous. Collective anomalies have been explored for sequence data, graph data, and spatial data. Note that while point anomalies can occur in any dataset, collective anomalies can occur only in datasets in which the data instances are related. By contrast, the occurrence of contextual anomalies depends on the availability of context attributes in the data. A point anomaly or a collective anomaly can also be a contextual anomaly if it is analysed with respect to a context. Thus, a point or collective anomaly detection problem can be transformed into a contextual anomaly detection problem by incorporating the context information [10].
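Here is a minimal sketch of a collective anomaly in sequence data. It assumes that the anomaly of interest is a value that repeats for too long, as with a stuck sensor. Every individual reading below lies within the normal range, and only the run as a whole is flagged. The `max_run` threshold and the readings are hypothetical.

```python
def collective_anomalies(series, max_run=3):
    """Return (start, end) index ranges where one value repeats more than max_run times in a row.

    Each value may be normal on its own; only the consecutive run as a whole is anomalous.
    """
    anomalies = []
    start = 0
    for i in range(1, len(series) + 1):
        # A run ends at the end of the series or where the value changes.
        if i == len(series) or series[i] != series[start]:
            if i - start > max_run:
                anomalies.append((start, i - 1))
            start = i
    return anomalies

# Hypothetical sensor readings: 2 is a normal value, but seven 2s in a row is not.
readings = [1, 3, 2, 4, 3, 2, 2, 2, 2, 2, 2, 2, 4, 1, 3]
print(collective_anomalies(readings))  # [(5, 11)]
```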

IV. PROPOSED ALGORITHM AND PROPOSED METHODOLOGY (DEEP BELIEF NETWORK)
DBNs are generative models. A DBN consists of stacked restricted Boltzmann machines (RBMs) that are trained greedily, layer by layer, to achieve robust performance in an unsupervised setting. In a DBN, training proceeds one layer at a time: each layer is an RBM trained on the output of the previously trained layer. (DBNs are a stack of RBM layers used for the pre-training stage, which is then turned into a feed-forward network whose weights are fine-tuned with a different approach.)
RBMs are widely used largely because labelled data are scarce: RBMs and autoencoders can be pre-trained on unlabelled data and then fine-tuned on a small amount of labelled data.
A greedy layer-wise training algorithm was used to train the DBN one layer at a time, greedily optimizing each layer in turn. After unsupervised training, there is usually a fine-tuning stage, in which a joint supervised training algorithm is applied to all the layers. This approach combines two ideas: (1) the choice of the initial parameters of a deep neural network can have a significant regularizing effect; and (2) learning about the input distribution can help with learning the mapping from inputs to outputs. In the pre-training stage, the underlying features were learned by a greedy layer-wise unsupervised method. In the fine-tuning stage, a softmax layer was added on top to refine the features using the labelled samples [11]. Fig. 3 shows the architecture of the DBN.
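Greedy layer-wise pre-training can be sketched in pure Python. This is not the authors' Keras/TensorFlow code. Several details are assumptions: one-step contrastive divergence (CD-1) is used as the RBM training rule because the text does not name one, and the layer sizes, learning rate, epoch count, and toy data are hypothetical. The supervised fine-tuning stage with a softmax layer is omitted.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class RBM:
    """Bernoulli RBM with visible bias a, hidden bias b, and weights W, trained with CD-1."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.a = [0.0] * n_visible
        self.b = [0.0] * n_hidden

    def hidden_probs(self, v):
        """P(h_j = 1 | v) for every hidden unit j."""
        return [sigmoid(self.b[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b))]

    def visible_probs(self, h):
        """P(v_i = 1 | h) for every visible unit i."""
        return [sigmoid(self.a[i] + sum(self.W[i][j] * h[j] for j in range(len(h))))
                for i in range(len(self.a))]

    def cd1_step(self, v0, lr):
        """One contrastive-divergence update approximating the log-likelihood gradient."""
        ph0 = self.hidden_probs(v0)
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]  # sample hidden states
        v1 = self.visible_probs(h0)                                 # mean-field reconstruction
        ph1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            self.a[i] += lr * (v0[i] - v1[i])
            for j in range(len(ph0)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        for j in range(len(ph0)):
            self.b[j] += lr * (ph0[j] - ph1[j])

def pretrain_dbn(data, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise pre-training: each RBM is trained on the previous layer's output."""
    rng = random.Random(seed)
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(layer_input[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1_step(v, lr)
        rbms.append(rbm)
        # Freeze this layer and pass its hidden probabilities up as the next layer's input.
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return rbms, layer_input

# Hypothetical binary records with three distinct patterns.
data = [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1]] * 4
rbms, features = pretrain_dbn(data, layer_sizes=[4, 2])
print(len(rbms), len(features), len(features[0]))  # 2 12 2
```

In the full DBN, the returned `features` (or the stacked weights) would initialize a feed-forward network whose top softmax layer is then fine-tuned on labelled samples.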
The input data were standardized using the standard deviation (SD), as in equation (1). In an RBM, v denotes the visible units and h denotes the hidden units. To determine the model, three parameters must be obtained: θ = {W, A, B}, namely the weight matrix W, the visible-layer bias vector A, and the hidden-layer bias vector B.
Assume an RBM has n visible units and m hidden units, where $v_i$ denotes the ith visible unit and $h_j$ the jth hidden unit. The parameters have the structure given in equations (2) to (4):
$$W = \left(w_{ij}\right)_{n \times m} \qquad (2)$$
$$A = \left(a_1, a_2, \ldots, a_n\right) \qquad (3)$$
$$B = \left(b_1, b_2, \ldots, b_m\right) \qquad (4)$$
where $w_{ij}$ is the weight between the ith visible unit and the jth hidden unit, $a_i$ is the bias threshold of the ith visible unit, and $b_j$ is the bias threshold of the jth hidden unit. For a given joint state (v, h), and assuming that the hidden and visible units follow Bernoulli distributions, the energy function of the RBM is given by equation (5):
$$E(v, h \mid \theta) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} v_i w_{ij} h_j \qquad (5)$$
where $\theta = \{w_{ij}, a_i, b_j\}$ are the RBM parameters, and the energy function gives the energy between the states of each visible node and each hidden node. Exponentiating and normalizing the energy function yields the joint probability that the visible nodes and the hidden nodes are in a particular state (v, h), as in equation (6):
$$P(v, h \mid \theta) = \frac{e^{-E(v, h \mid \theta)}}{Z(\theta)} \qquad (6)$$
$$Z(\theta) = \sum_{v} \sum_{h} e^{-E(v, h \mid \theta)} \qquad (7)$$
where $Z(\theta)$ in equation (7) is the normalization factor (partition function), that is, the sum of the exponentiated negative energies over all possible states of the visible and hidden nodes [10]. The parameters are usually obtained by maximizing the likelihood function. Given the joint distribution $P(v, h \mid \theta)$, the marginal distribution $P(v \mid \theta)$ of the visible nodes is obtained by summing over all states of the hidden nodes, as in equation (8):
$$P(v \mid \theta) = \frac{1}{Z(\theta)} \sum_{h} e^{-E(v, h \mid \theta)} \qquad (8)$$
The marginal distribution gives the probability that the visible nodes are in a particular state. An RBM has connections only between layers and none within a layer, which gives it the following important property: given the states of the visible units, the activation states of the hidden units are conditionally independent. The activation probability of the jth hidden unit is given by equation (9):
$$P(h_j = 1 \mid v, \theta) = \sigma\left(b_j + \sum_{i=1}^{n} v_i w_{ij}\right) \qquad (9)$$
Likewise, once the states of the hidden units are given, the activation probabilities of the visible units are conditionally independent, as in equation (10):
$$P(v_i = 1 \mid h, \theta) = \sigma\left(a_i + \sum_{j=1}^{m} w_{ij} h_j\right) \qquad (10)$$
where $\sigma(x) = 1/(1 + e^{-x})$ is the sigmoid function. To determine the RBM, the three model parameters must be learned, which is done by differentiating the log-likelihood function with respect to each parameter. Because $P \propto e^{-E}$, the probability P decreases as the energy E increases, so E is minimized by maximizing P.
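The conditional-independence property behind equations (9) and (10) can be checked numerically on an RBM small enough to enumerate every state. The sketch below computes P(h_j = 1 | v) by brute force from equations (5) to (8) and compares it with the sigmoid form of equation (9). The weights and biases are arbitrary hypothetical values.

```python
import itertools
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def energy(v, h, W, a, b):
    """Equation (5): E(v, h | theta) for binary vectors v (length n) and h (length m)."""
    return (-sum(a[i] * v[i] for i in range(len(v)))
            - sum(b[j] * h[j] for j in range(len(h)))
            - sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h))))

# Hypothetical parameters for a tiny RBM with n = 3 visible and m = 2 hidden units.
W = [[0.5, -1.0], [0.2, 0.3], [-0.7, 0.8]]
a = [0.1, -0.2, 0.05]
b = [0.3, -0.4]
n, m = len(a), len(b)

states_v = list(itertools.product((0, 1), repeat=n))
states_h = list(itertools.product((0, 1), repeat=m))
# Equation (7): partition function over every joint state.
Z = sum(math.exp(-energy(v, h, W, a, b)) for v in states_v for h in states_h)

def joint(v, h):
    """Equation (6): P(v, h | theta)."""
    return math.exp(-energy(v, h, W, a, b)) / Z

def marginal_v(v):
    """Equation (8): P(v | theta), summing out the hidden units."""
    return sum(joint(v, h) for h in states_h)

v, j = (1, 0, 1), 0
# P(h_j = 1 | v) by brute force from the joint distribution ...
brute = sum(joint(v, h) for h in states_h if h[j] == 1) / marginal_v(v)
# ... and from the closed form of equation (9).
closed = sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(n)))
print(round(brute, 6), round(closed, 6))  # 0.524979 0.524979
```

The same enumeration, with the roles of v and h swapped, confirms equation (10).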
The usual strategy for maximizing the likelihood is gradient ascent, which updates the parameters according to equation (11):
$$\theta \leftarrow \theta + \eta \, \frac{\partial \ln P(v \mid \theta)}{\partial \theta} \qquad (11)$$
where $\eta$ is the learning rate. This iterative process increases the probability P and reduces the energy E [11]. The flow of the algorithm can be outlined as follows:
Step 1: Initialize the population, randomly generating a varying number of hidden layers and of neurons in each layer;
Step 2: Compute the fitness of each individual as per Eq. (1), select individuals by the roulette-wheel technique, keep the best individual of the current generation, and apply interval crossover and mutation;
Step 3: Apply elitism, retaining the individual with the best fitness value throughout the evolution;
Step 4: Check whether the maximum number of iterations has been reached. If so, retain the generated network structure; otherwise, repeat Steps 2 and 3;
Step 5: Use the optimal network structure for the DBN and train the IDS;
Step 6: Classify the test set with the trained DBN model, and finally compare the classification results with the labels of the test set to validate the classification accuracy.
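Steps 1 to 5 can be sketched as a small genetic algorithm that searches over DBN structures. The text does not fully specify several details, so the following are assumptions:

- the genome encoding (a layer count followed by per-layer unit counts);
- the search-space bounds;
- "interval crossover", read as drawing each child gene from the interval spanned by its parents;
- "variation", read as random mutation.

The fitness function of Eq. (1) is not reproduced in the text, so `toy_fitness` is a hypothetical stand-in. In a real pipeline, it would train a DBN with the given structure and return its validation accuracy.

```python
import random

# Hypothetical search space: 1 to 4 hidden layers, 8 to 128 units per layer.
MAX_LAYERS, MIN_UNITS, MAX_UNITS = 4, 8, 128

def random_individual(rng):
    """Step 1: genome = [n_layers, units_1, ..., units_MAX]; only the first n_layers sizes are used."""
    return [rng.randint(1, MAX_LAYERS)] + [rng.randint(MIN_UNITS, MAX_UNITS) for _ in range(MAX_LAYERS)]

def structure(ind):
    """Decode a genome into the list of hidden-layer sizes."""
    return ind[1:1 + ind[0]]

def roulette_select(pop, fits, rng):
    """Fitness-proportional (roulette-wheel) selection; assumes positive fitness values."""
    pick, acc = rng.uniform(0.0, sum(fits)), 0.0
    for ind, fit in zip(pop, fits):
        acc += fit
        if acc >= pick:
            return ind
    return pop[-1]

def interval_crossover(p1, p2, rng):
    """Each child gene is drawn from the interval spanned by the two parents' genes."""
    return [rng.randint(min(x, y), max(x, y)) for x, y in zip(p1, p2)]

def mutate(ind, rng, rate=0.1):
    """Resample each gene with probability `rate` (the 'variation' step)."""
    child = list(ind)
    if rng.random() < rate:
        child[0] = rng.randint(1, MAX_LAYERS)
    for k in range(1, len(child)):
        if rng.random() < rate:
            child[k] = rng.randint(MIN_UNITS, MAX_UNITS)
    return child

def evolve(fitness, pop_size=20, generations=30, seed=0):
    """Steps 1-4: evolve DBN structures; `fitness` maps a list of layer sizes to a score."""
    rng = random.Random(seed)

    def score(ind):
        return fitness(structure(ind))

    pop = [random_individual(rng) for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):                     # Step 4: fixed iteration budget
        fits = [score(ind) for ind in pop]           # Step 2: fitness of each individual
        best = max(pop + [best], key=score)          # Step 3: elitism keeps the best so far
        children = [mutate(interval_crossover(roulette_select(pop, fits, rng),
                                              roulette_select(pop, fits, rng), rng), rng)
                    for _ in range(pop_size - 1)]
        pop = [best] + children
    return structure(max(pop, key=score))           # Step 5 would train the DBN with this

def toy_fitness(layers):
    """Hypothetical stand-in for 'train a DBN with these layer sizes and return its accuracy'."""
    return 1.0 / (1.0 + abs(len(layers) - 2) + sum(abs(u - 64) for u in layers) / 64)

print(evolve(toy_fitness))
```

Because the elite individual is copied unchanged into every new population, the best fitness never decreases across generations.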

V. IMPLEMENTATION AND RESULTS ANALYSIS
A. DATASET DESCRIPTION
The CICIDS2017 dataset was used in this work. Many existing DDoS attack datasets suffer from limitations, such as irrelevant or redundant data, that make them unreliable. The CICIDS2017 dataset contains recent network traffic data. It was collected over five consecutive days (Monday to Friday) and includes various attacks as well as normal traffic, as shown in Table 1. Because the dataset contains network traffic both with and without attacks, it is close to real network data. The dataset is imbalanced, and class imbalance severely affects the training of deep learning techniques, so a duplication technique was used and we ensured that the testing was balanced [12]. This work was implemented using Keras on the TensorFlow package for deep learning, on a 64-bit Intel Core i7 CPU with 16 GB RAM running Windows 7. The machine learning algorithms were executed in MATLAB. Table 2 lists the instances with the class labels of the dataset.
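The text states only that a duplicating technique was used against class imbalance. One common reading is random oversampling, which duplicates randomly chosen minority-class samples until each class matches the majority class. The sketch below assumes that reading, and the function name and toy data are hypothetical. It is best applied to the training split only, so that duplicated records do not leak into the test set.

```python
import random
from collections import Counter, defaultdict

def oversample_by_duplication(samples, labels, seed=0):
    """Duplicate random minority-class samples until every class matches the majority count."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]  # duplicated records
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Hypothetical feature vectors: five normal records and one rare attack record.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [9.0]]
y = ["Normal"] * 5 + ["Infiltration"]
Xb, yb = oversample_by_duplication(X, y)
print(Counter(yb))
```

After balancing, both classes have five records, and the four added Infiltration records are copies of the single original one.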
Heartbleed Attack: The attackers exploit a bug in the OpenSSL implementation of the TLS heartbeat extension to read data from the server's memory, giving them unauthorized access to sensitive information.
Web Attack-SQL Injection: SQL injection is a code injection technique used to attack data-driven applications, in which malicious SQL statements are inserted into an entry field for execution.
Infiltration: The attackers use infiltration strategies and software to penetrate the network and gain full unauthorized access to the networked system's information.
Web Attack-XSS: The attackers inject malicious scripts into otherwise benign and trusted websites and web applications, which then deliver the malicious content to users.
Web Attack-Brute Force: The attackers attempt to obtain privileged information, such as PINs and passwords, by trial and error.
Bot: The attackers use Trojans to breach the protection of many victim machines, take control of those machines, and organize them into a botnet that the attackers can use and control remotely.
DoS Slowhttptest: The attackers use HTTP GET requests to exhaust the number of HTTP connections permitted on the server, preventing other users from accessing it and allowing the attackers to hold numerous HTTP connections open to the same server.
DoS slowloris: The attackers use the Slowloris tool to carry out a DoS attack.
SSH-Patator: The attackers use SSH-Patator to carry out brute-force attacks that guess SSH login passwords.
FTP-Patator: The attackers use FTP-Patator to carry out brute-force attacks that guess FTP login passwords.
DoS GoldenEye: The attackers use the GoldenEye tool to carry out a DoS attack.
DDoS: The attackers use numerous machines working together to attack a single victim machine.
PortScan: The attackers try to gather information about the victim machine, such as its operating system and running services, by sending packets to different destination ports.
DoS Hulk: The attackers use the HULK tool to carry out DoS attacks on web servers by generating large volumes of unique and obfuscated traffic. The generated traffic can also bypass caching engines and hit the server's direct resource pool.

VI. RESULTS AND DISCUSSION
The performance of the model was evaluated with several metrics, of which accuracy is one of the standard measures for assessing classification models. Accuracy is estimated as in equation (12):
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (12)$$
Precision, also called the positive predictive value, is the proportion of true positives among all the instances that the model predicts as positive, as in equation (13):
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (13)$$
Recall, also called the true positive rate, is the proportion of actual positives in the data that the model correctly identifies, as in equation (14):
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (14)$$
The F1-score can also be used to estimate model performance. It is the harmonic mean of the precision and recall of the model, as in equation (15):
$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (15)$$
The detection rate (DR) measures the rate at which intrusion instances are detected; its value is given in equation (16). Here, TP denotes true positives, FP false positives, FN false negatives, and TN true negatives [13]-[16].
In this paper, minority attack classes with similar behaviour and characteristics were combined. Combining similar classes increases the share of each merged attack label relative to the other labels. As can be seen from the table, the prevalence of the majority class (Benign) was 83.34%, whereas that of the smallest minority class (Heartbleed) was 0.00039%.
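Equations (12) to (15) can be computed directly from the one-vs-rest confusion-matrix counts for a class, as the sketch below shows. The counts are hypothetical, not results from this paper. The detection rate of equation (16) is left out because its exact definition is not given in the text. Returning 0.0 for an undefined ratio is an assumed convention.

```python
def classification_metrics(tp, fp, fn, tn):
    """Equations (12)-(15) for one class, from its one-vs-rest confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for a single attack label.
m = classification_metrics(tp=90, fp=10, fn=5, tn=895)
print({k: round(v, 4) for k, v in m.items()})
# {'accuracy': 0.985, 'precision': 0.9, 'recall': 0.9474, 'f1': 0.9231}
```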
With such a large difference in prevalence, detectors may be biased toward the Benign class. The Benign label was renamed Normal, and the performance of the proposed DBN on this label was evaluated and compared with other detection techniques, as shown in Fig. 4 and Table 3.
The Bot label was renamed Botnet ARES, a new label containing 1,966 instances with a prevalence of 0.06%. Compared with the other conventional and existing techniques, the proposed method achieved better results on all parameters, as shown in Table 4. An accuracy of 97.93% and a detection rate of 98.51% were achieved on the Botnet ARES label. Fig. 5 shows the performance analysis for Botnet ARES attack detection.
The FTP-Patator and SSH-Patator labels were combined into a single Brute Force label because they have similar characteristics and behaviour. The merged label has 13,835 instances and a prevalence of 0.48%.
DoS/DDoS is a new label that combines DDoS, DoS GoldenEye, DoS Hulk, DoS Slowhttptest, DoS slowloris, and Heartbleed, as shown in Table 6. The merged label has 294,506 instances with a prevalence of 10.4%, and the proposed method obtained better results on all the evaluation parameters. Fig. 7 shows the performance analysis for DoS/DDoS attack detection.
The performance on the Infiltration and PortScan labels was analysed separately, as shown in Tables 7 and 8, because these labels do not share the characteristics and behaviour of the other labels. The Infiltration attack had 36 instances with a prevalence of 0.001%, the lowest of all labels. The PortScan label had 158,930 instances with a prevalence of 5.61%. The performance analyses of the two labels are shown in Figs. 8 and 9. On both attacks, the proposed method performed better on all evaluation parameters, with an accuracy of 96.37% for the Infiltration attack and 97.71% for the PortScan attack.
The Web Attack label combines Web Attack-SQL Injection, Web Attack-Brute Force, and Web Attack-XSS, with 2,180 instances and a prevalence of 0.07%. Compared with the other attack labels, the proposed method achieved high performance on all the evaluation parameters. Fig. 10 shows the performance analysis for Web attack detection. For the Normal label, the proposed technique achieved 99.37% accuracy, and for the Web Attack label it achieved 98.37% accuracy, as shown in Table 9. These are the highest performance levels obtained in this research.
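The label grouping described in this section can be expressed as a simple mapping. A merged label's prevalence is then its count divided by the total. The raw label strings below follow the usual CICIDS2017 spellings, but they are an assumption. Some copies of the CSV files encode the Web Attack labels differently, so the keys may need adjusting. The toy label list is hypothetical.

```python
from collections import Counter

# Grouping described in the text; raw label spellings are assumed, not taken from the paper.
LABEL_GROUPS = {
    "BENIGN": "Normal",
    "Bot": "Botnet ARES",
    "FTP-Patator": "Brute Force", "SSH-Patator": "Brute Force",
    "DDoS": "DoS/DDoS", "DoS GoldenEye": "DoS/DDoS", "DoS Hulk": "DoS/DDoS",
    "DoS Slowhttptest": "DoS/DDoS", "DoS slowloris": "DoS/DDoS", "Heartbleed": "DoS/DDoS",
    "Infiltration": "Infiltration",
    "PortScan": "PortScan",
    "Web Attack - Brute Force": "Web Attack", "Web Attack - XSS": "Web Attack",
    "Web Attack - Sql Injection": "Web Attack",
}

def merge_labels(labels):
    """Map each raw label to its merged class."""
    return [LABEL_GROUPS[label] for label in labels]

def prevalence(labels):
    """Share of each label in percent: count / total * 100."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100.0 * c / total for label, c in counts.items()}

raw = ["BENIGN"] * 6 + ["FTP-Patator", "SSH-Patator", "DoS Hulk", "Heartbleed"]
print(prevalence(merge_labels(raw)))
# {'Normal': 60.0, 'Brute Force': 20.0, 'DoS/DDoS': 20.0}
```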

VII. CONCLUSION
In this research, a DBN-based intrusion detection system for the IoT was proposed, and different types of attacks and anomalies were discussed. To evaluate the performance of the proposed DBN-IDS, we used the CICIDS2017 dataset, which contains many attack types under many labels, and we discussed the dataset in detail. DoS/DDoS, Botnet, Brute Force, Web Attack, Infiltration, and PortScan are the types of attacks in this dataset that could cause IoT system failure. The evaluation parameters used in the analysis were accuracy, recall, precision, detection rate, and F1-score. The proposed model obtained better results than the existing techniques on all parameters. In future work, the proposed IDS can be extended to detect other types of attacks against IoT systems and evaluated on other intrusion detection datasets. In addition, the proposed method can be applied not only to intrusion detection but also to other classification and recognition tasks.