Immune System Based Intrusion Detection System (IS-IDS): A Proposed Model

This paper explores the immunological model and applies it to intrusion detection in computer networks. The main objective is to monitor and log network traffic and to apply detection algorithms for detecting intrusions within the network. The proposed model mimics the natural Immune System (IS) by considering both of its layers, the innate immune system and the adaptive immune system. The current work proposes Statistical Modeling based Anomaly Detection (SMAD) as the first layer of the Intrusion Detection System (IDS). It works as the Innate Immune System (IIS) interface and captures the initial traffic of a network to find its first-hand vulnerability. The second layer, Adaptive Immune-based Anomaly Detection (AIAD), determines the features of suspicious network packets for anomaly detection. It imitates the adaptive immune system by taking into consideration the activation of T-cells and B-cells, and it captures relevant features from the header and payload portions for effective intrusion detection. Experiments have been conducted on real-time network traffic and on the standard datasets KDD99 and UNSW-NB15. The SMAD model yields a true positive rate of up to 96.04% on real-time traffic and around 97% on the standard datasets. Highly suspicious traffic detected by the SMAD model is further tested for vulnerability in the AIAD model. The results show a true positive rate close to 99% in detecting file-based and user-based anomalies for both the real-time traffic and the standard datasets.


I. INTRODUCTION
Internet and intranet facilities are at continual risk because government, military and commercial bodies rely heavily on them for their day-to-day activities. Both face attacks from the network. An intrusion can be defined as an attempt to gain unauthorized access to network resources [1]. Intrusion detection systems have become essential for detecting attacks from both outside and inside a network. An Intrusion Detection System (IDS) is a software tool that can address attacks originating from either internet or intranet facilities: it monitors and logs network traffic and applies detection algorithms to detect intrusions within the network. IDSs fall into two categories according to the detection model used: signature-based IDS and anomaly-based IDS. In a signature-based IDS, predefined attack signatures are stored in a database and network traffic is monitored against them. In an anomaly-based IDS, network traffic is monitored and compared against the normal usage patterns of the network, and any deviation from these patterns is considered an intrusion attempt. Anomaly-based detection can therefore detect new attacks, whereas signature-based detection cannot detect attacks whose signatures have not been predefined.
The associate editor coordinating the review of this manuscript and approving it for publication was Jiafeng Xie.
Anomaly-based detection was originally proposed by Denning [2] and since then it has gained immense popularity in detecting new attacks using various methods including bio-inspired approaches. Nature and natural organisms have always inspired researchers in the field of network security. Bio-inspired computing mimics the behavior of nature and natural organisms. Immune-inspired network security has become popular due to its marked resemblance with the natural Immune System (IS).

A. IMMUNE SYSTEM (IS)
The natural Immune System (IS) protects the body from harmful invasion by pathogens such as viruses, bacteria and parasites. It is a multi-layered system, with the innate immune system as the external layer and the adaptive immune system as the internal layer.
VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/

1) INNATE IMMUNE SYSTEM (SKIN)
The skin is the most important part of the innate immune system. It contains keratinocytes (KCs), which are involved in sensing pathogens and danger signals [3]. KCs are the central skin sentinels that recognize foreign bodies or pathogens. Pathogens carry microbial structures known as Pathogen Associated Molecular Patterns (PAMPs). KCs recognize these PAMPs through their Toll-Like Receptors (TLRs) and produce inflammatory responses, which generate threat signals for the immature dendritic cells (DCs).
When an immature dendritic cell takes up a pathogen in infected tissue, it becomes a mature, activated dendritic cell. This pathogen-specific mature dendritic cell carries a protein called MHC (Major Histocompatibility Complex) on its surface. Mature dendritic cells secrete cytokines and chemokines that influence the activation of naive T-cells in adaptive immune responses. On encountering an activated dendritic cell presenting the pathogen-specific peptide:MHC complex, naive T-cells produce armed effector T-cells.

2) ADAPTIVE IMMUNE SYSTEM
Naive T cells that recognize their antigen on the surface of a dendritic cell stop migrating and begin the steps that generate armed effector cells. When an armed effector T cell recognizes the pathogen-specific MHC complex, it releases further effector molecules, CD8 and CD4, that bind more strongly to the target pathogen cells. CD8 T cells are of two types (CD α and CD β) and are responsible for killing target cells, most notably virus-infected cells, that present antigen bound to MHC class I molecules at the cell surface. CD4 T cells give rise to TH1 and TH2 cells, which recognize antigen bound to MHC class II molecules at the cell surface. TH1 cells activate macrophages, enabling them to destroy intracellular microorganisms more efficiently. TH2 cells, on the other hand, initiate B-cell responses by activating naive B cells to proliferate and secrete antibodies. Signals from the pathogen-bound cells induce the B cell to proliferate and differentiate into a plasma cell secreting a specific antibody. Table 1 shows the correspondence between these components and the proposed system.
In this research work, an unsupervised anomaly detection model is devised in which relevant features from the header and payload portions are considered for effective intrusion detection. The rest of the paper is organized as follows: Section II reviews related work from the literature, Section III describes the proposed model, Section IV presents the experimental set-up and results, and Section V concludes the paper.

II. LITERATURE REVIEW
IDSs form part of any complete security system, and their goal is to detect any action that violates the security policy of a computer system. There are two main detection approaches an IDS can adopt, namely misuse detection and anomaly detection. The misuse detection approach, used in systems such as MADAM/ID [4], uses machine learning techniques to label the data: the classifier learns from a set of labeled connections containing both normal traffic and attacks, and subsequently recognizes known attacks. This approach has two problems: first, complete labeled network traffic is hard to obtain, and second, new attacks cannot be detected. Therefore, it cannot solve the "zero-day" problem, and newer attacks can succeed in compromising the security of the system.
The second approach to handling intrusions is anomaly detection, originally proposed by Denning [2]. The main idea is to profile normal network traffic behavior and observe deviations from this profile in real-time traffic. The difficulty with this approach is capturing all kinds of normal traffic: modeling normal traffic requires purely normal data, and in a real-time system it is very difficult to obtain either purely normal or labeled data. If an attack is included in the training data as normal, the IDS will never treat it as an attack and no alert will be generated.
Considering the problems of these two approaches, a third one is becoming more popular: the hybrid approach, which is effective in addressing the deficiencies of both [5]. Such systems do not need purely normal data and can therefore use unlabeled data, which is easily obtained.
Flood-based attacks can be detected by scanning the TCP/IP headers of network packets, whereas non-flood attacks cannot be detected using these headers [6]. In such non-flood attacks, usually "User to Root" (U2R) and "Remote to Local" (R2L), the intruder sends very few packets in order to compromise the root account, either directly or remotely. Packet payloads can be used to detect these attacks.
Different approaches have been implemented to detect intrusions based on packet payloads. Because the payload of a network depends on the kind of service it provides, service-specific approaches have been developed. In [7], Krügel et al. use service-specific knowledge to increase the detection rate of R2L attacks. In [8], the authors use the byte frequency distribution of the application-level payload. In [9], Z. Morley et al. analyze DDoS attacks and suggest that attack duration, packet count, packet rate and protocol are vital for packet feature selection. In [6], the authors consider both header and payload information to improve detection. In [10], [11], the appearance frequency of 255 ASCII codes is used to characterize attacks. Keisuke Kato et al. [12] analyze DDoS attacks and find that the bytes per second and packet sizes of normal and attack packets are approximately the same. F. Iglesias et al. [13] select 16 of the 41 features of the DARPA dataset using multi-stage feature selection, including both header and data features. Irshad M. Iqbal et al. [14] also use data-related features from the network traffic.
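Several of the works above, such as [8], characterize a payload by how often each byte value occurs in it. A minimal Python sketch of such a byte-frequency distribution, assuming a 256-bin histogram normalized to sum to 1 and an L1 (Manhattan) distance for comparing a payload against a profile; the function names and the choice of distance are illustrative, not taken from the cited works:

```python
from collections import Counter
from typing import List


def byte_frequency_distribution(payload: bytes) -> List[float]:
    """Relative frequency of each byte value 0-255 in a packet payload."""
    if not payload:
        return [0.0] * 256
    counts = Counter(payload)  # iterating over bytes yields ints in 0-255
    total = len(payload)
    return [counts.get(value, 0) / total for value in range(256)]


def l1_distance(p: List[float], q: List[float]) -> float:
    """Manhattan distance between two distributions (0 = identical, 2 = disjoint)."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Usage: compare a suspicious payload against a profile built from normal traffic.
profile = byte_frequency_distribution(b"GET /index.html HTTP/1.1\r\n")
probe = byte_frequency_distribution(b"\x90" * 20 + b"\xcc\xcc")
print(l1_distance(profile, probe))
```

Here the two payloads share no byte values, so the distance is at its maximum of 2; payloads resembling the profile would score close to 0.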
The above literature shows that intrusion detection approaches have evolved over time and become more requirement-specific. Several methods have been developed for detecting anomalies, among them machine learning, data mining, genetic algorithms, neural networks and statistical methods. Some of these, namely genetic algorithms and neural networks, address anomaly-based detection by mimicking nature and natural organisms. Nature and natural processes have always opened new dimensions in problem-solving algorithms and devices, and have therefore become important in engineering and the applied sciences over the last few decades [15]. Nature and natural organisms have always inspired researchers in exploring and combating environmental issues, and humans are no exception. Building on this concept, bio-inspired computing has come to play a key role in evolutionary computing, artificial immune systems, swarm systems and membrane computing systems. More recently, deep learning approaches have been applied to intrusion detection: in [16] and [17], recurrent neural networks and deep convolutional neural networks (CNNs) are used to capture the essential features, and a nature-inspired swarm optimization technique is used in [18].
Artificial Immune System (AIS) is one such domain, in which intrusion detection has been widely studied for about two decades. An AIS is designed to work like its biological counterpart, the Immune System (IS), by taking into consideration both the innate immune layer and the adaptive immune layer. Bio-inspired intrusion detection models have been widely used by researchers in the last decade. Forrest et al. first conceptualized self/non-self discrimination in [19]. This discrimination approach of the IS was subsequently used to discriminate self (legitimate users, protected data) from non-self (unauthorized users, viruses, etc.). Hofmeyr [20] contributed a distributed approach to intrusion detection based on the IS. Stibor et al. [21]-[23] have examined both positive and negative selection methods of the IS for anomaly detection.
More recently, Uwe Aickelin et al. [24] used the Dendritic Cell Algorithm (DCA) of the Artificial Immune System (AIS) for intrusion detection. DCA is a population-based algorithm inspired by the functions of natural DCs of the innate immune system. In [25] and [26], the authors use the immune system and employ agent-based systems to detect anomalies and intrusions based on network profiles. Walid Mohamed Alsharafi et al. [27] generate a large pool of detectors to match any change, such as an intrusion, a virus or misuse. Clonal selection is another IS-inspired approach, used in [28]-[31], which emphasizes the population of immune cells that are able to recognize foreign bodies and are therefore proliferated in the body. This approach works well because it is dynamic: only the clones with greater recognition affinity are proliferated.

III. THE PROPOSED MODEL (IS-IDS)
IS-IDS mimics the immune system by imitating the functionality of the KCs and DCs of the innate immune system and the T-cells and B-cells of the adaptive immune system. The current work focuses on developing an IDS model inspired by the immune system. It consists of two layers. The first layer imitates the innate immune system and is termed Statistical Modeling based Anomaly Detection (SMAD). The second layer, Adaptive Immune-based Anomaly Detection (AIAD), imitates the adaptive immune system.
The first layer, SMAD, has two modules. The Pre-processing-I Module is analogous to the KCs, as it is responsible for sensing external intrusions. It captures external traffic in order to find its first-hand vulnerability. If traffic passing through the Pre-processing-I Module appears vulnerable, it is forwarded to the Pre-processing-II Module; otherwise, it is treated as normal and passed to the organization's internal network.
The Pre-processing-II Module works like the DCs, which initiate immune responses by activating naive T-cells for further adaptive responses. It grades traffic by its vulnerability to the system as a) Normal, b) Least Suspicious, c) Moderately Suspicious or d) Highly Suspicious, and forwards the most suspicious traffic to the adaptive immune layer. Like the skin, this layer classifies traffic according to its maliciousness and significantly reduces the amount of traffic passed to the adaptive immune system.
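The paper does not state the rule the Pre-processing-II Module uses to assign the four grades. A minimal sketch, assuming (hypothetically) that an observation is graded by its distance from a baseline mean in standard deviations, with cut-offs at 1, 2 and 3 sigma chosen only for illustration:

```python
import statistics

LEVELS = ("Normal", "Least Suspicious", "Moderately Suspicious", "Highly Suspicious")


def grade(value: float, baseline: list) -> str:
    """Grade one observation by how many standard deviations it lies from the
    baseline mean. The 1/2/3-sigma cut-offs are hypothetical."""
    mean = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)  # needs at least two baseline values
    z = abs(value - mean) / sigma
    if z <= 1:
        return LEVELS[0]
    if z <= 2:
        return LEVELS[1]
    if z <= 3:
        return LEVELS[2]
    return LEVELS[3]


def forward_to_aiad(observations: list, baseline: list) -> list:
    """Pass only Highly Suspicious observations on to the adaptive layer."""
    return [x for x in observations if grade(x, baseline) == LEVELS[3]]


# Usage: made-up CPU usage (%) readings under normal operation as the baseline.
baseline = [10, 12, 11, 13, 9, 10, 12, 11]
print(forward_to_aiad([11, 13, 14.5, 30], baseline))
```

With this baseline (mean 11, standard deviation about 1.31), 13 is graded Least Suspicious, 14.5 Moderately Suspicious, and only 30 is forwarded, which mirrors how the first layer shrinks the traffic volume reaching the second.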
The second layer, AIAD, consists of the T-cell and B-cell activation modules. Traffic found to be highly suspicious in the first layer is captured, and the features from its header and payload portions that are relevant for anomaly detection are passed to the T-cell Activation Module. This module further identifies the features relevant for the User-based Anomaly Detection and File-based Anomaly Detection Modules. The T-cell Activation Module acts like the armed effector T-cell that binds strongly with the pathogen-specific MHC complex and releases the proteins CD8 and CD4. CD8+ T cells bind further with MHC class I molecules and release CD α and CD β cells. Similarly, the features recognized by the User-based Anomaly Detection Module are passed to the Off-Hours Access Detection and Remote-Access Detection Modules.
The CD α cell binds with the pathogen's MHC I molecule in order to kill the pathogen, whereas CD β cells are used specifically for intravascular communication. Analogously, in the proposed system, the Off-Hours Access Detection Module tries to detect whether a user accesses the host computer during the system's off-hours, and the Remote-Access Detection Module tries to detect whether a user accesses the system from a remote location. If the module finds any kind of remote access, the user anomaly is recognized by the T-helper cell and sent to the B-cell Activation Module.
On the other hand, CD4+ T cells bind further with MHC class II molecules to release TH1 and TH2 cells. Similarly, the features recognized by the File-based Anomaly Detection Module are passed to the File Anomaly Type Detection and File Parsing and Detection Modules.
The main function of TH1 cells is to activate macrophages to kill the intracellular pathogens presented by MHC class II molecules. Analogously, the File Anomaly Type Detection Module finds any file-related anomaly, e.g., viruses or worms, and deletes it immediately from the system. TH2 cells are responsible for activating naive B cells to make antibodies; analogously, the File Parsing and Detection Module activates the File Identity Database to detect any anomaly previously archived in the database.

A. STATISTICAL MODELLING BASED ANOMALY DETECTION (SMAD): THE INNATE IMMUNE SYSTEM
This layer mimics the innate immune system and is the first layer of the proposed IDS. It captures the characteristic features of a system at fixed time intervals and compares the observed value of F from ANOVA against the F-table to identify intra-group and inter-group differences.
It captures the characteristic features, say $X_1, X_2, X_3$, of a system with factor A at $a$ levels, factor B at $b$ levels and factor N at $n$ levels. Factor A can be system-based resource usage with levels $1, 2, \ldots, a$, and factor B can be user-based resource usage with levels $1, 2, \ldots, b$. The factors are independent treatment variables, such as system- or user-based usage, whose settings are controlled by the levels, say the time interval. Table 2 represents the feature capture over time. The ANOVA model can be represented mathematically as equation (1):
$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk} \qquad (1)$$
where
• $Y_{ijk}$ is the $k$-th observation at the $i$-th level of A and the $j$-th level of B
• $\mu$ is the overall mean response
• $\alpha_i$ is the effect due to the $i$-th level of factor A
• $\beta_j$ is the effect due to the $j$-th level of factor B
• $\gamma_{ij}$ is the effect due to any interaction between the $i$-th level of A and the $j$-th level of B
• $\epsilon_{ijk}$ is the random error in the $k$-th observation of the $(i, j)$-th treatment
ANOVA calculates Sums of Squares (SS) to determine the variability within or between groups, as expressed in equation (2):
$$SS(Total) = SS(A) + SS(B) + SS(AB) + SSE \qquad (2)$$
where
• SS(Total) is the total sum of squared deviations of all observations from the overall mean
• SS(A) is the sum of squares due to factor A
• SS(B) is the sum of squares due to factor B
• SS(AB) is the sum of squares due to the interaction between A and B
• SSE is the sum of squares due to random error
ANOVA considers each characteristic feature, i.e., $X_1, X_2, X_3$, as a group to evaluate the system behavior. If the means of all the characteristic features are equal, then the variation within a feature, say $X_1(T_1)$, $X_1(T_2)$ and $X_1(T_3)$, is taken into consideration.
If the means of all the features are not equal, then the variation between the system features $X_1(T_i)$, $X_2(T_i)$ and $X_3(T_i)$ is taken into consideration.
The main intention is to test the null hypothesis $H_0$, that the means of the different system behaviors are equal, against the alternative hypothesis $H_1$, that at least one system-behavior mean differs from the others.
Thus, the main objective is to evaluate the variability either within a specific feature or between two or more features. Any such variability denotes a change in a characteristic feature and represents anomalous behavior. ANOVA uses F-tests to test the equality of means statistically: the computed test statistic $F_r$ is compared against the tabulated value $F_t$ from the F-table for the chosen confidence level (say, 95%) and the corresponding Degrees of Freedom (DF). If $F_r > F_t$, $H_0$ is rejected.
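The F-test above can be sketched for the SMAD setting with the correction-factor steps that Algorithm 1 lists, assuming one observation per (time interval, feature) cell as in Table 3. With a single observation per cell, the interaction term $\gamma_{ij}$ of equation (1) cannot be separated from the error, so it is absorbed into SSE. The readings below are made up, the function names are hypothetical, and the tabulated value $F_t$ is passed in rather than computed:

```python
def two_way_anova(table):
    """Two-way ANOVA without replication by the correction-factor method.

    table[i][j] holds the observation for row i (e.g. time interval T_i)
    and column j (e.g. feature X_j); one observation per cell.
    """
    h = len(table)                      # number of rows
    k = len(table[0])                   # number of columns
    n_obs = h * k                       # number of observations N
    grand_total = sum(sum(row) for row in table)
    ss = sum(x * x for row in table for x in row)   # total of the squares
    cf = grand_total ** 2 / n_obs                   # correction factor
    total_ss = ss - cf
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    ss_rows = sum(r * r for r in row_totals) / k - cf
    ss_cols = sum(c * c for c in col_totals) / h - cf
    sse = total_ss - ss_rows - ss_cols              # residual (error)
    df_rows, df_cols, df_err = h - 1, k - 1, (h - 1) * (k - 1)
    mse = sse / df_err                              # assumes sse > 0
    return {
        "total_ss": total_ss, "ss_rows": ss_rows, "ss_cols": ss_cols,
        "sse": sse, "df_rows": df_rows, "df_cols": df_cols, "df_err": df_err,
        "f_rows": (ss_rows / df_rows) / mse,
        "f_cols": (ss_cols / df_cols) / mse,
    }


def differs_significantly(f_r: float, f_t: float) -> bool:
    """Reject H0 (equal means) when the computed F exceeds the tabulated F."""
    return f_r > f_t


# Illustrative readings: 4 time intervals x 3 features (% usage).
readings = [[10, 20, 30], [12, 22, 29], [11, 21, 31], [13, 25, 30]]
result = two_way_anova(readings)
print(round(result["f_cols"], 2), round(result["f_rows"], 2))
```

For these readings the sketch gives F ratios of about 206.6 for the columns (features) and 2.2 for the rows (time intervals). Against the 5% F-table values F(2, 6) ≈ 5.14 and F(3, 6) ≈ 4.76, the features differ significantly while the time intervals do not.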
Let the time intervals, in seconds, be $T_1, T_2, T_3, T_4$ for $n$ system or user-based resource features $X_1, X_2, X_3, \ldots, X_n$. The stepwise pseudo-algorithm for the SMAD model is given below.

Algorithm 1 Algorithm for SMAD Model
Input: the system-based or user-based resource features $X_1, X_2, X_3, \ldots, X_n$ for time intervals $T_1, T_2, T_3, \ldots, T_n$ in seconds
Output: whether $X_1, X_2, X_3, \ldots, X_n$ vary significantly or non-significantly
1: Initialization: $h \leftarrow$ number of rows, $k \leftarrow$ number of columns, $N \leftarrow$ number of observations
2: Compute the total of the squares of all observations (SS), as in equation (3):
$$SS = \sum_{i=1}^{h}\sum_{j=1}^{k} x_{ij}^{2} \qquad (3)$$
3: Compute the Correction Factor (CF), as in equation (4):
$$CF = \frac{\left(\sum_{i=1}^{h}\sum_{j=1}^{k} x_{ij}\right)^{2}}{N} \qquad (4)$$
4: Compute the Total Sum of Squares (TotalSS), as in equation (5):
$$TotalSS = SS - CF \qquad (5)$$
5: Compute the Sum of Squares Total (SST) from the row totals $R_i = \sum_{j=1}^{k} x_{ij}$, as in equation (6):
$$SST = \frac{1}{k}\sum_{i=1}^{h} R_i^{2} - CF \qquad (6)$$
6: Compute the Square Sum Between (SSB) from the column totals $C_j = \sum_{i=1}^{h} x_{ij}$, as in equation (7):
$$SSB = \frac{1}{h}\sum_{j=1}^{k} C_j^{2} - CF \qquad (7)$$
7: Compute the Sum of Square Error (SSE), as in equation (8):
$$SSE = TotalSS - SST - SSB \qquad (8)$$
8: Compute the Degrees of Freedom (DF), as in equations (9)-(12):
i) DF for Total Sum of Squares, $df_{total} = N - 1$ (9)
ii) DF for Sum of Squares Total, $df_{st} = h - 1$ (10)
iii) DF for Square Sum Between, $df_{sb} = k - 1$ (11)
iv) DF for error, $df_e = (h - 1)(k - 1)$ (12)
9: Compute the Mean Squares (MS) and compare $F_r$ against $F_t$, as in equations (13), (14) and (15):
i) $MST = SST/df_{st}$ (13)
ii) $MSB = SSB/df_{sb}$ (14)
iii) $MSE = SSE/df_e$ (15)
10: Compute the F ratios ($F_r$), as in equations (16) and (17):
$F_r = MST/MSE$ (16) and $F_r = MSB/MSE$ (17)
11: … where $s$ is the standard error
12: Compute $CD = t\sqrt{2\,MSE/n}$, where CD is the critical difference, $n$ is the sample size and $t$ is the tabulated t value for $df_e$ degrees of freedom
13: Compute the absolute …
…
16: If $X_n >$ Three_Sigma then …

Once the highly suspicious network traffic has been captured, the features of the packet that are significant for detecting anomalies must be determined. The main issue is whether to capture the whole network packet or only the TCP/IP header. Capturing the whole packet, including the payload, increases detection accuracy but also increases the time and cost of capture; capturing only the header reduces both the time and cost, but at the same time lowers the detection rate.
Therefore, there are two options: capture the header information and discard the payload, or capture both the header and the payload of each network packet and devise a hybrid detection approach that uses features from both portions for anomaly detection. The T-cell_File_Activation Module has two sub-modules, namely the File Anomaly Type Detection and File Parsing Detection Modules, described as follows:

1) FILE ANOMALY TYPE DETECTION MODULE
This module is responsible for verifying the integrity of a file once it is saved to the host computer. It computes the unique hash value of each file as it arrives at the host computer; if the value matches an existing value in the File Identity Database, the file is stored, otherwise it is deleted. A new file can be added, or an existing file modified, only with the permission of the system administrator.
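The paper does not name the hash function. A minimal sketch of this positive-selection check, assuming SHA-256 from Python's standard hashlib and an in-memory set standing in for the File Identity Database (the class and method names are hypothetical):

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Hash of a file's contents, used as its identity (the antigen Ag_i)."""
    return hashlib.sha256(data).hexdigest()


class FileIdentityDatabase:
    """Hypothetical in-memory stand-in for the File Identity Database."""

    def __init__(self) -> None:
        self._antibodies = set()  # digests of administrator-approved files

    def approve(self, data: bytes) -> None:
        """Store an approved file's digest as an antibody ab_n."""
        self._antibodies.add(file_digest(data))

    def admit(self, data: bytes) -> bool:
        """Positive selection: admit the file only if its digest matches an antibody."""
        return file_digest(data) in self._antibodies


db = FileIdentityDatabase()
db.approve(b"quarterly report v1")
print(db.admit(b"quarterly report v1"))  # True: unchanged file is admitted
print(db.admit(b"quarterly report v2"))  # False: modified file is rejected
```

Because any change to the contents changes the digest, a set lookup is enough to decide whether a file matches an approved antibody; approving a modified file corresponds to the administrator permission step described above.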
For file anomaly type detection, the positive selection algorithm of T-cells is used. Using this algorithm, the hash values of the files present in the system are generated and stored in the File Identity Database as a set of antibodies, $Ab = \{ab_1, ab_2, \ldots, ab_n\}$, for the set of files $F = \{f_1, f_2, \ldots, f_n\}$. When an incoming file $f_i$ arrives, its hash value is generated as the antigen $Ag_i$, which is then compared with each antibody $ab_n \in Ab$. If it matches any antibody, the file $f_i$ is admitted to the system; if it matches none, the file is not admitted. The pseudo-algorithm for file anomaly type detection is as follows.

To implement the proposed model, the negative selection algorithm is used. It is modeled on negative selection (clonal deletion) in the thymus and consists of two phases: a) the detector generator phase and b) the monitor phase.
a) The detector generator phase generates a set of detectors. Here, the detectors are virus signatures that do not match any of the protected data of the files already in the system. The virus signatures (detectors) are strings that are compared with each piece of protected file data. The protected data of a file can be regarded as a set of strings over a finite alphabet, so any change in the file's data produces a string that is not in the original set. To generate detectors, only strings that do not match any string of the protected file data are retained: a candidate detector is deleted if it matches any of them.
The negative selection algorithm has been used to protect files in the DOS operating system from corruption by viruses [16]. The self-set is obtained from the contents of the files $f$ already present in the File Identity Database. These contents consist of strings of different lengths, each of which can be considered a feature to be matched against the virus signatures (codes). In the detector phase, each virus signature is an antigen $ag_i \in Ag$, where $Ag$ is the file containing the virus signatures, which are matched against each string of the file contents. These strings are the features, or antibodies, $A = \{ab_1, ab_2, ab_3, \ldots, ab_n\}$ of $f$, and each antibody in $A$ is compared with each virus signature in $Ag$. The similarity between them is measured by an affinity (Dis) calculation. If the affinity does not indicate a match, the antigen $ag_n$ is accepted as a detector and stored in the File Identity Database; otherwise it is rejected as "Self". This is repeated until all the strings have been compared. The pseudo-algorithm for the Detector Generator is as follows.
b) The monitor phase monitors the protected data of all incoming files by comparing it with the detectors generated in the previous phase. Each detector is run against the protected data of the incoming file; if any detector matches, an alert is sent to the server machine. The algorithm for the Monitor Phase is as follows.
The T-cell_User_Activation Module also has two sub-modules, namely the Off-Hours Access Detection Module and the Remote-Access Detection Module, described as follows:
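The paper leaves the affinity measure (Dis) unspecified. A minimal sketch of both negative selection phases, assuming fixed-length string windows over the protected file data as the self set, virus-signature strings of the same length as candidate detectors, and the r-contiguous matching rule (a common choice in the negative selection literature) as the affinity test; all names and parameter values are illustrative:

```python
def r_contiguous_match(a: str, b: str, r: int) -> bool:
    """True if equal-length strings a and b agree in at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def windows(data: str, length: int) -> list:
    """All overlapping substrings of a fixed length (the 'self' strings of a file)."""
    return [data[i:i + length] for i in range(len(data) - length + 1)]


def generate_detectors(protected: str, candidates: list, length: int, r: int) -> list:
    """Detector generator phase: delete every candidate that matches self."""
    self_set = windows(protected, length)
    return [c for c in candidates
            if not any(r_contiguous_match(c, s, r) for s in self_set)]


def monitor(incoming: str, detectors: list, length: int, r: int) -> bool:
    """Monitor phase: True (raise an alert) if any detector matches the data."""
    return any(r_contiguous_match(d, w, r)
               for w in windows(incoming, length) for d in detectors)


# Usage: window length 4, match threshold r = 3.
detectors = generate_detectors("aaaabbbb", ["cccc", "aaac", "bbbc", "xyzw"], 4, 3)
print(detectors)                                  # ['cccc', 'xyzw']
print(monitor("aaaabbbb", detectors, 4, 3))       # False: unchanged data
print(monitor("aaacccbb", detectors, 4, 3))       # True: modified data
```

The candidates "aaac" and "bbbc" are deleted because they share three contiguous symbols with self windows, so the surviving detectors can only fire on data that differs from the protected file.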

3) OFF-HOURS ACCESS DETECTION MODULE
This module tries to detect whether an intruder accesses the host computer during the system's off-hours. According to Denning [2], individual profiles of legitimate users can be useful in detecting such attacks. These profiles provide the login frequencies of each legitimate user, which helps in detecting masqueraders who try to log into an account during off-hours, when the legitimate user is not expected to use it. Off-hours (non-working hours) are the hours during which the user is not expected to access the system or his or her account. Individual profiles of legitimate users can record the times at which the user logs in during office hours from the office location, or aggregate over all the locations from which the user accesses his or her account.
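A minimal sketch of such an individual profile, assuming the profile is simply the share of a user's past logins that fell in each hour of the day and that a login is flagged when its hour holds less than a hypothetical 5% of the user's history; the function names are illustrative:

```python
from collections import Counter
from datetime import datetime


def build_login_profile(login_times: list) -> dict:
    """Share of a user's past logins that fell in each hour of the day (0-23)."""
    counts = Counter(t.hour for t in login_times)
    total = len(login_times)
    return {hour: counts.get(hour, 0) / total for hour in range(24)}


def is_off_hours(login_time: datetime, profile: dict, min_share: float = 0.05) -> bool:
    """Flag a login in an hour that holds less than min_share of past logins.

    min_share = 0.05 is a hypothetical threshold.
    """
    return profile[login_time.hour] < min_share


# Usage: five working days of logins at 09:00, 10:00, 11:00 and 14:00.
history = [datetime(2020, 1, day, hour) for day in range(6, 11) for hour in (9, 10, 11, 14)]
profile = build_login_profile(history)
print(is_off_hours(datetime(2020, 1, 11, 3, 15), profile))  # True: login at 03:15
```

A per-location profile, as mentioned above, could be built the same way by keying the counter on (location, hour) pairs instead of hours alone.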

4) REMOTE-ACCESS DETECTION MODULE
This module aims to detect whether any external user accesses the system from a remote location. If it finds any kind of remote access, the user anomaly is recognized by the T-helper cell and sent to the B-cell Activation Module. The module is based on the idea that a remote user sends packets over the network to a machine on which he or she does not have the access privileges that a local user would have; the external user tries to access the system in order to control the remote machine through the local user.
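A minimal sketch of the remote-access check, assuming connection records with hypothetical `src` and `user` fields, a hypothetical local network 192.168.1.0/24, and a set of users who are allowed remote access; it uses only Python's standard `ipaddress` module:

```python
import ipaddress

# Hypothetical local network of the protected site.
LOCAL_NETWORKS = [ipaddress.ip_network("192.168.1.0/24")]


def is_remote(source_address: str, local_networks=LOCAL_NETWORKS) -> bool:
    """True if the source address lies outside every local network."""
    addr = ipaddress.ip_address(source_address)
    return not any(addr in net for net in local_networks)


def remote_access_alerts(records, remote_users, local_networks=LOCAL_NETWORKS):
    """Flag remote connections made on behalf of users without remote-access privileges."""
    return [rec for rec in records
            if is_remote(rec["src"], local_networks) and rec["user"] not in remote_users]


records = [
    {"src": "203.0.113.7", "user": "alice"},   # remote, not privileged -> alert
    {"src": "192.168.1.5", "user": "bob"},     # local -> no alert
    {"src": "198.51.100.2", "user": "admin"},  # remote but privileged -> no alert
]
print(remote_access_alerts(records, remote_users={"admin"}))
```

In the proposed system, a record flagged here would be the user anomaly handed to the B-cell Activation Module.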
Both modules attempt to find unauthorized access, either during off-hours or from remote locations. They are implemented with the immune-inspired Clonal Selection Classification Algorithm (CSCA), which is based on the idea that only the cells (antibodies) capable of recognizing foreign bodies (antigens) are selected for further proliferation. The selected cells undergo affinity maturation (mutation), which further improves their affinity towards the antigens.
The interactions between antigens and antibodies (Ag-Ab) can be represented as a generalized shape $S$ comprising the set of features $F = \{f_1, f_2, f_3, \ldots, f_n\}$ that characterizes them. This generalized shape can represent both real-valued and binary-valued features. A distance measure is used to calculate the degree of similarity (affinity) between them. Every antigen and antibody is assumed to have the same length, $l$; the length and the cell representation depend on the problem.
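The clone, hypermutate and select cycle described above can be illustrated with a deliberately simplified classifier; this is not the full CSCA, which maintains and prunes a larger antibody pool. The sketch assumes feature vectors of equal length l, affinity as negative Euclidean distance, one antibody per class seeded from a training antigen of that class, and mutation strength proportional to the antibody's distance from the antigen; all parameters are hypothetical:

```python
import math
import random


def affinity(antibody, antigen):
    """Affinity as negative Euclidean distance: higher means more similar."""
    return -math.dist(antibody, antigen)


def train(antigens, labels, generations=10, n_clones=5, seed=0):
    """Simplified clonal selection: one antibody per class, seeded from the
    first training antigen of that class, is repeatedly cloned, hypermutated
    and replaced by its best clone."""
    rng = random.Random(seed)
    antibodies = {}
    for x, y in zip(antigens, labels):
        antibodies.setdefault(y, list(x))
    for _ in range(generations):
        for x, y in zip(antigens, labels):
            parent = antibodies[y]
            best, best_aff = parent, affinity(parent, x)
            sigma = 0.5 * -best_aff          # low affinity -> stronger mutation
            for _ in range(n_clones):
                clone = [v + rng.gauss(0.0, sigma) for v in parent]
                clone_aff = affinity(clone, x)
                if clone_aff > best_aff:     # selection: keep only improving clones
                    best, best_aff = clone, clone_aff
            antibodies[y] = best
    return antibodies


def classify(antibodies, antigen):
    """Label of the antibody with the highest affinity to the antigen."""
    return max(antibodies, key=lambda label: affinity(antibodies[label], antigen))


# Usage with made-up two-dimensional feature vectors.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (10.0, 10.0), (10.2, 9.9), (9.8, 10.1)]
y = ["normal"] * 3 + ["anomalous"] * 3
antibodies = train(X, y)
print(classify(antibodies, (0.1, 0.1)), classify(antibodies, (10.0, 10.0)))
```

Seeding each class from a real antigen keeps the example stable; the mutation step then only accepts clones that move closer to the antigen being presented, which is the affinity-maturation idea in its simplest form.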
The CSCA used for detecting anomalies in both modules is given in Algorithm 5.
The proposed model is evaluated on both real-time network traffic and standard datasets for intrusion detection. For capturing real-time traffic, the experiment was conducted for three weeks on all incoming files. A system performance log depicting network bandwidth usage, CPU usage and memory usage is generated for each day and is considered the normal data set. The IDS monitors the usage of network bandwidth, CPU and RAM and captures these values every 5 minutes for all incoming files. The mean sample size for real-time traffic over all incoming files was 30,377, and the maximum number of files considered on a single day was 30,390. The mean population size was approximately 220,000. To verify the performance of the proposed system, unknown HTTP traffic files were also used for vulnerability testing; their mean sample size was approximately 3,000 per day. An intruded data set was created by disabling the graphics driver, audio driver and USB driver.

Algorithm 5 CSCA Algorithm for Off-Hours Access Detection and Remote Access Detection
Consider the following sample set of observations, taken every 5 minutes at time intervals $T_n$ ($T_1$ to $T_4$), for the system resource features network bandwidth, CPU usage and RAM usage, as in Table 3:
TABLE 3. Set of observations based on system resource usage.
These values are system resource usages in percent. Table 4 lists the observations reduced by dividing each of them by a constant (here, 10). Because dividing every observation by a constant c scales every sum of squares by 1/c², the F ratios are unaffected by this reduction.
The network packet has been partitioned into header and payload portions. The header portion considers the packet sequence_number, source_port, destination_port, source_address, destination_address, length_of_the_packet and protocol_type, whereas the payload portion considers the raw data in binary (bit) representation.
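A minimal sketch of extracting the header fields listed above and the payload bits from a raw packet, assuming an IPv4 packet carrying TCP with no link-layer header (Ethernet framing and IPv6 are not handled); the function name is hypothetical and the field names mirror the text:

```python
import socket
import struct


def parse_ipv4_tcp(packet: bytes) -> dict:
    """Extract header features and payload bits from a raw IPv4/TCP packet."""
    version_ihl, _, total_length = struct.unpack("!BBH", packet[:4])
    ihl = (version_ihl & 0x0F) * 4               # IPv4 header length in bytes
    protocol = packet[9]
    if protocol != 6:
        raise ValueError("not a TCP segment")
    src_port, dst_port, seq = struct.unpack("!HHI", packet[ihl:ihl + 8])
    data_offset = (packet[ihl + 12] >> 4) * 4    # TCP header length in bytes
    payload = packet[ihl + data_offset:total_length]
    return {
        "sequence_number": seq,
        "source_port": src_port,
        "destination_port": dst_port,
        "source_address": socket.inet_ntoa(packet[12:16]),
        "destination_address": socket.inet_ntoa(packet[16:20]),
        "length_of_the_packet": total_length,
        "protocol_type": protocol,
        "payload_bits": "".join(f"{byte:08b}" for byte in payload),
    }


# Usage: build a minimal 42-byte packet (20-byte IP + 20-byte TCP + b"hi").
ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 42, 0, 0, 64, 6, 0,
                 socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))
tcp = struct.pack("!HHIIBBHHH", 12345, 80, 1000, 0, 0x50, 0x18, 1024, 0, 0)
features = parse_ipv4_tcp(ip + tcp + b"hi")
print(features["source_address"], features["destination_port"], features["payload_bits"])
```

The header fields feed the flood-oriented first layer, while the payload bit string is what the content-based modules of the second layer would inspect.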
The proposed model has also been tested on the 10% KDD 99 standard dataset. Non-numerical attributes are converted to numeric values using discretization, and z-score normalization is performed at the onset of the experiments to bring all attributes to a normalized scale. The features selected for intrusion detection from the KDD 99 dataset depend on the type of attack. Attacks in the dataset are categorized into Denial of Service (DoS), Probe, User to Root (U2R) and Remote to Local (R2L).
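The preprocessing steps can be sketched as follows. The section does not detail the discretization scheme, so `encode_symbolic` simply assigns integer codes to symbolic values such as protocol_type in order of first appearance, as a stand-in; `z_score` applies z-score normalization using the population standard deviation. Both function names are hypothetical.

```python
from statistics import fmean, pstdev


def encode_symbolic(column):
    """Replace each symbolic value with an integer code (order of first appearance).

    A stand-in for the paper's discretization step, whose exact scheme is not given here.
    """
    codes = {}
    encoded = [codes.setdefault(value, len(codes)) for value in column]
    return encoded, codes


def z_score(column):
    """Z-score normalization: subtract the mean, divide by the population standard deviation."""
    mu = fmean(column)
    sigma = pstdev(column)
    if sigma == 0:                 # constant attribute: map every value to 0
        return [0.0] * len(column)
    return [(x - mu) / sigma for x in column]


protocols, mapping = encode_symbolic(["tcp", "udp", "tcp", "icmp"])
print(protocols, mapping)          # [0, 1, 0, 2] {'tcp': 0, 'udp': 1, 'icmp': 2}
print(z_score(protocols))
```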
A Denial of Service attack takes place when an attacker makes computing resources too busy to handle legitimate users' requests, ultimately denying users access to the system. The attacks categorized under this type in the KDD dataset are back, land, neptune, smurf, pod and teardrop. In Probe attacks, the attacker tries to gather information about a host machine, or sometimes abuses its features, to look for exploits. Under this category, satan, nmap, portsweep and ipsweep are the four attack types.
For conducting the tests, DoS and Probe attacks have been considered for the proposed SMAD model. These attacks rest on the assumption that a large number of packets is sent to the host system over a short span of time [6]. Such flood attacks are detected by scanning the TCP/IP headers of the network packets from the real-time traffic [6]. Features (f1-f9) of the 10% KDD dataset have been used for this purpose. Packets found to be highly suspicious are passed, together with their payload data, to the second layer of the IDS. For these packets, the payload-based features considered are the content-based features (f10-f22), used for file-based anomaly detection in the File Anomaly Type Detection and File Parsing Detection modules.
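The feature grouping used across the two layers can be expressed as index ranges over a 41-feature KDD 99 record. In this hypothetical sketch, `smad_is_suspicious` and `aiad_is_anomalous` stand for the SMAD and AIAD detectors; they are passed in as functions because their internals are not given here. Only records that SMAD flags as suspicious have their content features examined by AIAD.

```python
# 1-based KDD 99 feature ranges as used in the text.
FEATURE_GROUPS = {
    "basic": range(1, 10),      # f1-f9: header-level features, screened by SMAD
    "content": range(10, 23),   # f10-f22: content features, used by the AIAD file modules
    "traffic": range(23, 42),   # f23-f41: traffic-based features
}


def select(record, group):
    """Return one feature group from a 41-value KDD record (f1 is record[0])."""
    return [record[i - 1] for i in FEATURE_GROUPS[group]]


def two_layer_check(record, smad_is_suspicious, aiad_is_anomalous):
    """Hypothetical flow: SMAD screens basic features; only suspicious records reach AIAD."""
    if not smad_is_suspicious(select(record, "basic")):
        return "normal"
    if aiad_is_anomalous(select(record, "content")):
        return "anomaly"
    return "suspicious"          # flagged by SMAD but not confirmed by AIAD


record = list(range(1, 42))      # dummy record whose values equal the feature index
print(select(record, "basic"))   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```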
A User to Root (U2R) attack occurs when an attacker with access to a legitimate user account tries to gain root access. The four attacks of this type, loadmodule, buffer_overflow, rootkit and perl, exploit common programming mistakes and environmental assumptions. For tests on the standard dataset, these U2R attacks are used for the Off-Hours Access Detection module.
Remote to Local (R2L) attacks are those in which the attacker tries to gain access to a user machine remotely. The attacks of this category that are considered are phf, guess_passwd, warezmaster, imap, multihop, ftp_write, spy and warezclient. For experimentation, the R2L attacks have been used to test the Remote-Access Detection module.
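The attack categories, and the module each category is used to test, can be kept in a lookup table. The sketch below encodes exactly the labels listed above; the names `ATTACK_CATEGORY`, `CATEGORY_MODULE` and `module_for` are illustrative. Raw KDD 99 records end their label with a period (for example `smurf.`), which the function strips.

```python
# Attack labels listed in the text, grouped by KDD 99 category.
ATTACK_CATEGORY = {
    **dict.fromkeys(["back", "land", "neptune", "smurf", "pod", "teardrop"], "DoS"),
    **dict.fromkeys(["satan", "nmap", "portsweep", "ipsweep"], "Probe"),
    **dict.fromkeys(["loadmodule", "buffer_overflow", "rootkit", "perl"], "U2R"),
    **dict.fromkeys(["phf", "guess_passwd", "warezmaster", "imap", "multihop",
                     "ftp_write", "spy", "warezclient"], "R2L"),
}

# Module each category is used to test, as described in the text.
CATEGORY_MODULE = {
    "DoS": "SMAD",
    "Probe": "SMAD",
    "U2R": "Off-Hours Access Detection",
    "R2L": "Remote-Access Detection",
}


def module_for(label):
    """Map a raw KDD label such as 'smurf.' to its (category, module) pair."""
    name = label.strip().rstrip(".")      # raw KDD 99 labels end with a period
    if name == "normal":
        return "normal", None
    category = ATTACK_CATEGORY[name]      # KeyError for labels not listed in the text
    return category, CATEGORY_MODULE[category]


print(module_for("smurf."))               # ('DoS', 'SMAD')
```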
The proposed model has also considered the UNSW-NB15 dataset, which has 49 features categorized into three groups: Basic, Content and Time. Some additional general-purpose and connection features are also present in the dataset. The basic features of UNSW-NB15 are used for the SMAD model, the content-based features for file-based anomaly detection in the File Anomaly Type Detection and File Parsing Detection modules, and the time-based and additional features in the T-cell_User_Activation module for user-based anomaly detection.

B. RESULTS AND DISCUSSION
The proposed model has been implemented in C#. The experiment was conducted for three weeks on all incoming files, with as many as 30,390 files on a single day and a mean sample size of 30,377 files per day. A system performance log was generated for each day, recording network bandwidth, CPU and memory usage; these logs form the normal data set. An intruded data set was created by disabling the graphics, audio and USB drivers. Performance has been measured in terms of the true positive rate against the false positive rate. Using the first layer, the SMAD model, the true positive rate on real-time traffic increased each day, reaching as high as 96.04%, with a false positive rate of 7.8%. The graph of True Positive Rate vs. True Negative Rate and the ROC curve are given in Figure 1 and Figure 2.
For the standard KDD99 dataset, features (f1-f9) are considered for the SMAD model, and the payload-based features considered are the content-based features (f10-f22) for file-based anomaly detection. The SMAD model yields a true positive rate of 97.1% and a false positive rate of 2.79%. Figure 5 shows the ROC results of the SMAD model.
For the Remote-Access Detection and Off-Hours Access Detection modules, both the content-based (f10-f22) and traffic-based (f23-f41) features of the standard dataset have been considered. The features found to be useful for R2L and U2R attacks are hot, num_failed_logins, logged_in, num_compromised, root_shell, su_attempted, num_root, num_file_creations, num_shells, num_access_files, num_outbound_cmds, is_guest_login, is_host_login, count, srv_count, dst_host_count, dst_host_srv_count, dst_host_diff_srv_rate, dst_host_srv_serror_rate and dst_host_srv_rerror_rate. Figures 6 and 7 show the results for both these modules.
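The results above are expressed as true positive and false positive rates and as ROC curves. A minimal sketch of how these quantities can be computed from ground-truth labels (1 for intrusion) and per-record anomaly scores follows. It assumes higher scores mean more suspicious records and that both classes are present; the function names and sample scores are hypothetical.

```python
def rates(y_true, y_pred):
    """True positive rate and false positive rate from binary labels (1 = intrusion)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by sweeping the decision threshold over every score."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        tpr, fpr = rates(y_true, y_pred)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    points = sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]        # hypothetical anomaly scores
print(auc(roc_points(labels, scores)))  # 0.75
```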
The UNSW-NB15 dataset has also been considered in order to meet the current demands of network security. As described earlier, its 49 features fall into Basic, Content and Time groups, together with additional general-purpose and connection features, and its basic features are used for the SMAD model.
The features found to be relevant for the SMAD model are spkts, dpkts, dloss, dbytes, sloss, sttl, state, service, rate, sbytes, dload, dttl and dur. The result of the SMAD model is represented in Figure 8.
Figure 10.
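Selecting the listed UNSW-NB15 basic features from the dataset's CSV files can be sketched with Python's standard `csv` module. This assumes a CSV with a header row containing these column names, which is an assumption about the file layout; `smad_rows` and `SMAD_FEATURES` are illustrative names, and values are left as strings for any later encoding and normalization.

```python
import csv
import io

# Basic features listed above for the SMAD model.
SMAD_FEATURES = ["spkts", "dpkts", "dloss", "dbytes", "sloss", "sttl", "state",
                 "service", "rate", "sbytes", "dload", "dttl", "dur"]


def smad_rows(csv_text):
    """Yield only the SMAD feature columns from UNSW-NB15 rows given as CSV text."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield {name: row[name] for name in SMAD_FEATURES}


# Hypothetical one-row sample with a header line.
sample = ("id,dur,spkts,dpkts,dloss,dbytes,sloss,sttl,state,service,rate,sbytes,dload,dttl,label\n"
          "1,0.1,2,3,0,100,0,64,FIN,http,20.5,200,50.0,252,0\n")
for features in smad_rows(sample):
    print(features)
```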

V. CONCLUSION
The current work proposes an intrusion detection system based on an artificial immune system. It has two layers, as in a natural immune system. Statistical Modeling based Anomaly Detection (SMAD) is the first layer of the IDS and works as the Innate Immune System (IIS) interface; it can be considered analogous to the skin in the innate immune system. Its primary intention is to capture the initial network traffic in order to find the first-hand vulnerability of the traffic. It captures characteristic features of the network at fixed time intervals and compares the observed value of F in ANOVA against a pre-defined value. The SMAD captures the network traffic that is found to be highly suspicious. The second layer, based on the adaptive immune system, analyzes features of the network packets of this highly suspicious traffic. It is termed Adaptive Immune-based Anomaly Detection (AIAD). It imitates the adaptive immune system by taking into consideration the activation of the T-cells and the B-cells, and captures relevant features from the header and payload portions for effective detection of intrusion.
The proposed model has considered both real-time network traffic and standard datasets for intrusion detection. For capturing real-time traffic, the experiment was conducted for three weeks on all incoming files. In real-time traffic, the source port, destination port, source IP address, destination IP address, packet length and payload data have been considered from each network packet. Using the first layer, the SMAD model, the true positive rate on real-time traffic increased each day to as high as 96.04%, with a false positive rate of 7.8%. On the 10% KDD 99 standard dataset, the proposed system yields a 97.1% true positive rate and a 2.79% false positive rate. Highly suspicious traffic from the SMAD model is further tested for vulnerability in the AIAD model, and results show a true positive rate close to 99% in detecting file-based and user-based anomalies. For the KDD99 dataset, features (f1-f9) are considered for the SMAD model and the content-based features (f10-f22) for file-based anomaly detection, while both the content-based (f10-f22) and traffic-based (f23-f41) features are used for remote-access and off-hours access detection. Results show high true positive rates and low false positive rates for this standard dataset. The proposed system has also been tested on the UNSW-NB15 dataset, which reflects recent anomalous behaviors in current network security scenarios. For this dataset, the basic features were considered for the SMAD model, whereas the content, time and additional features were considered for the AIAD model. Results exhibit very high true positive rates and ROC area for the SMAD model, and the AIAD model also achieves high true positive and low false positive rates.
INADYUTI DUTT has been in the field of academics, industry, and research for more than 18 years. She is currently pursuing research in network security with the Department of Computer Applications, Sikkim Manipal Institute of Technology, Sikkim Manipal University (SMU). She is currently working as an Assistant Professor with the Department of Computer Applications, B. P. Poddar Institute of Management and Technology, Kolkata, India. She has more than 30 publications, a book and a few book chapters to her credit, and has a keen research interest in data mining, neural networks, and machine learning. She has been associated with ACM (CSTA), IAENG, and IACSIT. She has also been a Technical Reviewer for various journals and conferences, such as IC3 2018, Informatica, IJECE, and IJBM.
SAMARJEET BORAH is currently working as a Professor with the Department of Computer Applications, Sikkim Manipal University (SMU), Sikkim, India. He handles various academic, research, and administrative activities. He is also involved in curriculum development activities, the board of studies, the doctoral research committee, and IT infrastructure management, along with various administrative activities under SMU. He is involved with various funded projects in the capacity of a Principal Investigator/Co-Principal Investigator. He has authored two books, contributed chapters to two books, and has more than 50 research publications in international journals and conference proceedings to his credit. His areas of specialization are medical image processing and CAD, data structures and analysis of algorithms, object-oriented programming, programming languages, bioinspired network security, and computer organization and architecture. His areas of research are biomedical image processing and CAD. He was a recipient of the Best Poster Award from the 96th Indian Science Congress, 2009.