Research Trends in Network-Based Intrusion Detection Systems: A Review

Network threats and hazards have evolved rapidly in recent years. Many mechanisms (such as firewalls, anti-virus, anti-malware, and spam filters) are used as security tools to protect networks. An intrusion detection system (IDS) is also an effective and powerful network security system for detecting unauthorized and abnormal network traffic flow. This article presents a review of the research trends in network-based intrusion detection systems (NIDS), their approaches, and the most common datasets used to evaluate IDS models. The analysis presented in this paper is based on the number of citations acquired by a published article, the total count of articles on intrusion detection published in a year, and the most cited research articles on intrusion detection systems in journals and conferences separately. Based on the articles published in the intrusion detection field over the last 15 years, this article also discusses the state of the art of NIDS, commonly used NIDS, citation-based analysis of benchmark datasets, and NIDS techniques used for intrusion detection. A citation- and publication-based comparative analysis that quantifies the popularity of various approaches is also presented. The study may be helpful to novices and researchers interested in evaluating research trends in NIDS and their related applications.


I. INTRODUCTION
TODAY'S era is one of information and communication, and the number of hosts/terminals in computer networks is continuously increasing. Vulnerabilities in security systems and unauthorized access to information systems are also growing tremendously. Many techniques, namely firewalls, access control, anti-virus, anti-malware software, application security, behavioral analytics, data loss prevention, distributed denial of service (DDoS) prevention, and network segmentation, are commonly used to strengthen internet security because of their capabilities for content filtering, blocking data outflow, and alerting on and preventing malicious activities. Firewalls and spam filters generally use simple rule-based algorithms to allow or deny protocols, ports, or IP addresses. The drawback of these firewalls and filters is that they are sometimes unable to control complex attacks of the denial of service (DoS) type, and they cannot differentiate between 'good traffic' and 'bad traffic'. An intrusion detection system used together with anti-virus software has a significant impact on computer network security, because it offers stronger protection of a computer network against unauthenticated access. From the perspective of information systems, intrusion refers to any attempt that compromises the integrity, availability, or confidentiality of a computer or a network, or that bypasses its security mechanisms [1].
According to the National Institute of Standards and Technology (NIST), intrusion detection is the process of monitoring events occurring in a computer system or a network and analyzing these events for signs of intrusion. The monitoring process can be accomplished with software or hardware to secure the system from malicious activity or from violations of integrity policies. The IDS performs the intrusion detection process to secure a computer or network, and thereby protects a computer network against unauthenticated access.
An IDS can be categorized by where it is installed into three categories: host-based IDS (HIDS), network-based IDS (NIDS), and hybrid IDS. A HIDS is deployed on a single host. In a HIDS, attacks are detected on a single computer system, and the essential files of the operating system are analyzed. Hence, these types of attacks are usually easy to detect, except for some infiltrated malware that is very hard to detect. In a NIDS, malicious information is detected across the diverse interconnection of computers, and the NIDS is deployed on routers or switches in a network. A hybrid IDS, in contrast, can be deployed on hosts as well as on the network. The primary goal of a NIDS is to identify malicious or threatening logged information and report it to the network manager. An intrusion detection system usually does not prevent an intrusion attack; rather, it merely generates an alarm after detecting an attack in the system in real time or before the attack reaches its target. It is equally vital to give notice of an attack after it has occurred, because an IDS maintains and updates an intrusion profile in the log. The operating system must also support various activities that require extra disk space and central processing unit (CPU) resources for the analysis of logs. Managing log formats and comparing them with identified attack patterns according to security violation issues is also a big challenge for an IDS [2].
The literature on network intrusion detection systems does not report the research trend, the popularity of the datasets used to evaluate a NIDS model, or the popularity of different intrusion detection approaches. The research articles selected for review are usually not based on any qualitative measure; they are chosen arbitrarily. We instead take citations as a measure to quantify popularity and research trends in research articles.
Citation is a measure to identify the popularity of a research article. According to Linda in [3], the citation count is the number of times that other authors mention a research article in their work. The citation count of an article is a quantitative and qualitative measure for recognizing the popularity of a research article and an institution. Citations also indicate the trend of research in a specific field.
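As a minimal sketch of the citation measure described above, the following Python snippet counts citations from a list of citation pairs. The `(citing_id, cited_id)` pair format and the function name `citation_counts` are assumptions made for illustration; they do not describe how any particular search engine stores its records.

```python
from collections import Counter


def citation_counts(citation_pairs):
    """Count how many distinct articles cite each article.

    citation_pairs: iterable of (citing_id, cited_id) tuples.
    This pair format is a hypothetical one chosen for the sketch.
    """
    # De-duplicate so a citing article counts at most once per cited article.
    unique_pairs = set(citation_pairs)
    return Counter(cited for _citing, cited in unique_pairs)


# Toy data: articles B, C, and D cite A (D is listed twice); C also cites B.
pairs = [("B", "A"), ("C", "A"), ("C", "B"), ("D", "A"), ("D", "A")]
counts = citation_counts(pairs)
print(counts["A"])  # 3
print(counts["B"])  # 1
```

Ranking articles by this count (for example with `counts.most_common()`) gives the kind of popularity ordering used throughout this review.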
We want to determine the research trend and popularity in intrusion detection based on various approaches and methodologies. The citations of a published article are used to explore the research trend regarding that article. Citation is a valuable and popular scale for measuring the research trend in a field [3]. Various search engines, such as Google Scholar, Web of Science, Microsoft Academic, and Semantic Scholar, record the citations of an article. The citation and publication counts of articles on a research topic are taken as the research trend and popularity of that topic. The citation count of an article (say A) is the total number of other articles that include a reference to A in their work. However, no search engine or database provides research trends for a search field/topic along with the different approaches used. The current article focuses on research trends in the field of IDS, its related techniques, datasets, total publications, and other citation-related analysis. We considered articles related to IDS from 2005 to 2020. The major contributions of this article are outlined as follows:
1) To compare several popular IDS that are used commercially for network security.
2) To analyze the popularity of various datasets used to evaluate a NIDS.
3) To find out the popularity of different approaches and methodologies used in intrusion detection systems.
4) To analyze the most cited articles published in journals and conferences in a fast and easy manner.
5) To analyze various performance metrics used to evaluate an intrusion detection system.
In the past decades, the comparison of academic search engines and bibliographic databases (ASEBDs) has been widely investigated [4], [5]. A comparative analysis of various academic databases and search engines has also been shown in [6] and [7].
Microsoft Academic Search (MAS) follows semantic search, in which the search engine does not only match keywords to content but instead focuses on their meaning, with a broader scope and coverage compared to Web of Science, Scopus, and Google Scholar [4], [5], [8]. It helps searchers by suggesting entries and interesting topics when they are unsure about the search string. MAS also supports searching by journals, conferences, institutions, and authors in different fields to find the best search results. Microsoft Academic [9] records a total of 247,389,875 publications, 261,445,825 authors, 743,427 topics, 4,523 conferences, 48,974 journals, and 25,811 institutions. The total number of estimated citation pairs is 2,390,820,943. In the same vein, a total of 36,765 publications with 592,675 citations are observed in the IDS field. Therefore, Microsoft Academic [9] has been used to collect citation records through search strings to achieve the goal of this paper.
The rest of the paper is organized as follows. Section II deals with related work in the field of research trends in intrusion detection. Section III discusses the method applied for finding the citations, related articles, and other corresponding reviews. The network intrusion detection system, its modules, widespread causes of intrusion in a network, some popular NIDS, and their analysis are presented in Section IV. Section V shows different benchmark datasets and their citation analysis-based records. A study of the various methodologies used for intrusion detection in a network is given in Section VI. Section VII covers the performance metrics used to evaluate a network intrusion detection model. Section VIII discusses the results of the present study. Finally, Section IX presents the conclusion.
For clarity, we explain some abbreviations and acronyms commonly used in this paper. In the KDD'99 dataset, KDD stands for Knowledge Discovery in Databases, 1999. NSL in the NSL-KDD dataset stands for Network Security Laboratory. ISCX is the acronym for Information Security Centre of Excellence, one of the leading institutions in the area of information and communication security, in collaboration with the Atlantic Canada Opportunities Agency (ACOA). CIDDS stands for Coburg Intrusion Detection Data Sets. UNSW-NB15 stands for the University of New South Wales Network-Based dataset, 2015. SSENet stands for self-supervised scale equivalent network dataset. KNN means K-Nearest Neighbors, which is a supervised learning algorithm. SVM stands for support vector machine, which is also a supervised learning method used for classification as well as regression problems. PCA denotes principal component analysis.

II. RELATED WORK
During the last decade, several surveys of intrusion detection have been conducted. One of the earliest was presented by Matt Bishop in [10] about trends in vulnerabilities analysis and intrusion detection. According to Matt Bishop, trends in intrusion detection are infrastructure-based protocols and techniques required to design and develop intrusion detection systems.
Another popular survey by Kabiri & Ghorbani in [11] presented trends in Intrusion Detection Systems (IDS) and also analyzed some problems regarding intrusion detection. A traditional IDS faces challenges such as time consumption, log-file updating, statistical and rule-based analysis, and accuracy. The survey presented in that article is based on intrusion detection along with AI, embedded programming, agent-based IDS, and software engineering.
Zamani & Movahedi in [12] presented a review of some influential machine learning algorithms used in intrusion detection. They found that using a machine learning approach for intrusion detection enables a high detection rate and a low false-positive rate, with the capability of quick adaptation to changing intrusive behavior. The algorithms analyzed in this review paper are categorized as artificial intelligence-based or computational intelligence-based.
Agrawal & Agrawal in [13] surveyed various data mining techniques for intrusion detection. Various machine learning techniques, individually or in hybrid form, have been widely used not only for clustering or classification but also for dimensionality reduction and feature selection in IDS.
Ahmed et al. in [14] presented the challenges regarding the datasets that are used for IDS models. This survey was based on the following categories of IDS: classification, statistical, information theory, and clustering.
At present, the statistical method has been extended with new methods based on bio-inspired approaches. These methods are mainly based on evolutionary theory or swarm intelligence [15]. To find a suitable, best-fit bio-inspired algorithm, various characteristics such as convergence, intensification, diversification, and CPU time need to be analyzed.
Audrey A. Gendreau in [16] presented a survey of intrusion detection systems towards an end-to-end secure Internet of Things (IoT); the survey uses the most recent ideas and methods proposed for the present IoT. IDS platform differences and the current research trend towards a universal, cross-platform distributed approach are taken into consideration.
Hamid et al. in [17] provided a review of the benchmark datasets available to researchers in the field of intrusion detection for training and testing their models. The review covered various datasets, namely DARPA 98, KDD'99, NSL-KDD, UNM-Dataset, UNSW-NB15, the Caida Distributed Denial of Service (Caida DDoS) Dataset, and the Australian Defence Force Academy Windows Dataset (ADFA-WD), and provided the details of their classes, attributes, and instances.
Most recently, Misra et al. in [18] also presented a detailed investigation and analysis of machine learning approaches for intrusion detection. This survey categorizes the classifiers into four categories, namely single classifiers with all features of the dataset, single classifiers with selected features of the dataset, multiple classifiers with all features of the dataset, and multiple classifiers with selected features of the dataset. This analysis also reveals that an intrusion detection approach that performs well for one type of attack may not perform well for other types of attacks.
None of the literature discussed so far focuses on the research trend and popularity in NIDS on the basis of a quantitative measure. In this article, however, we analyze various commercially used IDS, the popularity of various benchmark datasets, and the recent trends in the approaches used in intrusion detection. The analysis performed in this article is based on quantitative measures instead of qualitative measures.

III. METHODS
Researchers are more attracted to articles that have a high citation count. We have therefore taken citations as a metric that reflects the standing and validity of a research topic or journal publication in a research area. The string-based search described in this section covered research articles from the year 2005 to 2020.

IV. NETWORK INTRUSION DETECTION
Concern about increasing security problems was expressed by James P. Anderson in a paper [19] published in 1972. Later, in 1980, he outlined an audit-based procedure for automated intrusion detection and monitoring processes for hosts [20]. From 1980 to 1990, the US government funded many projects, such as network audit director and intrusion response (NADIR), Haystack, Multics intrusion detection and alerting system (MIDAS), and Discovery [21].
Zuech et al. in [22] explained that a NIDS helps the forensic process to identify the footprint of breaches. Attacks travel from one computer/node to another through routers and switches, and a NIDS observes network traffic data at the network layer. Based on pattern matching of this network traffic data, a NIDS can be further categorized as anomaly (unknown)-based or misuse (known)-based. In anomaly detection, a pattern-based examination of the traffic flow is performed, and deviation from the normal behavior pattern leads to the inference of intrusive information. In misuse detection, on the other hand, a parametric examination of features and known attack signatures is compared against a predefined set of rules to detect unauthorized actions.
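The contrast between the two categories can be illustrated with a small Python sketch. Everything in it is hypothetical: the flow features (`syn_packets`, `ack_packets`, `distinct_ports`), the two signatures, their thresholds, and the min/max profile are assumptions chosen to keep the example short, not the rules of any real NIDS.

```python
# Hypothetical signatures: attack name -> predicate over a flow's features.
SIGNATURES = {
    "syn_flood": lambda flow: flow["syn_packets"] > 1000 and flow["ack_packets"] == 0,
    "port_scan": lambda flow: flow["distinct_ports"] > 100,
}


def misuse_detect(flow):
    """Misuse (known) detection: names of the predefined signatures the flow matches."""
    return [name for name, matches in SIGNATURES.items() if matches(flow)]


def build_profile(normal_flows):
    """Learn a per-feature (min, max) range from flows assumed to be normal."""
    features = normal_flows[0].keys()
    return {
        f: (min(fl[f] for fl in normal_flows), max(fl[f] for fl in normal_flows))
        for f in features
    }


def anomaly_detect(flow, profile):
    """Anomaly (unknown) detection: features that fall outside the normal range."""
    return [f for f, (low, high) in profile.items() if not low <= flow[f] <= high]


normal = [
    {"syn_packets": 10, "ack_packets": 9, "distinct_ports": 2},
    {"syn_packets": 20, "ack_packets": 18, "distinct_ports": 3},
]
profile = build_profile(normal)
suspicious = {"syn_packets": 15, "ack_packets": 12, "distinct_ports": 40}

print(misuse_detect(suspicious))            # [] : no known signature matches
print(anomaly_detect(suspicious, profile))  # ['distinct_ports']
```

In this toy case the flow matches no known signature, so misuse detection stays silent, while the anomaly detector flags it because one feature deviates from the learned normal profile.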
A year-wise analysis of the articles published on intrusion detection is shown graphically in Figure 1. Figure 1 is based on the number of research publications on intrusion detection systems from 1972 to 20th December 2020. Over the last three decades, intrusion detection-related publications and research articles have grown continuously since 1998, with minor crests and troughs.

FIGURE 1. Research publications on intrusion detection systems from 1972 to 2020
A NIDS comprises different modules that perform the detection process for intrusive information in a network. The three modules of a NIDS and their functions are shown in Figure 2. The detection machine module helps to detect intrusions or anomalies. The detection software performs the detection strategies, and the management machine manages the detection strategies or policies. Another sub-module of the detection machine module is the data capture module. The intrusion detection module and communication modules capture packets from the network. The second module of a traditional NIDS, the management machine, is used for managing and maintaining detection policies based on detection strategies. The database is the third module; it maintains and stores the recorded behavior of intrusion detection based on feature extraction. The most common issues faced by a NIDS are fidelity problems, resource usage, and reliability. Existing intrusion detection systems suffer from at least two of these problems, as defined by Hoque et al. in [23]. The various phases of the network intrusion detection model (NIDM) are shown graphically in Figure 2.

FIGURE 2. A Network-based intrusion detection system (NIDS) with its components
In Figure 2, Mgmt Cmd stands for management command, Ctrl Resp stands for control response, Ctrl Cmd stands for control command, and Policy Info stands for policy information.

A. CAUSES OF INTRUSION IN NETWORK
Based on Anchugam & Thangadurai in [24] and Ghorbani et al. in [25], we observed some commonly occurring causes of intrusion in a network. These are as follows.
a. Bad packets (produced from corrupt domain name system (DNS) data or software bugs) and local packets may not be detected reliably, which causes high false-alarm (false positive) rates.
b. Encrypted packets may cause intrusion, which cannot be prevented without an effective IDS.
c. An IDS may not effectively perform identification and authentication for weak access in the network. When an attacker gains admittance due to a weak authentication mechanism, the IDS is then the preventive measure against the misconduct.
d. A NIDS can be subject to some protocol-based attacks; hosts in that network may then be vulnerable to illegal data, and Transmission Control Protocol/Internet Protocol (TCP/IP) stack attacks may cause the NIDS to crash.

B. COMPARISONS OF SOME POPULAR NIDS
Many NIDSs are used commercially for network security purposes. Some popular NIDSs are tabulated in Table 2 with their comparative analysis.

V. BENCHMARK DATASETS USED IN NIDS MODELS
Various benchmark datasets have been used to evaluate intrusion detection models. Work on these datasets aims to exhibit better classification accuracy and detection rate [29]. Many intrusion detection datasets have been published over the last few years, and finding a relevant dataset to evaluate an intrusion detection model is a tough job. Ring et al. in [30] presented a survey of existing datasets for network-based intrusion detection along with an analysis of their properties, attack scenarios, and the relations between the datasets.
Regarding the popularity of the various datasets, a citation-based statistical comparison, along with the advantages and disadvantages of the different benchmark datasets, is tabulated in Table 3. Table 3 shows that the KDD Cup'99 dataset has the highest citation count as a benchmark dataset since 2005. This means that the most work has been done using KDD Cup'99 as a benchmark dataset compared to the other datasets. The second most cited dataset is NSL-KDD, according to Table 3.
A newer dataset containing more modern attacks, the UNSW-NB15 dataset generated for the Australian Centre for Cyber Security [40], is also used as a benchmark dataset. This dataset comprises nine sorts of attacks and has a training set with one hundred seventy-five thousand records and a testing set with eighty-two thousand records. A hypertext transfer protocol (HTTP)-based dataset was generated for the CSIC (Consejo Superior de Investigaciones Científicas, the Spanish National Research Council) in 2010 to address the criticisms of KDD'99 [25]. This dataset contains thirty-six thousand 'normal' requests and more than twenty-five thousand anomalous requests. These datasets may be more applicable for specific cases; however, they are not as ubiquitous as the KDD Cup'99 and NSL-KDD datasets. For demonstration of the benchmark datasets, KDD Cup'99 and NSL-KDD are ideal since many papers describe their implementations specifically [41], [42], [43].
The year-wise distribution of various datasets is presented graphically in Figure 3. This graph shows that the KDD Cup'99 dataset has the highest popularity, followed by the NSL-KDD dataset from 2005 to 2018. Meanwhile, other datasets also came into existence.

VI. APPROACHES USED IN NIDS MODELS
Classical intrusion detection problem-solving methodologies, according to Liu & Lang in [2] and Jyothsna et al. in [44], fall into four branches: statistical-based, knowledge-based, machine learning-based, and bio-inspired-based, each with its own approaches. Table 4 presents the total number of publications and the highest citation counts in conferences and journals related to intrusion detection, along with the methodologies. Here, we considered research articles from the year 2005 to 2020. There are many optimization approaches for finding the optimal rating of intrusion detection. Table 5 to Table 8 present different optimization approaches, their corresponding methodologies, and their citation and publication records for intrusion detection models.
Intrusion detection with machine learning approaches, which have the highest publication count, is shown graphically in Figure 4. A comparison of the citations of articles published in conferences and journals among the various methodologies, namely statistical-based, knowledge-based, and bio-inspired-based, is also shown graphically in Figure 4. Figure 4 also shows that the most cited articles published in conferences attained a higher count than the articles published in journals.

A. STATISTICAL-BASED NIDS
A statistical-based intrusion detection system (SBIDS) [52] uses statistical observations of different variables such as the login session, resource overflow flags, and timers. Statistical properties such as the mean, standard deviation, correlation, Analysis of Variance (ANOVA), and statistical tests determine the deviation from the 'normal' behavior of network traffic flow [53]. Articles on intrusion detection with the time-series statistical approach have the highest publication count and the highest citation values among the statistical approaches, as shown in Figure 5. Table 5 also lists the highest cited journal and conference articles on intrusion detection, with their citation counts, among the different statistical approaches. Year-wise comparisons of the publication and citation distributions of articles among the various statistical approaches are displayed graphically in Figure 6 and Figure 7, respectively.
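As a minimal sketch of how the mean and standard deviation can expose a deviation from 'normal' behavior, the snippet below scores a new observation by its distance from a baseline in units of standard deviation. The observed variable (logins per minute), the sample values, and the threshold of 3 are assumptions for illustration and are not taken from [52] or [53].

```python
import statistics


def fit_baseline(normal_values):
    """Estimate the mean and sample standard deviation of 'normal' observations."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)


def is_anomalous(value, mean, std, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    if std == 0:
        return value != mean
    return abs(value - mean) / std > threshold


# Hypothetical baseline: login attempts per minute observed during normal operation.
normal_logins_per_minute = [4, 5, 6, 5, 4, 6, 5, 5]
mean, std = fit_baseline(normal_logins_per_minute)

print(is_anomalous(6, mean, std))   # False: within the usual variation
print(is_anomalous(40, mean, std))  # True: far outside the baseline
```

The threshold trades false alarms against missed detections; a lower value flags more traffic as deviating from the baseline.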
Here, two notable points are observed. First, the publication count of time series-based articles is higher than that of the other statistical approaches. Second, the number of publications has grown from 5 in 2005 to 36 in 2019. The year-wise publication analysis among the different statistical approaches also shows that time series model-based intrusion detection reached its highest publication count in 2019, with 36 publications. Figure 7 shows the year-wise citation count for statistical approaches. The year 2013 has the highest citation score, while citations in the other years remain almost the same. It

B. KNOWLEDGE-BASED NIDS
Knowledge-based IDS (KBIDS) gather intrusive information about networks and produce a low false alarm rate with high accuracy in intrusion detection. However, a KBIDS requires an up-to-date knowledge repository about network traffic behavior [53]. All knowledge-based techniques with their total number of publications and citations are tabulated in Table 6. The highest cited article based on the expert system is by Patcha & Park [59], with a count of 1695. Figure 8 shows that the expert system-based publication and citation counts are higher than those of the finite state machine (FSM) and descriptive language approaches. As shown in Figure 9
The most cited article based on a knowledge-based approach for intrusion detection is [59], written by Patcha & Park, with 1695 citations. The work in [59] explores the use of finite state machines as a knowledge-based approach. Figure 10 represents the year-wise citation distribution of knowledge-based research articles in the intrusion detection field. Based on the three curves in Figure 8, expert system-based articles have higher citations from 2005 to 2020. This means that the research trend in the expert system approach to knowledge-based intrusion detection is higher than in the other knowledge-based methodologies. Figure 10 shows that before a decade, expert-system-

C. MACHINE LEARNING-BASED NIDS
Traditionally, NIDSs are designed around the classification of high-dimensional network traffic into normal or intrusive data. Due to the high dimensionality of network traffic data, the detection of intrusive information is significantly slower in a traditional NIDS. Combined with a machine learning approach on selected features, such NIDSs achieve a comparatively low false-positive rate (FPR) with a high true-positive rate (TPR) when predicting network traffic behavior [65]. Machine learning-based classifier models are trained and fitted on training sets restricted to selected 'important' features. The 'important' and relevant feature subsets are selected first, and the classifier is then trained on them. Training sets contain the response classes from which the classifier learns to recognize the behavior/classes of network traffic data. In Table 7, machine learning-based publication and citation counts of articles for IDS are tabulated. According to this table, SVM is the technique of greatest interest (most cited) among intrusion detection researchers. The neural network, followed by the decision tree, is also a prominent machine learning-based intrusion detection technique. Table 7 also gives the total number of publications and citation counts of articles and the most referred articles published in conferences or journals.
SVM-based intrusion detection is the most cited topic in this research area, even though more articles have been published on neural-network-based intrusion detection than on SVM and fuzzy logic, as shown in Figure 11. In contrast, Gao et al. in [33] explored the drawbacks of the SVM algorithm, which consumes a long time without a gain in accuracy. The Adaboost-based model is not ideal, and the precision obtained with the logistic regression algorithm is not high for intrusion detection. Figure 12 shows that a neural network is the prime choice of authors for intrusion detection, with 2152 publications from 2005 to 2020. The second most popular technique among authors is SVM, with 1716 publications. With 757 publications, the decision tree ranks third.
On the other side, the total citation count of published articles is also recorded for the different machine-learning practices in IDS. Figure 13 represents this citation analysis. The total citation count for neural networks is 21705, which is less than the citation count for SVM, which is 27329. SVM therefore ranks first by citation count, the neural network second, and the decision tree third, followed by fuzzy logic.

FIGURE 12. Publication analysis among several machine-learning approaches applied in IDS
This means that SVM is the most favored subject among researchers and academicians regarding intrusion detection references.
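The gap between publication rank and citation rank can be made concrete with a short calculation on the counts quoted above. The citations-per-publication ratio used here is an illustrative derived quantity, not a metric reported in the tables of this review, and the decision tree is omitted because its citation count is not stated in the text.

```python
# Publication and citation counts for 2005-2020 as quoted in the text above.
records = {
    "neural network": {"publications": 2152, "citations": 21705},
    "SVM": {"publications": 1716, "citations": 27329},
}


def citations_per_publication(record):
    """Average number of citations received per published article."""
    return record["citations"] / record["publications"]


for name, record in records.items():
    print(f"{name}: {citations_per_publication(record):.2f}")
# neural network: 10.09
# SVM: 15.93
```

Under this ratio, an average SVM-based article is cited more often than an average neural-network-based one, which is consistent with SVM ranking first by citations despite having fewer publications.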

D. BIO-INSPIRED-BASED NIDS
Bio-inspired approaches are popular for optimization and problem-solving. The requirement for enhancing accu-
The total published article count, total citation count, and most cited/referred articles concerned with bio-inspired approaches are tabulated in Table 8. Table 8 along with Figure 14 presents the comparison of total article publication and citation counts. Genetic programming has the highest published article count, at 899, and reaches a citation count of 9240. Figure 14 represents a year-wise distribution of the total number of publications and total citation counts. Genetic algorithm-based articles are the most published in the evolution-based intrusion detection category. On the other hand, articles based on ACO for intrusion detection have the highest count in the swarm-based algorithm category. The published article count in the ecology-based category is nominal, at 4. Figure 16 and Figure 17 represent a year-wise distribution of the published article count, while Figure 18, Figure 19, and Figure 20 represent the year-wise citation counts for the different bio-inspired approaches used for intrusion detection systems. According to Figure 15 and Figure 18, the year-wise published article count and citation count for the genetic algorithm in intrusion detection systems have a high distribution in the evolution-based category. Similarly, according to Figure 16 and Figure 19, the published article count and citations for ACO in intrusion detection have the highest year-wise distribution in the swarm-based category. Hence, ACO in the swarm-based category has the highest research trend in intrusion detection. The genetic algorithm-based IDS has a high published article distribution and a high research trend in the evolution-based category.

E. COMPARISON OF THE MOST CITED APPROACHES USED IN NIDS
Based on the publication count and citation of articles, Table 9 presents a comparison among the most popular methodologies.
In a time series statistical-based intrusion detection system, a series of events is observed within an interval of time. If a new event falls within a specific time, the probability of its being normal is high; otherwise, it is very low [114]. Expert systems (ES) are rule-based approaches used in KBIDS, comprising rules, facts, and inference methods. Each event is first converted into related facts and rules in an IDS, and then some inference rule is applied to generate a prediction [53]. SVM uses a hyperplane to separate the response classes of the dataset. The genetic algorithm (GA) is an evolutionary approach in which optimization is based on mutation [52]. GA encodes a set of solutions to form a population and evolves this population based on a fitness function [65].
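To make the expert-system step concrete, the sketch below converts an event into facts and applies simple forward-chaining inference over a small rule base. The event fields, fact names, and the two rules are hypothetical and were invented for this example; they do not reproduce the rule base of any system cited above.

```python
def event_to_facts(event):
    """Convert a raw event (a hypothetical dict layout) into symbolic facts."""
    facts = set()
    if event["failed_logins"] >= 5:
        facts.add("many_failed_logins")
    if event["source"] not in event["trusted_sources"]:
        facts.add("untrusted_source")
    if event["port"] == 22:
        facts.add("ssh_service")
    return facts


# Each rule: (set of premises that must all hold, conclusion to add).
RULES = [
    ({"many_failed_logins", "ssh_service"}, "brute_force_suspected"),
    ({"brute_force_suspected", "untrusted_source"}, "intrusion"),
]


def infer(facts, rules):
    """Forward chaining: apply rules repeatedly until no new fact is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


event = {
    "failed_logins": 7,
    "source": "203.0.113.9",
    "trusted_sources": {"10.0.0.5"},
    "port": 22,
}
conclusions = infer(event_to_facts(event), RULES)
print("intrusion" in conclusions)  # True
```

Because the second rule depends on a conclusion of the first, the loop needs more than one pass, which is why the inference repeats until the fact set stops growing.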

VII. PERFORMANCE MEASUREMENTS
The performance of network security can be assessed in terms of efficiency and effectiveness. Efficiency deals with the resources that need to be allocated to the system, including CPU cycles and main memory. Effectiveness, in comparison, describes the ability of the system to distinguish between intrusive and non-intrusive activities. In the context of IDS evaluation, researchers generally use metrics to measure effectiveness quantitatively, based on the training and testing of the classifier using benchmark datasets. These metrics measure how well the attack instances are detected against normal instances. The confusion matrix and the receiver operating characteristic (ROC) curve are mainly used to calculate the effectiveness of the IDS. The total publication counts for the confusion matrix and the ROC curve are 2045 and 54, respectively, while the citation counts are 75349 and 711, respectively. The search strings listed in Table 1, with the filters described in Section III, are used to select the publication and citation records. Figure 21 shows a comparison of the citation and publication counts for the confusion matrix and ROC in the intrusion detection field. This figure shows that the confusion matrix is the more popular of the two and has a high research trend in intrusion detection for evaluating IDS models.

A. CONFUSION MATRIX
The confusion matrix is an easy and effective way to characterize the classification results of an IDS. The metrics listed below, numbered 1 to 8, are based on the fundamental measuring parameters of the confusion matrix: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The performance of the NIDM with a data mining classifier is measured using the following metrics, which are also discussed by Almomani [65] and Ferrag et al. [115].
1. Misclassification Rate (MCR):
$$\mathrm{MCR} = \frac{TP_{\,}^{\,}\!\!\!\!\!\!\!\;FP + FN}{TP + TN + FP + FN}$$
2. Accuracy: It gives the proportion of correct classifications, i.e., how often the classifier is correct.
3. True Positive Rate: It is also known as Recall or Sensitivity. It gives the proportion of actual positive instances that are classified correctly.
$$\mathrm{TruePositiveRate} = \frac{TP}{TP + FN}$$
4. Specificity: Specificity is also known as the true negative rate (TNR). It represents how properly a classifier identifies true negatives. It gives the number of correct intrusive classifications relative to the total number of intrusive records (i.e., TN + FP) during training.
5. Precision:
$$\mathrm{Prec} = \frac{TP}{TP + FP}$$
6. False Positive Rate (FPR): It indicates how often the classifier predicts normal when the traffic is actually an attack.
7. Prevalence: Prevalence indicates how often the "yes" condition actually occurs in the sample.
$$\mathrm{Prevalence} = \frac{\mathrm{Actual\_Normal}}{TP + TN + FP + FN}$$
8. F-Score: It serves as a derived effectiveness measurement.
$$\mathrm{F\text{-}Score} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}$$
The values of these metrics lie in the range of 0 to 1, except accuracy, which is represented as a percentage. Figure 23 and Figure 24 present the year-wise distribution of the published article count and the citation count, respectively. These graphs show that accuracy has the highest popularity, followed by specificity and FPR.
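To make the eight metrics concrete, the following minimal sketch computes them from a list of true and predicted labels. Several points are assumptions rather than part of the cited works: the function names are hypothetical, the 'normal' class is treated as the positive ("yes") class, as the prevalence formula above suggests, so that Actual_Normal = TP + FN, and accuracy, specificity, and FPR use their standard confusion-matrix definitions, (TP + TN) / total, TN / (TN + FP), and FP / (FP + TN). All values are returned as fractions in the range 0 to 1.

```python
def confusion_counts(y_true, y_pred, positive="normal"):
    """Count TP, TN, FP, FN, treating `positive` as the 'yes' class (assumption)."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def ids_metrics(tp, tn, fp, fn):
    """Return the eight confusion-matrix metrics as fractions between 0 and 1."""
    total = tp + tn + fp + fn

    def ratio(numerator, denominator):
        # Guard against empty classes instead of raising ZeroDivisionError.
        return numerator / denominator if denominator else 0.0

    return {
        "mcr": ratio(fp + fn, total),
        "accuracy": ratio(tp + tn, total),          # standard definition
        "tpr": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),          # standard definition
        "precision": ratio(tp, tp + fp),
        "fpr": ratio(fp, fp + tn),                  # standard definition
        "prevalence": ratio(tp + fn, total),        # Actual_Normal = TP + FN
        "f_score": ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    y_true = ["normal"] * 6 + ["attack"] * 4
    y_pred = ["normal"] * 5 + ["attack"] + ["attack"] * 3 + ["normal"]
    counts = confusion_counts(y_true, y_pred)
    for name, value in ids_metrics(*counts).items():
        print(f"{name}: {value:.3f}")
```

With this choice of positive class, MCR and accuracy always sum to 1, and FPR equals 1 minus specificity.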

B. ROC
The ROC (receiver operating characteristic) curve can also be used to measure the efficiency and efficacy of an IDS. A ROC is termed a performance curve: it plots the detection rate against the false alarm rate. Equivalently, it displays the false alarm rate generated by the detector at a specified probability of detection [116]. The area under the curve (AUC) summarizes the misclassification in an IDS. If the AUC is less than or equal to 0.5, misclassification is more than 50 percent, and the performance of the intrusion detection model is poor [117].
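The construction of a ROC curve and its AUC can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any cited work: the function names are hypothetical, each record is assumed to carry a numeric score where a higher value means "more likely to belong to the detected class" (label 1), and the area is approximated with the trapezoidal rule.

```python
def roc_points(y_true, scores):
    """Return ROC points (FPR, TPR) obtained by sweeping a threshold over the scores.

    y_true: 1 for the detected (positive) class, 0 otherwise.
    scores: higher score means more likely positive (assumption).
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present to draw a ROC curve")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Process all records that share the same score as one threshold step.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Hypothetical labels and scores, for illustration only.
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.1]
    curve = roc_points(labels, scores)
    print("ROC points:", curve)
    print("AUC:", auc(curve))
```

In this sketch a perfectly separating detector yields an AUC of 1.0, while a detector that assigns the same score to every record yields 0.5.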
As an illustration, we simulated an intrusion detection model using a decision tree classifier on an Intel Core i5 2.60 GHz machine with 7.88 GB of RAM, running MATLAB R2017b. The KDD Cup'99 dataset is used as the benchmark dataset. The classifier is trained on the training set of 494021 records, each labeled with one of two response classes, 'intrusion' or 'normal'. Figure 25 shows the ROC curve plotted for the simulated IDS model. In Figure 25, the x-axis represents the FPR (False Positive Rate) and the y-axis represents the TPR (True Positive Rate); the model reaches an FPR of 0.00 and a TPR of 1.0. Here, the classification accuracy is 100% for the training of the classifier.
Figure 26 and Figure 27 represent the year-wise distribution of the publication and citation counts, respectively. These two figures show that articles related to ROC in the field of intrusion detection have grown gradually since 2005; however, the popularity and research trend of ROC for evaluating IDS models remain minimal compared with accuracy, specificity, and FPR.
The number of published articles, the total citation count, and the highest cited conference and journal articles regarding performance measurements are tabulated in Table 10. Table 10, together with Figure 23 and Figure 26, represents the publication analysis, while Table 10, together with Figure 24 and Figure 27, represents the citation analysis of the different performance evaluation metrics in the field of intrusion detection.

VIII. DISCUSSION
As highlighted in Section I, an IDS provides a prominent mechanism for computer network security by raising an alarm when it detects malicious activity. The study in this article quantifies the popularity of NIDS-related work based on article citations. We have analyzed the intrusion detection-related articles in different classes, viz., the commercially used IDS, the datasets used to evaluate NIDS models, the approaches used in NIDS, and the different evaluation metrics. Microsoft Academic is used as the source of citation records for the published articles related to the datasets and to the various methodologies with their subclasses.
The analysis of research trends in benchmark datasets used to evaluate NIDS models is also presented graphically. It is found that the KDD Cup'99 dataset has the highest popularity, followed by the NSL-KDD dataset. The problem with the KDD'99 dataset, however, is that it is very old and does not resemble present-day traffic flows. Other datasets are available, but the research trend in them is weak because these newer datasets are less popular among researchers. We suggest that researchers be encouraged to adopt the new datasets, which offer richer features that reflect the modern environment. Bio-inspired NIDS, especially swarm-based NIDS, has very limited literature. These approaches often show quick convergence, but there is a lack of theoretical literature explaining how these algorithms converge so quickly. Parameter tuning is another major issue for bio-inspired approaches, and only a few articles address parameter tuning of these algorithms for intrusion detection.
The tabular and graphical analysis in this article shows that researchers are more attracted to fields in which a high count of published articles with high citation values is recorded. Furthermore, the year-wise distributions show that researchers stay with research fields in which the publication and citation counts are high. Unquestionably, the future of IDS is promising, and the research trends in IDS will continue to grow where the publication and citation counts of articles are high. It is evident from the literature review that more work is needed to evaluate algorithms based on bio-inspired approaches for intrusion detection. New hybrid algorithms that combine machine learning and bio-inspired approaches can also be evaluated and compared to promote efficient and accurate intrusion detection systems.
An approach similar to the one used in this article can also be applied to quantify research trends in other areas such as image processing, cloud computing, data mining, and bio-informatics. This type of review helps in finding the most popular and the least favored methodologies in a particular research area. Once such a popularity comparison is available, the research community can direct more effort toward the approaches that have received less attention.
In future work, we want to implement the less cited bio-inspired approaches that have few published articles related to network intrusion detection systems. We want to determine whether these less cited approaches, with fewer published articles, are equally applicable for achieving an efficient and effective intrusion detection model.

IX. CONCLUSION
We presented a comprehensive and straightforward analysis for anyone who wants to compare the various approaches used to design network intrusion detection models. This review is based on numerous research papers published in different journals and conferences between 2005 and 2020. In this article, we took citation count as a quantitative measure to review the popularity of the various approaches to intrusion detection. The paper presents various tables that offer a rapid analysis of different NIDS, research trends, and research scope. A review of diverse datasets with their characteristics, merits, demerits, and citation analysis has also been presented. The various approaches used in network intrusion detection systems are tabulated along with their advantages and disadvantages, and a review of the research trends for the different IDS techniques is presented.
The comparative research trend analysis of network intrusion detection systems is also presented graphically, based on citation counts and the number of published articles. The most cited articles, with their citation counts for conferences and journals, are also presented. The popular approaches, with the most cited papers for each methodology, are tabulated in different tables. We also observed that articles published in conferences have higher citation counts than articles published in journals.