Robust Incremental Outlier Detection Approach Based on a New Metric in Data Streams

Detecting outliers in real time from multivariate streaming data is a vital and challenging research topic in many areas. The recently introduced incremental Local Outlier Factor (iLOF) approach and its variants have received considerable attention as they achieve high detection performance in data streams with varying distributions. However, these iLOF-based approaches still have some major limitations: i) poor detection in high-dimensional data; ii) difficulty in determining a proper nearest neighbor number k; iii) assigning each sample a score that indicates its probability of being an outlier instead of labeling it; iv) inability to detect a long sequence (small cluster) of outliers. This article proposes a new robust outlier detection method (RiLOF) based on iLOF that can effectively overcome these limitations. In the RiLOF method, a novel metric called Median of Nearest Neighborhood Absolute Deviation (MoNNAD) has been developed that uses the median of the local absolute deviation of the samples' LOF values. Unlike previously reported LOF-based approaches, RiLOF is capable of detecting outliers in different data stream applications using the same hyperparameters. Extensive experiments performed on 15 different real-world data sets demonstrate that RiLOF remarkably outperforms 12 different state-of-the-art competitors.


I. INTRODUCTION
The rapid development of computer technology has led to the emergence of many scientific and commercial applications that generate high-speed, large-volume data streams. These applications must produce precise and accurate data to provide useful and reliable information to the user. However, dynamic environmental conditions and abnormal patterns (outliers) that do not conform to the expected behavior may be observed during the data stream due to hardware malfunction, aging equipment, concept drift, and sensor measurement errors [1], [2]. Because outliers carry critical and actionable information, their discovery remains one of the most important research topics in many real-time applications.
Recently, local outlier detection methods have attracted great attention since they do not make any assumptions about the distribution of the data set. In these methods, each instance is assigned a degree of being an outlier, indicating how isolated the instance is with respect to its surrounding neighborhood. This degree is called the Local Outlier Factor (LOF) of an instance [3]. The LOF is formed from the local density, which is usually computed based on the Euclidean distance between the instance and its k nearest neighbor points. Generally, data with high LOF and low density are considered outliers. This strategy has been applied in many areas to detect outliers and has yielded very successful results [4], [5]. Therefore, various extensions and improvements of the LOF have been proposed, e.g., COF [6], LOCI [7], INFLO [8], LDOF [9], LoOP [10], ABOD and fastABOD [11], CARE [12], and DLC [13]. The outlier detection methods mentioned above operate in batch mode, so they are not suitable for real-time applications generating large data streams. To overcome this limitation, Pokrajac et al. introduced the instance-incremental LOF (iLOF) approach [14]. It only updates the LOFs of existing samples affected by the newly arrived sample. Since only a small portion of the data set is affected by the arrival of a new sample, the processing time required to compute its LOF is significantly reduced. Hence, iLOF has become a popular approach, and incremental versions of other methods have also been proposed, such as i-COF [15], i-LoOP [16], i-LOCI [17], i-ABOD, and i-fastABOD [18].
The existing LOF-based approaches have made significant improvements in detecting outliers from streaming data; however, some major limitations remain. For example, anomalous points are detected depending on the relative density of data points, which means that as the dimensionality of the data increases, the computational efficiency gradually decreases. Another disadvantage of LOF-based approaches is that, even if the data distribution is known, it is difficult to accurately determine the nearest neighbor number parameter k. Depending on k, more local or more global outliers can be identified: generally, local outliers are detected at small k values, while global outliers are detected as k increases. Considering that the data distribution in many real-time applications changes rapidly over time, it is obvious that the parameter k is even more difficult to determine correctly [19]. The value of k is also closely related to the processing time; increasing k results in higher processing time. The essence of LOF is to characterize the anomaly level of each data point, indicating whether the data points are distributed over high-density regions. However, when outliers occur sequentially, the density of the region containing them inevitably increases, forming an outlier cluster whose members are no longer detectable in the data stream.
In order to overcome these limitations of the LOF-based approaches, a new metric named the Median of Nearest Neighborhood Absolute Deviation (MoNNAD) has been developed to determine the outlier score. The MoNNAD score indicates whether the incoming sample is an outlier according to a specified threshold (M_T) value; if the sample is detected to be an outlier, it is discarded from the data set. The combination of the new MoNNAD metric and the iLOF method, called Robust iLOF (RiLOF), is well suited for outlier detection on different streaming data with proper (fixed) k and M_T parameters, without requiring hyperparameter adjustment from one application to another. Extensive experiments are conducted on 15 different real-world data sets to measure the effectiveness of the proposed system. The experimental results show that the proposed system can handle outliers that arise from different factors.
The main contributions of this paper are summarized as follows:
1) A new incremental outlier detection system, RiLOF, is proposed whose computational efficiency is not affected even if the dimension of the data increases.
2) To minimize the negative impact of the k parameter, a new robust metric, MoNNAD, has been developed that uses the median of the local absolute deviation of the incoming sample from its nearest neighborhood instead of using all samples.
3) Most outlier detection methods in the literature either assign an outlier score or perform labeling; thanks to the MoNNAD metric, the proposed RiLOF method both labels outliers and assigns a degree of probability of being an outlier.
4) Since the proposed method removes outliers immediately after detecting them, it prevents outliers from forming small outlier clusters and being identified as inliers.
The remainder of this paper is organized as follows. Section II provides a summary of the literature on the use of incremental LOF-based methods for outlier detection. Section III gives information on LOF and iLOF, and then details the proposed unsupervised, incremental, and robust outlier detection method. In Section IV, experimental results on the real-world data sets are reported and performance is compared with the benchmarking algorithms. In the last section, concluding remarks and future aspects are drawn.

II. RELATED WORK
In recent years, many approaches have been developed to detect outliers, and comprehensive surveys have been presented [20], [21]. Outlier detection methods can be roughly classified into two categories based on their labeling information: supervised and unsupervised methods. Unsupervised methods are preferred as they do not require labeled data. When deciding whether a point is an outlier, it is called a global outlier if the entire database is used and a local outlier if only a part of the database (a subset) is used [22], [23]. It is not possible to store a large data stream entirely in memory, so it is challenging to detect outliers in real time using global approaches. Therefore, this study focuses on unsupervised local outlier detection strategies. Many unsupervised local outlier detection techniques are available in the literature, but among them, density-based techniques have been widely used as they outperform their competitors, such as statistical-based and distance-based approaches.
With recent advances in the field, new approaches have been proposed to detect outliers. Janssens et al. proposed a Stochastic Outlier Selection (SOS) method based on affinity relations in the data [24]. Samples with weak affinity to all other samples in the data set are more likely to be outliers. In this technique, a single point in the training set has a huge impact on the outlier scores in its neighborhood; therefore, a falsely labeled sample may have negative consequences for the decision rule. Almardeny et al. developed a Rotation-based Outlier Detection (ROD) method in which the feature space is divided into 3D subspaces and the 3D vectors representing data points are rotated around the geometric median [25]. The outlier score of each sample is computed with a median-based statistical method over the volumes of the rotations. Although the ROD method gives good results in low-dimensional data, its performance is negatively affected in the high-dimensional case. Liu et al. proposed a Single-Objective Generative Adversarial Active Learning (SO-GAAL) method, which uses a single generator, and its extended version, the Multiple-Objective Generative Adversarial Active Learning (MO-GAAL) method, which uses multiple generators [26]. Experimental results on synthetic data sets show that these methods can easily handle various data set types and a high rate of irrelevant variables. However, there is still more work to be done, such as the difficulty of discovering true outliers for the human analyst, the need for better insights into and interpretations of outlier scores, and the need for active learning algorithms that can process data streams.
The methods mentioned above are developed for static data sets and are not suitable for data streams. For this purpose, the LOF approach based on sliding windows has recently been widely used for outlier detection. Salehi et al. developed a fixed-memory incremental local outlier detection method (MiLOF) to decrease the memory requirement of iLOF [27]. MiLOF first clusters the data with k-means and then uses only the center point of each cluster, thus saving time and memory. However, since only the cluster center points are used, the overall anomaly detection accuracy decreases. Na et al. proposed the density-summarizing incremental LOF algorithm called DiLOF [28]. Since the density distribution of the data is not fully taken into account during the data extraction phase, the extracted data cannot represent the historical data well, which leads to a decrease in accuracy. Most recently, Yang et al. proposed the Extract Local Outlier Factor (ELOF) [29]. The success of ELOF depends on many parameters; therefore, very poor performance can occur with improperly adjusted parameters. Pevny proposed an unsupervised ensemble learning method to identify anomalies, called the Lightweight Online Detector of Anomalies (LODA) [30]. It combines one-dimensional histograms created from arbitrary projections of the data, each acting as a weak classifier, to obtain a powerful detector. However, because the projections are randomly selected, they cannot be guaranteed to isolate anomalies well. Wang et al. developed a multiple-instance-triggered incremental outlier detection method [31]. Instead of an instance-incremental process, an inserted-bag based algorithm is used. Although experimental results show good performance on both synthetic and real data sets, it requires a long processing time. There are many other similar methods [32]-[35], but the accuracy of these algorithms is generally closely tied to the selected data set.
As can be seen from these studies, LOF-based approaches provide good accuracy in detecting outliers without requiring exact knowledge of the underlying distribution. RiLOF, introduced in the next section, offers specific solutions to the LOF limitations: reducing the negative effect of the k parameter, maintaining performance even as the data dimension increases, labeling and deleting outliers, and preventing the occurrence of small outlier clusters.

III. METHODOLOGY
This section is divided into three parts. First, the LOF approach is briefly described. Then, the incremental version of LOF is given, and finally the proposed method is presented in detail.

A. LOCAL OUTLIER FACTOR (LOF)
LOF measures the outlierness degree of each instance based on the distribution density in the data set. As the LOF value increases, the probability of the instance being an outlier increases. After the LOFs of all instances are computed, instances with a LOF higher than a predefined threshold value are identified as outliers. The LOFs of the instances can be measured by following the steps below. Interested readers are directed to the original article for more detailed descriptions of these processes [3].
• Determine k-distance: Let x_i be the i-th sample in the data set X and d(x_i, x_j) the distance between the i-th and j-th samples (i ≠ j), where k is the user-defined number of neighbors. The k-distance of a sample x_i, denoted k-distance(x_i), is the distance d(x_i, x_j) to a sample x_j that satisfies the following conditions: (i) at least k samples x' ∈ X \ {x_i} satisfy d(x_i, x') ≤ d(x_i, x_j), and (ii) at most k − 1 samples x' ∈ X \ {x_i} satisfy d(x_i, x') < d(x_i, x_j). In other words, k-distance(x_i) is the distance between x_i and its k-th nearest neighbor. The k-distance of an outlier sample is higher than that of an inlier sample, because the k-th nearest neighbor of an outlier is more distant.
• Determine reachability distance: In the second step, the reachability distance of a sample x_i with respect to a sample x_j is computed as

reach-dist_k(x_i, x_j) = max{k-distance(x_j), d(x_i, x_j)}    (1)

• Determine local reachability density: In the third step, the local reachability density of a sample x_i, denoted lrd_k(x_i), is defined by

lrd_k(x_i) = |kNN(x_i)| / Σ_{x_j ∈ kNN(x_i)} reach-dist_k(x_i, x_j)    (2)

• Determine LOF: In the last step, the LOF value of each sample is computed. The LOF is the average ratio of the local reachability densities of the k nearest neighbors to the local reachability density of the sample itself:

LOF_k(x_i) = (1 / |kNN(x_i)|) Σ_{x_j ∈ kNN(x_i)} lrd_k(x_j) / lrd_k(x_i)    (3)

B. INCREMENTAL LOCAL OUTLIER FACTOR (iLOF)
In the incremental LOF approach, an insertion algorithm is used to compute the LOF of each incoming data point as well as to update the LOFs of the affected points. According to the computed LOF value, it is determined whether the incoming sample is an outlier or not. The iLOF algorithm is started after k samples have been loaded into the system, where k is the user-defined nearest neighbor parameter. After the algorithm starts, the steps of the LOF approach are performed for each incoming instance. The steps of the iLOF algorithm used in this study are as follows.
• The kNN of the incoming instance is determined using the Mahalanobis distance, and the indices of the k nearest samples are recorded. Subsequently, the affected samples are identified through the k nearest neighbors (kNN) and the reverse k nearest neighbors (RkNN). Then, the reach-dist, lrd, and LOF values of the incoming instance are computed, and these values are updated for the affected instances in the data set. All these operations are performed for each incoming point.
• Since their reach-dist, lrd, and LOF values as well as their kNN and RkNN indices do not change, the unaffected instances are not updated. Hence, the processing time of the iLOF algorithm is lower than that of batch-mode LOF (which requires running LOF from scratch for each incoming instance) while achieving the same performance.
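For concreteness, the following minimal Python sketch (ours, not code from the original implementation) computes batch LOF scores following Equations (1)-(3); Euclidean distance is used for brevity, whereas this study employs the Mahalanobis distance. In the incremental setting, only these quantities for the query point and its affected neighbors would be (re)computed.

```python
import numpy as np

def knn_indices(D, i, k):
    # Indices of the k nearest neighbors of sample i (self excluded).
    order = np.argsort(D[i])
    return order[order != i][:k]

def lof_scores(X, k=11):
    n = len(X)
    # Pairwise Euclidean distances (Mahalanobis in the actual method).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    knn = [knn_indices(D, i, k) for i in range(n)]
    k_dist = np.array([D[i, knn[i][-1]] for i in range(n)])  # k-distance(x_i)

    # Equation (1): reach-dist_k(x_i, x_j) = max{k-distance(x_j), d(x_i, x_j)}
    reach = lambda i, j: max(k_dist[j], D[i, j])

    # Equation (2): lrd_k(x_i) = k / sum of reach-dist to the k nearest neighbors
    lrd = np.array([k / sum(reach(i, j) for j in knn[i]) for i in range(n)])

    # Equation (3): LOF_k(x_i) = average of lrd(x_j) / lrd(x_i) over the kNN
    return np.array([np.mean(lrd[knn[i]]) / lrd[i] for i in range(n)])
```

Inliers deep inside a cluster obtain LOF values close to 1, while isolated points obtain values well above 1.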

C. PROPOSED METHOD
In the literature, statistical techniques such as Standard Deviation (SD), Inter Quartile Range (IQR), the Generalized Extreme Studentized Deviation (GESD), Median Absolute Deviation (MAD), Z-Score and Robust Z-Score are employed to separate outliers from inliers in univariate outlier detection.
In the SD technique, the lower and upper thresholds are set to mean ± α * SD [36], where α is commonly chosen as 2 or 3. In the IQR technique, the lower and upper thresholds are determined as Q1 − 1.5 * IQR and Q3 + 1.5 * IQR, respectively [37]; a sample outside this range is considered an outlier. In the GESD technique, the number of possible outliers is determined by the user, and R_i = max_i |x_i − mean(X)| / SD is computed for each sample [38]. Samples with scores higher than the critical value at the chosen significance level are considered outliers and deleted, and the same process is performed iteratively on the remaining samples. In the MAD technique, samples with a score of 2.5 or higher are determined as outliers [39]. It is computed as MAD = α * median(|x_i − median(X)|), where α is a constant parameter directly related to the data distribution; it is set to 1.4826 for a Gaussian distribution and 1.0 for a Cauchy distribution. In the Z-score technique, samples with a Z-score higher than 3 are possible outliers, where Z-score = (x_i − mean(X)) / SD. The robust version of the Z-score uses the median and MAD instead of the mean and SD [40]. It is defined as Robust Z-score = (x_i − median(X)) / MAD, and samples with a score higher than 2.5 are determined as outliers.
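As a quick illustration, these univariate rules can be sketched as follows (our sketch, not code from the article; α = 2 is used for the SD rule and the Gaussian constant 1.4826 for MAD):

```python
import numpy as np

def sd_outliers(x, alpha=2.0):
    # SD rule: flag samples outside mean +/- alpha * SD.
    m, s = x.mean(), x.std()
    return (x < m - alpha * s) | (x > m + alpha * s)

def iqr_outliers(x):
    # IQR rule: flag samples outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)

def robust_z_outliers(x, alpha=1.4826, threshold=2.5):
    # Robust Z-score rule: |x - median| / MAD above 2.5 flags an outlier.
    mad = alpha * np.median(np.abs(x - np.median(x)))
    return np.abs(x - np.median(x)) / mad > threshold

x = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 25.0])  # 25.0 is the obvious outlier
print(sd_outliers(x), iqr_outliers(x), robust_z_outliers(x), sep="\n")
```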
The statistical techniques mentioned above may fail due to the masking effect of a cluster of outliers, because the LOF values of clustered outliers get closer to those of the inliers. This can be addressed in two ways: decreasing the LOF threshold T or increasing the k value. However, it is not easy to set the threshold because there is a trade-off between T and performance: lowering T raises the success of predicting outliers but increases the false positive rate. On the other hand, increasing k can reduce the impact of outlier clusters but decreases the accuracy of detecting local outliers, and it also leads to a higher computational time. In this work, to overcome the disadvantages of these outlier detection techniques, a new MoNNAD metric has been developed. The MoNNAD score of the incoming sample is computed as the median of the absolute differences between the LOF value of the incoming sample and the LOF values of its k nearest neighbors. It is defined by

MoNNAD(x_i) = median_{x_j ∈ kNN(x_i)} |LOF_{x_i} − LOF_{x_j}|    (4)

where LOF_{x_i} is the incoming sample's LOF value and LOF_{x_j} is the LOF value of a kNN sample. The outlier labeling and scoring process of a query sample (for k = 3) in the proposed RiLOF method is shown in Fig. 1. The RiLOF method determines whether each incoming sample is an inlier or an outlier using the MoNNAD score: samples with a MoNNAD score higher than or equal to the specified threshold value are labeled as outliers. While equal emphasis is given to the query sample and its nearest neighbors in the statistics-based techniques, more emphasis is given to the query sample in the RiLOF method. This makes the distinction between inliers and outliers clearer, causing samples with a higher probability of being outliers to obtain higher scores, which is the most important advantage of the RiLOF method.
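A minimal sketch of the MoNNAD score in Equation (4) (ours; the LOF values are assumed to be available from the iLOF step):

```python
import numpy as np

def monnad(lof_query, lof_neighbors):
    # Median of absolute differences between the query sample's LOF
    # and the LOFs of its k nearest neighbors (Equation 4).
    return np.median(np.abs(lof_query - np.asarray(lof_neighbors)))

# An inlier surrounded by inliers (LOF ~ 1) scores near 0, while an outlier
# whose neighbors are inliers scores well above the threshold M_T = 0.5.
print(monnad(1.05, [0.98, 1.02, 1.01]))  # ~0.04 -> inlier
print(monnad(3.40, [1.00, 1.10, 0.95]))  # ~2.40 -> outlier
```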
The statistical techniques SD, IQR, GESD, MAD, Z-score, and Robust Z-score are suitable for univariate outlier detection problems. Therefore, multivariate data instances need to be transformed into univariate ones by moving to another domain; this transformation is performed by computing the LOF of the multivariate data samples. Fig. 2 shows a synthetic data set containing global and local outliers, together with visual graphs of the outliers detected using the above-mentioned statistical techniques. The scatter plot in Fig. 2a shows the cluster of normal samples (C_1), the small outlier cluster (C_2), and the global outlier points p_1, p_2, and p_3. As is known, LOF-based algorithms compute an outlier score instead of labeling; outliers can then be recognized with a threshold value determined by considering the scores of the samples (Fig. 2b). If the threshold value is chosen as 2, the global outliers shown in red can be identified due to their high LOF, while the outliers in the small cluster C_2 cannot, because the samples in C_2 form an outlier cluster and their LOF values become similar to those of the inliers. In addition, a point inside the cluster C_1 is incorrectly detected as an outlier. Results for the GESD technique are shown in Fig. 2c: the global outliers are successfully detected, but 5 points inside the cluster C_1 are falsely detected as outliers, and the outlier cluster C_2 is not detected at all. The threshold plane is displayed for visual interpretation, passing through the sample with the smallest LOF value among the detected outliers. According to the results obtained by the IQR technique in Fig. 2d, the global outliers are determined precisely, while 6 different inliers in cluster C_1 are falsely determined as outliers; the outlier cluster C_2 is again labeled as inliers. Fig. 2e shows the results of the SD technique: although the global outliers are recognized, the outlier cluster C_2 is not identified. The same can be seen in the Z-score graph shown in Fig. 2f. The results of the MAD technique are given in Fig. 2g: samples p_1 and p_2 are considered outliers, while sample p_3 and the outlier cluster C_2 are not recognized. Robust Z-score results are shown in Fig. 2h: while accurately detecting the global outliers, it incorrectly detects a sample in the cluster C_1 as an outlier and, as in Fig. 2f, fails to detect the outlier cluster C_2. Finally, the outlier detection results of the proposed MoNNAD metric are shown in Fig. 2i. While none of the other techniques could recognize the outlier cluster C_2, it is obvious that MoNNAD correctly detected all outliers in the data set, including the outlier cluster C_2. This is because the proposed RiLOF method uses the median of the local absolute deviation from the incoming sample instead of using all samples to detect and delete outliers, thus avoiding the formation of a small outlier cluster.
Determining the parameters of the proposed RiLOF method is critical, as in other unsupervised methods. The parameter determination strategy for the number of nearest neighbors and the threshold is presented in Section IV. In the RiLOF method, samples with a MoNNAD score of 0.5 or higher are considered outliers. Detected outlier samples are deleted during incremental learning, which also leads to less memory usage. The intuition behind this approach is to avoid the formation of outlier clusters and to prevent outlier points from being determined as inliers. Outlier clusters cause outlier points to have lower LOF scores and degrade algorithm performance in both batch and incremental mode. When samples with a high probability of being outliers are deleted during incremental learning, the LOF scores of new samples showing similar characteristics to the outlier samples remain high, because they are located in less dense regions, yielding a performance increase. The implementation of the proposed RiLOF method is demonstrated in Algorithm 1.
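A simplified rendering of this processing loop is sketched below (ours; it recomputes LOFs in batch via the lof_scores function from the earlier sketch, whereas Algorithm 1 updates only the affected samples incrementally):

```python
import numpy as np

M_T = 0.5   # MoNNAD threshold adopted in the text
K = 11      # nearest neighbor number recommended in Section IV

def process_stream(stream, initial_window):
    # 'window' holds the retained (non-outlier) samples seen so far;
    # initial_window is assumed to contain more than K inlier samples.
    window = [np.asarray(x, dtype=float) for x in initial_window]
    labels = []
    for x_q in stream:
        window.append(np.asarray(x_q, dtype=float))
        X = np.vstack(window)
        lofs = lof_scores(X, k=K)                # batch stand-in for iLOF updates
        d = np.linalg.norm(X - X[-1], axis=1)    # distances from the query
        knn = np.argsort(d)[1:K + 1]             # kNN of the query (self excluded)
        score = np.median(np.abs(lofs[-1] - lofs[knn]))  # MoNNAD, Equation (4)
        if score >= M_T:
            labels.append("outlier")
            window.pop()  # delete immediately so no small outlier cluster forms
        else:
            labels.append("inlier")
    return labels
```

The immediate deletion on detection is what prevents a sequence of outliers from accumulating into a dense region and masking one another.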

IV. RESULTS AND DISCUSSION
This section presents the effectiveness of the proposed RiLOF method and compares it with other unsupervised outlier detection methods. All experiments are conducted on a computer with an Intel Core i7-6900K CPU @ 3.2 GHz, 64 GB of RAM, and the Windows 10 operating system. The data sets and evaluation criteria used in the study are explained in detail below. In addition, the effects of the parameters used in the proposed RiLOF method on performance are discussed in detail, and the observed findings are reported.

A. DATA SETS
In the experimental evaluation, only real-world data sets are taken into account, as synthetic data sets cannot fully reveal the behavior of outliers. To this end, 15 different real-world data sets are used [42]. Information on the data sets, such as the number of instances, the number of features, and the inlier and outlier inclusion conditions, is detailed in Table 1.
[Algorithm 1 (RiLOF insertion, recoverable excerpt): for ∀ x_i ∈ kNN(x_q) do: compute reach-dist(x_q, x_i) using Equation (1); end for; update lrd(x_i) using Equation (2) and LOF(RkNN(x_i)) using Equation (3) for the affected samples; compute lrd(x_q) using Equation (2); compute LOF(x_q) using Equation (3); compute MoNNAD(x_q) using Equation (4).]

B. EXPERIMENT SETUP
The proposed RiLOF method is implemented using Python version 3.7.7. In the implementation of LODA and ROD, the Python Outlier Detection (PyOD) toolbox is utilized [43]. For the SOS, SO-GAAL, and MO-GAAL methods, publicly available source code is employed. The iLOF [14], i-LOCI [17], and i-fastABOD [18] methods are implemented based on the original articles. Since there is no reliable Python code for the INFLO and LDOF methods, our own implementations based on the respective articles are used [8], [9]. In the proposed RiLOF method, a singular matrix error is observed in some data sets due to the use of the Mahalanobis distance metric in the nearest neighbor search. This problem has been solved by increasing the starting index in incremental learning.
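For illustration, a Mahalanobis distance can be computed as follows (our sketch; the pseudo-inverse shown here is an additional safeguard against a singular covariance matrix, whereas the study's own fix is to defer the start of incremental learning):

```python
import numpy as np

def mahalanobis(x, y, X_window):
    # Inverse covariance of the current window; np.linalg.pinv tolerates a
    # singular matrix that np.linalg.inv would reject.
    VI = np.linalg.pinv(np.cov(X_window, rowvar=False))
    d = np.asarray(x) - np.asarray(y)
    return float(np.sqrt(d @ VI @ d))

X = np.random.default_rng(0).normal(size=(50, 3))
print(mahalanobis(X[0], X[1], X))
```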

C. PERFORMANCE MEASURES
Since the data sets used for outlier detection contain both inlier and outlier samples, outlier detection can be considered a binary classification problem [26]. Metrics such as accuracy, recall, and F-score, used as success criteria in binary classification, are sensitive to the data distribution [44]. Therefore, using them as performance criteria in outlier detection methods may cause erroneous evaluations, because the classes are unbalanced: the number of outliers is very low compared to the inliers. The Receiver Operating Characteristic (ROC) curve, which is not affected by the data distribution, is frequently used as a performance evaluation criterion in the literature. The ROC curve depicts the trade-off between sensitivity and specificity and allows visual comparison of different methods on the same graph. The Area Under the ROC Curve (ROC-AUC) summarizes the ROC curve as a scalar value and makes it easy to compare different methods. A higher ROC-AUC score indicates better performance.
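In practice, the ROC-AUC can be computed directly from the outlier scores, as in the following sketch (ours, with toy values):

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 0, 1, 0, 1]              # 1 = ground-truth outlier
scores = [0.1, 0.2, 0.1, 0.9, 0.3, 0.7]  # e.g., MoNNAD scores
print(roc_auc_score(y_true, scores))     # 1.0 here; higher is better
```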

D. PARAMETER SELECTION
Determining the optimum parameters in any machine learning algorithm is crucial to the algorithm's performance. In particular, optimum parameter selection is even more difficult in unsupervised outlier detection algorithms, because the number of features and the number of outliers are not known in advance and differ from one application to another. Therefore, two parameters, k and M_T, need to be adjusted for the proposed RiLOF method to perform well in different applications. Table 2 shows the ROC-AUC scores for different k and M_T values in the Breast_O data set; the highest, smallest, and average ROC-AUC scores are 0.94, 0.58, and 0.84, respectively. Table 3 shows the ROC-AUC scores for different k and M_T values in the WBC data set; the highest, smallest, and average ROC-AUC scores are 0.98, 0.87, and 0.95, respectively. Table 4 shows the corresponding scores for the Hepatitis data set; the highest, smallest, and average values are 0.88, 0.56, and 0.79, respectively.
Finally, the ROC-AUC scores for different k and M_T values in the Glass data set are given in Table 5; the highest, smallest, and average ROC-AUC scores are 0.97, 0.55, and 0.83, respectively.
As can be seen from the detailed k and M_T analyses, there is no linear relationship between the k and M_T values, even on the same data set. High ROC-AUC scores can be obtained at low M_T with high or low k values in one data set and at quite different combinations in another [3]. Also, according to the experimental results of previous scientific studies, the effect of k cannot be fully predicted even in data with a Gaussian distribution, and it is more difficult still in real-world data whose distribution is unknown. It is inevitable that this situation also affects the scores of the developed MoNNAD metric. One of the most important aims of this study is to minimize the negative impact of the user-defined parameters (k and M_T) and to quickly obtain the best (or closest to the best) results in different data stream applications. According to the results on the Hepatitis data set in Table 4, the ROC-AUC score is above the average (0.79) for k > 9 only at 0.50 ≤ M_T ≤ 0.65. In the Glass data set in Table 5, the ROC-AUC score is above the average (0.83) with 0.20 ≤ M_T < 0.65 for all k values. In the Breast_O data set in Table 2, the ROC-AUC score is above the average (0.84) with 0.50 ≤ M_T ≤ 0.80 for k ≥ 9. On the other hand, in the WBC data set in Table 3, the effect of increasing or decreasing M_T at different k values on the ROC-AUC scores is negligible. Based on these observations, it has been concluded that M_T = 0.5 is an acceptable value for all data sets in terms of the robustness and accuracy of the developed MoNNAD metric.

E. DETAILED ANALYSES OF K
In this section, the behavior of the proposed RiLOF method for M_T = 0.50 against various k values is analyzed. For this purpose, Fig. 3 plots the ROC-AUC curves of RiLOF on the real-world data sets for different k values at constant M_T = 0.50. Based on the results, increasing the k value in 8 of these data sets (Breast_P, Connectionist_B, Glass, WBC, Ecoli, Musk, Page Blocks, and Shuttle) either does not affect the performance or causes only slight changes. This is due to the use of the Mahalanobis distance metric, which computes a data-distribution-based distance instead of the Euclidean distance, and to the fact that the proposed MoNNAD is robust against small fluctuations in LOF values. In 7 of the data sets (Hepatitis, Biomed, Heart Disease, Boston_HP, Breast_O, SpamBase, and Satimage), the ROC-AUC score rises as k increases. On the other hand, according to the experimental results in [3], LOF values stabilize after k = 10. Therefore, the value of k should be chosen to be at least 10 in order to be less affected by fluctuations in the LOF value caused by k. In addition, the processing time should be taken into account, since choosing too large a k value requires a high processing time (cost) [45], [46]. In view of these facts, a good option is to choose the smallest k value greater than 10 that obtains the highest (or nearly the highest) ROC-AUC score at M_T = 0.50; k = 11 is a suitable choice considering the trade-off between processing time and accuracy. For M_T = 0.50 and k = 11, the highest ROC-AUC score is obtained in the Breast_O data set (Table 2), while results closest to the highest ROC-AUC score (with negligible differences) are obtained in the WBC (Table 3), Hepatitis (Table 4), and Glass (Table 5) data sets.

F. PERFORMANCE COMPARISON
In this section, the RiLOF method is compared with 12 different unsupervised outlier detection methods, which can be divided into two main groups according to their operating modes: incremental mode algorithms (RiLOF, iLOF, i-fastABOD, i-COF, i-LOCI, and LODA) and batch mode algorithms (INFLO, LDOF, LoOP, SOS, ROD, SO-GAAL, and MO-GAAL).
For a fair comparison of the proposed RiLOF with the benchmark outlier detection methods, the same parameters are used as much as possible. iLOF, i-fastABOD, i-COF, INFLO, LDOF, and LoOP take the k parameter, and it is set to 11. However, i-LOCI uses α and k_σ instead of the k parameter to detect outliers; in the reference study [7], these parameters are defined as α = 0.5 and k_σ = 3, so the same values are used in this study. In the implementation of the SOS method, the perplexity parameter h is set to 4.5 as in the reference paper [24]. Similarly, for the SO-GAAL and MO-GAAL methods, the parameter values recommended in the reference article are used [26]. ROD is a parameterless method, so it does not need any user-defined parameters [25].
The ROC-AUC scores of the proposed RiLOF and the comparative methods with the hyperparameters described above are given in Table 6. Bold ROC-AUC scores indicate the highest performance on the particular data set. Due to the processing time of i-LOCI, it is not run on data sets larger than 4500 samples. Moreover, ROD could not be run on the Musk data set owing to an out-of-memory error. The performances of the LODA, SO-GAAL, and MO-GAAL methods vary between runs, so each is run 10 consecutive times on each data set to obtain more reliable results, and the average ROC-AUC scores are given in Table 6.
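The averaging protocol for the stochastic detectors can be sketched as follows (ours, using PyOD's LODA as an example; data loading is application-specific):

```python
import numpy as np
from pyod.models.loda import LODA
from sklearn.metrics import roc_auc_score

def mean_auc(X, y, runs=10):
    # LODA's random projections differ between runs, so scores are averaged.
    aucs = []
    for _ in range(runs):
        clf = LODA()
        clf.fit(X)
        aucs.append(roc_auc_score(y, clf.decision_scores_))
    return float(np.mean(aucs))
```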
The experimental results in Table 6 demonstrate that the proposed RiLOF method achieves the best performance on twelve of the fifteen data sets. On the remaining three data sets, it ranks third, yet very close to the top two. Specifically, RiLOF improves performance by approximately 10% on the Hepatitis, Breast_P, Heart Disease, Ecoli, and SpamBase data sets, and by at least 40% on the Connectionist_B data set, compared to the other methods. It also shows the optimum outlier detection performance on the Musk data set.
From the results in Table 6, it is seen that the performances of the benchmark techniques vary depending on the specific data set. For example, i-LOCI's maximum ROC-AUC score (0.99) is closest to the best on the Musk data set, but it performs rather poorly on other data sets such as Biomed. It can be seen from Table 6 that RiLOF produces more robust results, using the iLOF algorithm, which detects outliers in data streams without considering the data distribution, together with the developed MoNNAD metric. Finally, as can be seen from the last column of Table 6, the RiLOF method performs well above average on each of the 15 different data sets. Further, from the last row of Table 6, the average ROC-AUC score (0.8431) achieved by RiLOF over the 15 data sets is also quite remarkable; for example, RiLOF outperforms the closest method (ROD) by 17% in terms of average performance. Considering especially the ROD, LODA, and i-fastABOD methods, which show the closest performance, it is obvious that the proposed RiLOF method can be easily applied to data stream applications in different fields.
For a more comprehensive comparison, ROC-AUC charts showing the performance of the proposed RiLOF and the other outlier detection methods as a function of the k parameter should be considered. Fig. 4 shows the ROC-AUC scores of the benchmark algorithms for k values ranging from 5 to 35 across the 15 real-world data sets, with the exception of the i-LOCI algorithm, for which the ROC-AUC scores of different k_σ values are analyzed. Due to space limitations, and to make the comparison on the same graph, the k_σ values of the i-LOCI method (1, 2, and 3) are indicated with a star marker at the k values 5, 7, and 9, respectively. The performances of the SOS, LODA, ROD, SO-GAAL, and MO-GAAL methods are not shown in Fig. 4 since they do not take k as an input parameter. From Fig. 4, it can be seen that the proposed RiLOF method outperforms the comparative methods in 12 out of 15 data sets for almost all k values in terms of the ROC-AUC metric. Moreover, in the Breast_P, Connectionist_B, Glass, WBC, Boston_HP, and Page Blocks data sets, increasing k hardly changes the ROC-AUC score achieved by RiLOF. Furthermore, RiLOF shows the optimum outlier detection performance for all k values in the Musk and Shuttle data sets. In the remaining data sets, increasing k slightly increases the ROC-AUC score obtained by the RiLOF method. As can be deduced from these graphs, the proposed RiLOF method behaves more robustly than the other nearest-neighborhood-based methods, even as k changes.
Extensive experiments on real-world data sets show that the proposed RiLOF method performs much better than both batch mode and incremental mode algorithms on different data sets. In particular, considering that fixing k = 11 and M_T = 0.5 gives very effective results with the RiLOF method, it is obvious that RiLOF is very suitable and efficient for data streams in different fields.

V. CONCLUSION
In this article, a robust outlier detection method (RiLOF) is presented to detect outliers from data streams in real time. RiLOF has many advantages over iLOF-based algorithms: (1) a high detection rate even in high-dimensional data; (2) being hardly affected by the number of nearest neighbors, k; (3) the ability to both label an outlier and assign each sample a score indicating its probability of being an outlier; (4) the ability to detect a long sequence (small cluster) of outliers. The success of RiLOF is due to the newly developed MoNNAD metric. The MoNNAD score indicates whether the incoming sample is an outlier according to the specified k and M_T values; if the sample is detected to be an outlier, it is deleted from the data set. The intuition behind the deletion of possible outliers is to increase the effect of inlier points on the computation of the LOF and MoNNAD scores while reducing the potential adverse side effects of outliers and preventing the formation of outlier clusters. A series of experiments is performed on 15 different real-world data sets to analyze the effects of k and M_T. According to these results, k = 11 and M_T = 0.5 are selected. The proposed RiLOF method performs better than both the incremental mode algorithms (iLOF, i-fastABOD, i-COF, i-LOCI, and LODA) and the batch mode algorithms (INFLO, LDOF, LoOP, SOS, ROD, SO-GAAL, and MO-GAAL).
Consequently, the RiLOF method is very suitable and effective for use on fast, large data streams of varying dimensionality in different fields. In the future, the proposed RiLOF method's performance can be improved in the following aspects. Although the proposed RiLOF method requires less memory than the iLOF method, it is open to memory improvements for large data streams. In addition, the nearest neighbor search and the LOF score computation can be implemented on Graphics Processing Units to decrease the processing time.