Concept Drift Detection Based on Typicality and Eccentricity

Many applications and fields produce vast quantities of time-relevant or continuously changing data, which may represent new phenomena. This data stream behavior is known as concept drift. Efficient and accurate processing of online data streams is a current need in many areas, and concept drift is a cause of performance degradation in classical machine learning approaches. Addressing concept drift is therefore necessary to deploy real-world applications fed by data streams. This work presents a perspective on applying a Concept Drift Detector (CDD) to empower a data stream classifier in a real-world scenario, followed by the proposal of a Concept Drift Detector based on Typicality and Eccentricity Data Analytics (TEDA-CDD). Our method employs two models to monitor the data stream, keeping the information of a previous concept while monitoring the emergence of a new one. The models are considered to represent two distinct concepts when the intersection of their data samples, measured by the Jaccard Index, is significantly low. TEDA-CDD is compared to known methods from the literature in experiments using synthetic and real-world datasets that simulate real-world applications. In these experiments, TEDA-CDD performs comparably in accuracy to well-established algorithms while being more memory efficient.


I. INTRODUCTION
Data stream processing is a growing area where applications and fields produce a vast quantity of data. These data are time-relevant or continuously changing, representing new phenomena. Therefore, it is essential to use evolving techniques to process the data stream, and these techniques must be wary of concept drifts. In this scenario, determining when the data stream has changed enough to demand readjustments for a real-time application is essential.
A common approach is to define a Concept Drift Detector (CDD) to monitor the data stream and determine when concept drift occurs to prevent inappropriate data processing. There are three major categories of CDD: supervised, unsupervised, and semi-supervised [1]. Supervised CDD techniques assume that the ground-truth label of a data sample is known immediately after a prediction. In unsupervised CDD, the prediction feedback is delayed or does not exist. A semi-supervised CDD has access to a small number of ground-truth labels, enabling the combination of supervised and unsupervised approaches.
The associate editor coordinating the review of this manuscript and approving it for publication was Berdakh Abibullaev.
A CDD can also be categorized as multidimensional or unidimensional. Multidimensional approaches can capture more detailed information on each concept by leveraging the model's complexity. Unidimensional methods capture less detailed information on each data concept, but they are far simpler and less resource-demanding. The trade-off between these two approaches is relevant to the application performance in terms of computation time, memory usage, and model complexity.
In recent years, several approaches have been proposed for unsupervised concept drift detection. One such method, NN-DVI, utilizes the nearest neighborhood concept to compare densities between the reference and detection windows [2]. Another approach, FAAD, is designed to detect sequence anomalies in multidimensional sequence data streams that are prone to concept drift [3]. It employs information theory concepts to select features and reduce redundancy, reducing the workload associated with high dimensionality. Anomaly detection uses models built with random feature sampling to generate scores, which are then compared to a user-defined threshold. A comparison between the proportion of anomalies in the reference and detection windows indicates the occurrence of concept drift.
Furthermore, UDetect detects concept drift in activity data streams using a supervised classifier model as a reference to compare unlabeled data instances [4]. Similarly, the algorithm SQSI-IS [5] is based on SQSI [6]. SQSI relies on a supervised offline trained model to provide scores to unlabeled data (detection window) and training data (reference window). Divergence in score densities indicates a concept drift. SQSI-IS introduces an additional step of instance selection for the unlabeled data using the Kolmogorov-Smirnov test.
MD3 is an approach that monitors the decision boundaries of an SVM classifier to detect concept drift [7]. It looks for density changes within the boundary region, which indicate concept drift and trigger a classifier retrain. On the other hand, the algorithm DDAL utilizes active learning to determine relevant instances in the incoming data stream [8]. DDAL uses these instances to calculate the maximum and minimum densities and constructs a range with the same values from the reference window. This range is then compared to a user-defined threshold to determine if a concept drift has occurred.
Despite the discussed methods, the literature has a higher concentration of supervised concept drift detection papers [9], and the unsupervised concept drift detection research area is considered unexplored [10]. In this context, this work focuses on efficient unsupervised unidimensional concept drift detection for its performance and realistic assumptions. A performant method is essential for processing a data stream due to its sample-wise nature: a sample must be processed before the next one arrives. Also, the unsupervised approach addresses a realistic aspect of concept drift detection: the true label of a data sample is unknown and may never become available.
Concept Drift Detector based on Typicality and Eccentricity Data Analytics (TEDA-CDD) is a concept drift detector based on TEDA, a framework for data analytics that leverages typicality and eccentricity [11]. It is devised for unsupervised scenarios where the ground-truth label of the incoming samples from the data stream is unavailable. Furthermore, TEDA-CDD is a unidimensional CDD, which means that each feature of the data stream must be individually monitored by a distinct TEDA-CDD instance. It monitors a data stream feature by using two TEDA-based models and comparing them at each sample arrival. One model is more resistant to change, whereas the other models the most recent data samples. The models are compared using the Jaccard Index based on characteristics intrinsic to TEDA. The principal novelties are the use of TEDA for modeling the concepts and of the Jaccard Index for comparing the models. The proposed CDD presents competitive performance compared to other unsupervised CDDs and is more memory efficient, using up to three times less memory.
In this paper, we present a novel unidimensional unsupervised Concept Drift Detector, which utilizes TEDA as a concept modeling tool, offering an efficient balance between time and memory consumption. To enhance its capability to detect new concepts, we introduce a forgetting factor associated with the TEDA model. The detection of emerging concepts is achieved using the Jaccard Index. Our novel model is thoroughly evaluated through experiments simulating real-world data stream processing scenarios using a data stream classifier. Additionally, we delve into an in-depth discussion of the essential characteristics and behavior of the data stream classifier under consideration.
The following section, Section II, presents a realistic description of concept drift detection on data stream processing. The description of TEDA-CDD is in Section III. The discussion on experiments and results using the proposed CDD is in Section IV. Finally, the paper concludes in Section V with arguments defending the proposed approaches, limitations regarding TEDA-CDD applications, and future works exploring the concepts in this paper.
These potential applications have restrictions on the use of the available data regarding size and aging. Some applications present challenges to storing and processing data due to time constraints and data growth rate. In some applications, even if storing data is mandatory, older data becomes irrelevant as newer data changes and represents new concepts. Generally, data stream processing techniques must quickly and efficiently process data samples and adapt to any concept drifts. Another common characteristic of data stream processing techniques is that each data sample is processed only once. This last characteristic is known as single pass and is a consequence of data availability restrictions on data streams [32].
Concept drift is a relevant aspect of data stream processing. Since a concept drift represents a change in the data stream, the stream can no longer be considered stationary [33]. Therefore, algorithms and techniques that assume stationarity in data are ineffective as the data stream evolves, demanding new algorithms to process data streams effectively.
There are two known taxonomies regarding concept drift. One classifies the concept drift depending on how the concept changes over time. The second classifies the concept drift based on the relation between features and target. Regarding change over time, concept drifts can occur in four ways: abrupt, gradual, incremental, and reoccurring. Figure 1 illustrates the concept drift types. Abrupt concept drift is a sudden change in the occurrence of concepts in consecutive samples. Gradual concept drift is composed of abrupt concept drifts that gradually increase the occurrence probability of the new concept while decreasing the occurrence probability of the old concept. In contrast, an incremental concept drift is a smooth transition between two concepts where a previous concept transforms into a newer concept going through many intermediate states. Finally, in the reoccurring concept drift, concepts keep reappearing in the evolving data stream over time.
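The difference between the abrupt, gradual, and incremental types can be illustrated with a small synthetic generator. This is a minimal sketch, not the generators used in the experiments: the function names, the Gaussian concepts, and the parameters `start` and `width` are assumptions made only for illustration.

```python
import random


def new_concept_probability(kind, t, start, width):
    """Probability that sample t is drawn from the NEW concept."""
    if kind == "abrupt":
        return 1.0 if t >= start else 0.0
    if kind == "gradual":
        # The chance of seeing the new concept ramps up linearly.
        return min(1.0, max(0.0, (t - start) / width))
    raise ValueError(f"unknown drift kind: {kind}")


def generate_stream(kind, n=1000, start=400, width=200,
                    old_mean=0.0, new_mean=5.0, seed=0):
    """Univariate stream whose concept is a unit-variance Gaussian."""
    rng = random.Random(seed)
    stream = []
    for t in range(n):
        if kind == "incremental":
            # The concept itself moves through intermediate states.
            frac = min(1.0, max(0.0, (t - start) / width))
            mean = old_mean + frac * (new_mean - old_mean)
        else:
            # Each sample comes from either the old or the new concept.
            p_new = new_concept_probability(kind, t, start, width)
            mean = new_mean if rng.random() < p_new else old_mean
        stream.append(rng.gauss(mean, 1.0))
    return stream


abrupt = generate_stream("abrupt")
gradual = generate_stream("gradual")
incremental = generate_stream("incremental")
```

In the gradual stream, samples between `start` and `start + width` alternate between the two concepts, whereas in the incremental stream every sample in that interval comes from an intermediate concept. A reoccurring drift could be emulated by switching back to `old_mean` later in the stream.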
The learning process of a machine learning model is analogous to estimating a conditional probability density function between a target variable, y, and a feature vector, X, as in Equation (1) [34].

$$P(y \mid X) = \frac{P(y)\,P(X \mid y)}{\sum_{y} P(y)\,P(X \mid y)} \tag{1}$$

In this context, Equation (2) represents a concept drift in a data stream between two time instants,

$$P_{t_0}(X, y) \neq P_{t_1}(X, y) \tag{2}$$

where $P_{t_0}(X, y)$ denotes the joint distribution at time $t_0$ between the feature, X, and target, y, variables. A concept drift can be classified as real or virtual based on the relation between features and target. In a real concept drift, the way to represent a concept changes over time even if the features remain stationary. Equation (3) represents the condition for a real concept drift,

$$P_{t_0}(y \mid X) \neq P_{t_1}(y \mid X) \tag{3}$$

whereas a virtual concept drift occurs when the incoming data distribution changes without affecting P(y|X). The condition for a virtual concept drift requires that P(y|X) remains stationary whereas P(X) changes, as in Equation (4).

$$P_{t_0}(y \mid X) = P_{t_1}(y \mid X) \quad \text{and} \quad P_{t_0}(X) \neq P_{t_1}(X) \tag{4}$$
The distinction between real and virtual concept drift is meaningful in supervised learning, whereas in unsupervised learning one can assume that any concept drift is virtual. Due to data stream processing restrictions, a classical machine learning technique faces obstacles in training and maintaining good performance. For instance, data cannot be considered stationary, and concept drift degrades model performance, making models obsolete over time. Specifically, in supervised techniques, data streams do not provide the correct label at prediction time, delaying the model adaptation until the data stream provides a label, if it ever does. However, such techniques are usable if it is possible to mimic the stationary condition on data stream segments or if incremental learning approaches are employed.
Therefore, any machine learning model must be adaptable as the concepts evolve when processing data streams. A model update can be triggered by a sample, a batch, or a concept drift. When updating sample-wise or batch-wise, the model update occurs as soon as data is available, so it is blind to performance degradation. Using concept drifts to determine when to update the model is more efficient, since concept drift is a cause of performance degradation. To enable updates triggered by concept drift, a CDD is needed.
Using a CDD enables the deployment of machine learning models in real applications. Considering the classification task, a Data Stream Classifier may use a CDD to prevent performance degradation. This strategy allows the model to efficiently and quickly update internal parameters to adapt to new concepts. The following subsections deepen the discussion on Data Stream Classifier and CDD concepts while highlighting the arguments for developing TEDA-CDD.

A. DATA STREAM CLASSIFIER
A Data Stream Classifier must have at least three components to perform the task in a real-world application: a classifier model, a CDD, and a data sample storage. The first component is a classifier model that provides a label based on the known features. The second is a CDD which, as discussed earlier, is necessary to update the classifier model efficiently and effectively. The third is a data sample storage for retraining. Figure 2 illustrates a real-world Data Stream Classifier.
A realistic assumption on data stream applications is that the data stream exists before the application. It implies that there are available data from the data stream (pre-deploy data), which enables offline training. Once online, the model can be used for prediction and retrained when needed. Therefore, it is possible to avoid cold starts of Data Stream Classifiers and leverage this strategy in updating the model. Figure 3 illustrates this idea.
To enable the classifier update, a CDD monitors the available data. It is important to note that not all parts of a data sample may be available at the same time. When processing a data stream in a supervised task, the ground-truth label is unavailable at prediction time, and there is no guarantee that the data will receive the actual labels in time to update the model. Therefore, a supervised concept drift detector is not appropriate for many cases. Waiting for the ground-truth label to perform concept drift detection lets the model make predictions on a data stream that has suffered a concept drift without raising the alarm. Therefore, an unsupervised online strategy must be applied as the first line of defense against model aging.

B. CONCEPT DRIFT DETECTOR
A CDD determines when a data stream has suffered a significant change. These changes are relevant when they invalidate any assumption on the data or invalidate known descriptors. In these cases, it is necessary to take action and correct the assumptions or update the descriptors.
A CDD must monitor the incoming data and process it adequately to determine if a change has occurred. CDDs are classified as supervised, semi-supervised, and unsupervised, depending on the available data and processing strategy. The supervised approach assumes that the ground truth of the task it is assisting is known. A semi-supervised CDD has access to a limited number of ground-truth labels. An unsupervised approach only considers the feature variables.
In a supervised approach, when the ground truth of the task is known, the CDD may monitor descriptors for the target and feature variables. However, a common strategy is to monitor the model performance in terms of hits.
When the target variables are not considered, the CDD is unsupervised. In these cases, it is only possible to monitor the descriptors. This approach has two main benefits: the time to detect a concept drift is reduced, and the underlying descriptors are frequently updated to represent the concept instead of increasing an arbitrary performance metric. The major downside is that some concept drifts are invisible to unsupervised approaches, for example, a swap of two known classes.
Another relevant aspect of a CDD is whether it is unidimensional or multidimensional. A unidimensional CDD monitors each feature individually, needing multiple CDDs to monitor a multidimensional feature space, whereas a multidimensional CDD can monitor the entire multidimensional feature space with only one CDD instance. The main point of the trade-off between unidimensional and multidimensional CDDs is complexity. Whereas the unidimensional CDD is less complex, the multidimensional CDD may be more accurate. The complexity affects performance in terms of memory, time, and concepts.
Usually, supervised CDDs are unidimensional and only process the hits of a classifier. Unsupervised CDDs more commonly follow unidimensional approaches because of their simplicity and efficiency.
Using a classifier and a CDD to monitor a data stream, a Data Stream Classifier may run online indefinitely with low performance degradation. The present work focuses on the CDD and assumes that any data stream classifier can take advantage of an unsupervised concept drift detector. Specifically, it focuses on unidimensional unsupervised CDDs to detect concept drifts in incoming data streams.

III. TEDA-BASED CONCEPT DRIFT DETECTOR
Concept drift detection is essential for evolving models and traditional deployed models. In realistic scenarios, the detection of virtual concept drifts can occur before real concept drifts since a virtual concept drift does not directly depend on the target feature. Therefore, an unsupervised concept drift detector can detect early changes in the distribution of the input features. This is a principal motivation for proposing an unsupervised concept drift detector. Another factor that enables early detection is sample-wise processing, and using TEDA as the base model enables sample-wise processing. Therefore, TEDA-CDD is an unsupervised concept drift detector based on TEDA.
13798 VOLUME 12, 2024 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
TEDA is a framework for data analytics that defines a way to measure how typical or eccentric (atypical) a data sample is with respect to a data set [11]. Typicality measures how representative a sample is of a data set. Inversely, eccentricity measures how abnormal a data sample is with respect to a data set. Both metrics are defined based on a distance or similarity measure to all other data in the data set. The framework is theoretically independent of any specific distance measure. However, the implementation can be optimized and made problem-specific based on the distance measure utilized.
In this work, the Euclidean distance is used, which implies a series of specific formulations and limitations. For instance, the implementation of TEDA using Euclidean distance defines a hyper-spherical pertinence threshold. A hyper-sphere does not capture covariances in a multidimensional space, so it provides an improper pertinence threshold in most cases. This spherical limitation indirectly restricts TEDA-CDD to being a unidimensional concept drift detector. Therefore, in a multidimensional machine learning task, each input feature has its own instance of TEDA-CDD.
TEDA-CDD has four essential components: reference data model, evolving data model, detection metric, and reset strategy. Both the reference and evolving models are TEDA-based and are referred to as concept models. The detection metric is based on the Jaccard Index to compare the model ranges. Finally, the reset strategy considers the relevancy of past models and avoids cold restarts.

A. REFERENCE MODEL
The reference model uses the classical definition of TEDA to represent the concept known by the classifier, whereas the evolving model uses an adaptation to describe the current state of the features. The reference model uses only typical data samples to update its internal parameters and disregards any atypical data sample. This approach treats the reference model as incomplete, allowing it to change within a tolerable range.
In general, to determine if a data sample is atypical, the eccentricity of the data sample is calculated and compared to the threshold. When the eccentricity is lower than the threshold, the data sample is considered typical, and the reference model is updated. The reference model is not updated if the data sample is atypical, to retain the original concept representation.
The eccentricity can be estimated as in (5) when considering the Euclidean distance. It is based on a recursive estimation of the mean, $\mu_k$, and the variance, $\sigma_k^2$, where k is the time index of the data sample and n is the number of data samples processed.
The values of mean and variance can be recursively estimated by (6) and (7).
The threshold devised to determine if a data sample is eccentric derives from the Chebyshev inequality. The eccentricity threshold is a function of a sensitivity parameter and the total number of processed typical samples. Equation (8) describes this threshold, where n is the number of samples and m is a parameter to control the threshold sensitivity [11]. The sensitivity parameter m has an effect similar to a multiplicative factor of sigma (mσ). Higher values of m make it difficult to encounter atypical data samples, whereas lower values cause more data samples to be considered atypical. The parameter m is a positive real number, and values between 1 and 3 are recommended.
Finally, the reference model is composed of three internal parameters: n, µ, and σ². The parameter n is the number of samples considered typical for the model. Parameters µ and σ² are updated using Equations (6) and (7), respectively.
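The behavior of the reference model can be sketched in a few lines of Python. This is an illustrative sketch under assumptions, not the authors' implementation: it assumes the forms commonly used in the TEDA literature [11] for the recursive mean and variance, the Euclidean eccentricity, and the Chebyshev-based threshold on the normalized eccentricity (eccentricity divided by two). The class name `TedaReference` and the rule of accepting the first two samples unconditionally are hypothetical choices.

```python
class TedaReference:
    """Univariate TEDA-style model that absorbs only typical samples."""

    def __init__(self, m=3.0):
        self.m = m        # sensitivity parameter
        self.n = 0        # number of typical samples absorbed so far
        self.mean = 0.0
        self.var = 0.0

    def update(self, x):
        """Return True and absorb x if it is typical, else leave the model as is."""
        n = self.n + 1
        # Candidate parameters as if x were absorbed (assumed recursive forms).
        mean = self.mean + (x - self.mean) / n
        var = 0.0 if n == 1 else (n - 1) / n * self.var + (x - mean) ** 2 / (n - 1)

        typical = True
        if n >= 3 and var > 0.0:  # assumption: no test while statistics are unstable
            eccentricity = 1.0 / n + (mean - x) ** 2 / (n * var)
            # Normalized eccentricity against the Chebyshev-based threshold.
            typical = eccentricity / 2.0 <= (self.m ** 2 + 1.0) / (2.0 * n)

        if typical:
            self.n, self.mean, self.var = n, mean, var
        return typical


ref = TedaReference(m=3.0)
for k in range(50):
    ref.update(float(k % 2))      # a stable concept alternating between 0 and 1
outlier_is_typical = ref.update(100.0)
```

After the loop, the sample `100.0` is rejected as atypical, so `n`, the mean, and the variance keep describing the original concept.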

B. EVOLVING MODEL
A key difference between the reference and evolving models is that the evolving model is updated by each new data stream sample, whereas the reference model is updated only by the typical samples. However, the reference model weighs each sample equally, making new concepts hard to detect. For this reason, the evolving model uses an adaptation of TEDA.
The adaptation used for constructing the evolving model concerns the ability to focus on recent data samples. That means older data samples contribute less to estimating eccentricity. This effect is achieved by introducing a forgetting factor in the update formulas for mean and variance, effectively creating an exponential window.
The forgetting factor enables TEDA to forget past concepts by recursively applying an exponentially decreasing weight as the number of samples grows. The ability to forget an outdated concept enables TEDA to model the current state of a data stream. Equations (9) and (10) describe the update strategy using the forgetting factor, α. The parameter α affects the evolving model as a sensitivity parameter. The higher the value of α, the more important an incoming data sample is to the current concept and the more sensitive the evolving model is to noise. Conversely, the lower the value of α, the less important the incoming data sample is to the current concept and the more robust the model is against noise. Acceptable values for α are between 0 and 1, ideally closer to 0. It is important to note that using Equations (6) and (7) avoids weighting inconsistency while the value of 1−α is higher than (n−1)/n.
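The effect of the forgetting factor can be sketched as follows. The exact form of Equations (9) and (10) is not reproduced here, so this sketch assumes a plain exponential update in which the weight (n−1)/n is replaced by 1−α and the weight of the new sample by α; the class name `TedaEvolving` is hypothetical. The warm-up branch follows the note above: the equal-weight update is kept while 1−α is higher than (n−1)/n.

```python
class TedaEvolving:
    """Univariate model that tracks recent samples through a forgetting factor."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha  # forgetting factor, between 0 and 1
        self.n = 0
        self.mean = 0.0
        self.var = 0.0

    def update(self, x):
        self.n += 1
        n = self.n
        if n == 1:
            self.mean, self.var = x, 0.0
        elif 1.0 - self.alpha > (n - 1) / n:
            # Warm-up: equal-weight recursive update (assumed Eq. (6)-(7) form).
            self.mean = (n - 1) / n * self.mean + x / n
            self.var = (n - 1) / n * self.var + (x - self.mean) ** 2 / (n - 1)
        else:
            # Exponential window (ASSUMED form of the forgetting update).
            a = self.alpha
            self.mean = (1.0 - a) * self.mean + a * x
            self.var = (1.0 - a) * self.var + a * (x - self.mean) ** 2


evolving = TedaEvolving(alpha=0.05)
for k in range(500):
    evolving.update(float(k % 2))         # old concept around 0.5
for k in range(500):
    evolving.update(10.0 + float(k % 2))  # new concept around 10.5
```

After both loops, the evolving mean sits close to 10.5, whereas an equal-weight mean over the same 1000 samples would sit near 5.5, halfway between the two concepts.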
The evolving model is compatible with Equations (5) and (8). This fact makes the concept models comparable over time in terms of internal parameters.

C. DETECTION STRATEGY
TEDA-CDD indicates a concept drift when the reference and evolving models are sufficiently distinct. In this context, a detection strategy based on the Jaccard Index is proposed. If the strategy indicates dissimilarity, then a concept drift is detected.
The detection strategy for concept drift considers the concept models as representations of sets. Each set is equivalent to a subspace inside the hypersphere defined as a consequence of using Euclidean distance for TEDA. This enables using the Jaccard Index as a similarity measure between the concept models.
The Jaccard Index (JI) is a measure of similarity between sets. Equation (11) defines the JI for two arbitrary sets, A and B. Using the JI as a similarity measure for the concept models is a natural choice considering that they represent data sets from the data stream.
Effectively, the two sets are more similar the closer their intersection is to their union. They are identical if the intersection of the sets is equal to their union. Any two sets are distinct when there is no intersection.
The geometric shape in the feature space is the reference to estimate the intersection between the concept models. Ideally, the equivalent volume in the n-dimensional feature space would be estimated for the intersection and union. In this work, for simplicity, the radius and center of each concept model are used to represent the subspace delimiting the set. The center is analogous to the mean. The radius derives from the Chebyshev inequality as the maximum distance a data sample can be from the center, as in Equation (12),
where $r(n)$ is the radius of a given concept model at instant k, m is the sensitivity parameter, and $\sigma^2(n)$ is the variance.
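The link between the eccentricity threshold and the radius can be worked out explicitly. The derivation below is a sketch under two assumptions: the univariate Euclidean eccentricity has the usual TEDA form $\xi(x) = \frac{1}{n} + \frac{(\mu - x)^2}{n\sigma^2}$, and a sample is typical when its normalized eccentricity $\zeta(x) = \xi(x)/2$ does not exceed $\frac{m^2+1}{2n}$. Under these assumptions the radius reduces to $m\sigma(n)$, which is consistent with the remark that m acts like a multiplicative factor of sigma.

```latex
\begin{align}
\zeta(x) = \frac{1}{2}\left(\frac{1}{n} + \frac{(\mu - x)^2}{n\,\sigma^2}\right)
  &\le \frac{m^2 + 1}{2n} \\
\Longleftrightarrow\quad 1 + \frac{(\mu - x)^2}{\sigma^2} &\le m^2 + 1 \\
\Longleftrightarrow\quad |x - \mu| &\le m\,\sigma
  \quad\Longrightarrow\quad r(n) = m\,\sigma(n)
\end{align}
```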
Combining the fact that TEDA effectively defines hyper-spherical decision boundaries when using the Euclidean distance with Equation (8), it is possible to simplify the JI based on the radius of the hyper-spheres, as in Equation (13).
This enables detecting a concept drift regardless of the number of data samples. A concept drift occurs if the JI is lower than a given threshold, JT. JT defines the boundary between similar and dissimilar models: the higher the value of JT, the more sensitive the CDD is to divergence. A lower JT makes the CDD tolerate higher levels of divergence between the reference model and the evolving model. Possible values of JT are between 0 and 1. Considering the JI, a JT of 0.5 means that two models that share less than half of their data samples are dissimilar.
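For a single feature, each concept model reduces to an interval around its center, so the comparison can be sketched as an interval overlap. This is an illustrative sketch, not necessarily the simplified form of Equation (13): the function names are hypothetical, and the JI is computed here as the length of the overlap divided by the length of the union of the two intervals.

```python
def interval_jaccard(center_a, radius_a, center_b, radius_b):
    """Jaccard Index of the intervals [center - radius, center + radius]."""
    low_a, high_a = center_a - radius_a, center_a + radius_a
    low_b, high_b = center_b - radius_b, center_b + radius_b
    intersection = max(0.0, min(high_a, high_b) - max(low_a, low_b))
    union = (high_a - low_a) + (high_b - low_b) - intersection
    if union == 0.0:
        # Both intervals are single points: identical only if they coincide.
        return 1.0 if center_a == center_b else 0.0
    return intersection / union


def drift_detected(center_ref, radius_ref, center_evo, radius_evo, jt=0.5):
    """Signal a drift when the similarity falls below the threshold JT."""
    return interval_jaccard(center_ref, radius_ref, center_evo, radius_evo) < jt


same = interval_jaccard(0.0, 1.0, 0.0, 1.0)      # identical intervals
partial = interval_jaccard(1.0, 1.0, 2.0, 1.0)   # [0, 2] against [1, 3]
disjoint = interval_jaccard(0.0, 1.0, 5.0, 1.0)  # no overlap
```

With the intervals [0, 2] and [1, 3], the overlap has length 1 and the union has length 3, so the similarity is 1/3 and a JT of 0.5 would raise the alarm.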
The main drawback is that the model needs a strategy to avoid nuisance detections while too few data samples have been processed. On the other hand, the main advantage is that if two models share a great number of samples before diverging in parameters, the decision is not biased against raising the concept drift alarm. Figure 4 illustrates the concepts used to devise the detection strategy.
The threshold JT is crucial for the detection. Ideally, it would be tuned using an optimization search. It can also be adjusted dynamically using concept drift detection metrics [35]. Unfortunately, this is out of the scope of this initial work with TEDA-CDD.

D. TEDA-CDD RESET
When a concept drift is detected, TEDA-CDD is reset to start monitoring the current concept. To avoid a cold restart, the information in the evolving model is used to update the reference model while a new evolving model is created from scratch. This is achieved by implementing Equations (14), (15), and (16).
After the reset, the detection will not trigger until the evolving model has processed enough new data samples. This behavior avoids nuisance detection due to noise and high variance in the early data samples.

E. OVERALL ALGORITHM
Figure 5 presents the flowchart of a data stream classifier for a generic CDD. Considering the application of TEDA-CDD, the steps of updating the reference and evolving models are performed in the Run CDD process. The detection strategy is applied in the CD Detected decision, and TEDA-CDD is reset in the Reset CDD process if a concept drift is detected. This flowchart connects each main component of TEDA-CDD to clearly present how the detector operates within a data stream classifier. Algorithm 1 provides pseudocode implementing the proposed strategy.
Regarding memory usage, the proposed algorithm does not apply a time window in the same way as the comparison algorithms. TEDA-CDD only stores a fixed-size set of descriptors (n, µ, α, σ) from the data stream, so for an n-dimensional data stream, it would use n sets of descriptors. As a result, the memory complexity is O(n), that is, linear in the number of monitored features.
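The processing loop described for a generic CDD (predict, store, run the detector, then retrain, dump the storage, and reset on a detection) can be sketched as below. Everything in this block is a hypothetical stand-in chosen to keep the example self-contained: `NearestCentroid` is a toy classifier, `MeanShiftDetector` is a placeholder detector that is not TEDA-CDD, and the stored labels are assumed to be available by the time the model is retrained.

```python
class NearestCentroid:
    """Toy classifier: predicts the label whose feature mean is closest."""

    def __init__(self):
        self.centroids = {}

    def fit(self, samples):
        sums = {}
        for x, y in samples:
            total, count = sums.get(y, (0.0, 0))
            sums[y] = (total + x, count + 1)
        self.centroids = {y: total / count for y, (total, count) in sums.items()}

    def predict(self, x):
        return min(self.centroids, key=lambda y: abs(x - self.centroids[y]))


class MeanShiftDetector:
    """Placeholder unsupervised detector (NOT TEDA-CDD)."""

    def __init__(self, warmup=30, alpha=0.1, tol=3.0):
        self.warmup, self.alpha, self.tol = warmup, alpha, tol
        self.reset()

    def reset(self):
        self.n, self.ref, self.recent = 0, 0.0, 0.0

    def update(self, x):
        """Process one feature value; return True when a drift is signaled."""
        self.n += 1
        if self.n <= self.warmup:  # no detection until enough samples are seen
            self.ref += (x - self.ref) / self.n
            self.recent = self.ref
            return False
        self.recent += self.alpha * (x - self.recent)
        return abs(self.recent - self.ref) > self.tol


def run_stream(stream, classifier, detector, pre_deploy):
    classifier.fit(pre_deploy)              # offline training avoids a cold start
    storage, hits, drifts = [], 0, []
    for k, (x, y) in enumerate(stream):
        hits += int(classifier.predict(x) == y)   # predict first
        storage.append((x, y))                    # keep the sample for retraining
        if detector.update(x):                    # Run CDD / CD Detected
            classifier.fit(storage)               # update the model
            storage = []                          # dump previous data
            detector.reset()                      # Reset CDD
            drifts.append(k)
    return hits / len(stream), drifts


pre_deploy = [(float(k % 2), 0) for k in range(20)]
stream = [(float(k % 2), 0) for k in range(200)]
stream += [(10.0 + k % 2, 1) for k in range(200)]
accuracy, drifts = run_stream(stream, NearestCentroid(), MeanShiftDetector(), pre_deploy)
```

In this toy stream the concept changes at sample 200; the placeholder detector raises a single alarm a few samples later, after which the retrained classifier recovers.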

IV. EXPERIMENTS
In this section, we discuss the experiments and results, which indicate that TEDA-CDD is viable and efficient in terms of memory and performance. Initially, the methods used for comparison are presented. Next, the metrics and simulation strategy are discussed. Then, the synthetic dataset experiments are discussed in Subsection IV-A, followed by the real-world datasets in Subsection IV-B.
The experiment simulates a concept drift detector processing a data stream in real time. Besides TEDA-CDD, ADWIN, KSWIN, and Page-Hinkley are used to compare the performance. ADWIN maintains a variable-length window of recent data stream samples and detects drift by comparing the distributions of two sub-windows within the window [36]. The KSWIN [37] concept drift detector uses a sliding window divided into two sub-windows: reference and detection windows. The r most recent data samples compose the [...]
[Algorithm 1 (TEDA-CDD pseudocode), surviving fragments: use Eq. (6) and (7) with µ, σ, n and x; use Eq. (6) and (7) with µ, σ, n_F and x, else use Eq. (9) and (10) with µ, σ, α and x.]
Page-Hinkley [38] monitors CUSUM metrics of each feature to measure the expected increase or decrease of the feature value. For an increasing scenario, a drift is detected when the difference between the expected increase and the minimum expected increase exceeds a given threshold. All methods must process a minimum number of data samples before testing for concept drift after each reset. The selected methods are unsupervised strategies to detect concept drift, which ensures comparison fairness. Also, the River Python package [39] has implementations of the selected algorithms. The parameters used for the benchmark methods are listed below according to the used library.
• ADWIN:
  - delta: 0.002
• KSWIN:
  - alpha: 0.005
  - window_size: 100
  - stat_size: 30
• Page-Hinkley:
  - min_instances: 30
  - delta: 0.005
  - threshold: 50.0
  - alpha: 0.9999
  - mode: both
The evaluation process is based on the ideas of Figures 2, 3, and 5. The Naive Bayes classifier is used as the model and trained with the pre-deploy data as an initial batch. Then the CDD and the classifier start processing the data stream. The sample storage stores the streaming data for future model updates. Whenever concept drift occurs, the model is updated using the data available in the sample storage, the sample storage dumps the previous data, and the concept drift detector resets.
In this experiment, CDD performance and model performance are related. When the CDD correctly detects concept drifts, the model does not suffer performance degradation due to frequent training or false alarms. Therefore, the model performance through the data stream is also a metric of the employed CDD performance. The experiment uses three strategies for performance measuring: prequential, sliding, and holdout [40]. The prequential and sliding strategies measure the classifier performance, whereas the holdout provides a reference baseline.
There are two key points in using prequential scoring. First, it allows using all data stream samples as test and train samples. When a new data sample arrives, the model first makes a prediction for scoring and is then updated using the data sample. Second, the most recent score is the accumulated score for all data samples processed, unlike holdout and sliding scoring, which use a subset of the data stream samples. Equation (17) defines the estimation of the prequential accuracy at instant k of the data stream, where the function 1(x) is the indicator function, ŷ is the predicted label, and y is the true label.
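The prequential accuracy described above is the running fraction of correct predictions over all samples seen so far. A minimal sketch, with a hypothetical function name and toy labels:

```python
def prequential_accuracy(y_true, y_pred):
    """Accumulated accuracy after each sample of the stream."""
    hits = 0
    scores = []
    for k, (y, y_hat) in enumerate(zip(y_true, y_pred), start=1):
        hits += int(y == y_hat)   # indicator function 1(y_hat = y)
        scores.append(hits / k)   # every past sample affects the score
    return scores


prequential = prequential_accuracy([1, 0, 1, 1], [1, 1, 1, 0])
```

For these four samples the hits are 1, 0, 1, 0, so the accumulated accuracy evolves as 1, 1/2, 2/3, and 1/2.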
In sliding scoring, the performance metrics only consider the data samples within a sliding window of size w. It provides a performance score unbiased by previous data samples (data samples outside the sliding window). Equation (18) defines the estimation of the sliding accuracy for a window of size w at instant k.
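The sliding accuracy can be sketched with a fixed-length queue that keeps only the last w hit indicators. The function name is hypothetical, and dividing by the number of samples seen so far while the window is still filling is an assumption of this sketch.

```python
from collections import deque


def sliding_accuracy(y_true, y_pred, w):
    """Accuracy over the w most recent samples, after each sample."""
    window = deque(maxlen=w)  # older hits fall out automatically
    scores = []
    for y, y_hat in zip(y_true, y_pred):
        window.append(int(y == y_hat))
        scores.append(sum(window) / len(window))
    return scores


sliding = sliding_accuracy([1, 0, 1, 1], [1, 1, 1, 0], w=2)
```

With w = 2 the hits 1, 0, 1, 0 give the scores 1, 1/2, 1/2, and 1/2, since each score only looks at the two most recent predictions.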
The holdout scoring strategy divides the incoming data samples into two groups: train and test. The model is updated using the train set, and the test set is used to evaluate the model's performance. It provides a strong separation between train and test samples in contrast to prequential scoring but reduces the number of training samples. Similarly to the sliding scoring, the holdout scoring provides a performance score not biased by older data samples (data samples outside the current holdout batch). Equation (19) defines the estimation of the holdout accuracy of a given test batch, b, of size w_b, where ŷ_i^b and y_i^b are, respectively, the i-th predicted label and the i-th true label of the b-th test batch.
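For a single test batch, the holdout accuracy reduces to the fraction of correct predictions in that batch. The helper below is a sketch under that assumption, with a hypothetical name.

```python
def holdout_accuracy(y_true_batch, y_pred_batch):
    """Accuracy over one test batch of size w_b."""
    if not y_true_batch or len(y_true_batch) != len(y_pred_batch):
        raise ValueError("batches must be non-empty and of equal length")
    hits = sum(int(y_hat == y) for y, y_hat in zip(y_true_batch, y_pred_batch))
    return hits / len(y_true_batch)
```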
Using the holdout reference requires splitting the data into batches, and the drift length of the synthetic datasets is used for this purpose. For this reason, the holdout accuracy was only calculated for the synthetic experiments. Since the classifier used for calculating the holdout accuracy uses the drift length as the batch size, the models are not presented with concept drift and therefore neither suffer performance degradation from concept drift nor use a CDD. It is important to note that the holdout accuracy is not used to estimate the performance of the methods but to give a baseline for reference. It is also expected that the holdout accuracy is consistently higher than the prequential or sliding accuracy.
In order to provide a meaningful reference for prequential accuracy, the holdout reference was accumulated. Accumulating the holdout accuracy makes it comparable with prequential accuracy in the sense that all previous data samples affect the current score. The accumulated holdout accuracy for the i-th batch uses all test subsets from the first batch up to the i-th, inclusive. In this context, the accumulated holdout accuracy is also biased by older data samples, presenting the global accuracy for the data stream.
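The accumulation can be sketched as follows, assuming the test samples of all batches up to the i-th are pooled before computing the accuracy (rather than averaging per-batch scores). The function name and the input layout are assumptions.

```python
def accumulated_holdout_accuracy(batches):
    """batches: list of (y_true, y_pred) pairs, one per test batch.

    Returns the accuracy over all test samples seen up to and including each batch.
    """
    hits = total = 0
    scores = []
    for y_true, y_pred in batches:
        hits += sum(int(y_hat == y) for y, y_hat in zip(y_true, y_pred))
        total += len(y_true)
        scores.append(hits / total)
    return scores
```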
The last metric of interest for a CDD is the memory usage during data stream processing, measured in bytes (B). Memory usage is measured each time a data sample is processed. It only considers the CDD, since the models use the same classifier and the stored data does not directly affect the detection. The lower the memory usage, the better.
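The text does not state how the byte count was obtained. One possible way to approximate the footprint of a detector object in Python is a recursive walk with `sys.getsizeof`, sketched below; `deep_sizeof` is a hypothetical helper and only covers common built-in containers and plain attribute dictionaries.

```python
import sys


def deep_sizeof(obj, _seen=None):
    """Approximate size in bytes of obj plus the objects it references."""
    seen = set() if _seen is None else _seen
    if id(obj) in seen:               # count shared objects only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    if hasattr(obj, "__dict__"):      # instance attributes
        size += deep_sizeof(vars(obj), seen)
    return size
```

Calling such a function on the detector after every processed sample would give a per-sample memory trace whose mean can then be reported.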

A. SYNTHETIC DATASETS
The synthetic benchmark used is the Non-stationary Environments Archives (NEA) [41]. It is a collection of non-stationary datasets. The majority consists of synthetic data composed of moving Gaussian distributions. There is a variety of concept drift patterns, most of which can be considered incremental.
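To make the idea of moving Gaussian distributions concrete, the generator below produces a toy two-class stream whose class centers shift a little at every step, which yields an incremental drift. It is an illustration only and does not reproduce any NEA dataset; the function name, class separation and drift rate are arbitrary assumptions.

```python
import random


def moving_gaussian_stream(n_samples, drift_per_step=0.01, seed=0):
    """Yield ((x1, x2), label) pairs from two Gaussians whose centers move along x1."""
    rng = random.Random(seed)
    for k in range(n_samples):
        label = rng.randint(0, 1)
        center = 4.0 * label + k * drift_per_step   # both classes drift together
        x = (rng.gauss(center, 1.0), rng.gauss(0.0, 1.0))
        yield x, label
```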
The NEA benchmark is relevant because of three characteristics: it is extensive, complex, and well-defined. It is extensive as it presents various datasets with multiple behaviors. NEA expresses its complexity through the evolution and interaction between concepts and through dataset dimensionality. Finally, it is well-defined as it describes the data and behavior patterns between concepts.
Each dataset presents a unique pattern and a drift duration, as listed in Table 1. The drift duration is used to set up the pre-deploy training of the classifier and to determine the holdout batches. For the pre-deploy data, only half of the drift duration is used to build the batch, whereas the remaining drift duration is used as online data. To determine the holdout batches, the data stream is divided into segments with the drift duration length. Each segment is divided in half into a training batch and a testing batch (holdout batch), producing a score value for each segment.
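The segmentation into holdout batches can be sketched as below. The sketch assumes that the first half of each segment is the training batch and the second half the testing batch, and that a trailing incomplete segment is discarded; both choices, like the function name, are assumptions.

```python
def holdout_segments(stream, drift_duration):
    """Yield (train, test) halves for each full segment of length drift_duration."""
    half = drift_duration // 2
    for start in range(0, len(stream) - drift_duration + 1, drift_duration):
        segment = stream[start:start + drift_duration]
        yield segment[:half], segment[half:]
```

Training a fresh classifier on each `train` half and scoring it on the matching `test` half gives one holdout score per segment.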
The parameters used for TEDA-CDD in this experiment are: m = 3, α = 0.9655 and JT = 0.93. The other methods use the default parameters from the library.
Figure 6 presents the prequential accuracy for datasets 2CDT and UG_2C_3D. The holdout reference shows the accuracy for a classifier with knowledge of the drift duration; it therefore performs better than any of the methods and indicates the upper limit in performance. The remaining algorithms perform consistently with one another. In the case of dataset 2CDT, the algorithms performed significantly worse than the reference, whereas for dataset UG_2C_3D all algorithms performed similarly. The same behavior is noticeable in Figure 7, which shows the sliding accuracy for datasets 2CDT and UG_2C_3D. The 2CDT results present a higher variance, and the compared methods tend to be below the reference on average. The UG_2C_3D results have a lower variance and tend to follow the reference.
These remarks are confirmed by Table 2, which lists the final accuracy achieved by each method for all datasets. The highest accuracy value for each dataset, disregarding the reference accuracy, is in bold. In this experiment, TEDA-CDD and KSWIN have the highest accuracy in 7 datasets. This performance shows that TEDA-CDD is competitive in detecting concept drifts.
In terms of memory, TEDA-CDD is consistently superior, using less memory for all datasets. However, it is important to highlight two points. First, ADWIN and KSWIN are window-based techniques, making this an unfair comparison. Second, the Page-Hinkley technique is unidirectional, only capable of detecting a concept drift that increases the values of a feature. Therefore, its implementation needs two instances to monitor a data stream feature. The second instance monitors the decrease in values of the data stream.
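The two-instance arrangement can be illustrated with a textbook one-sided Page-Hinkley test. The sketch below is not the River implementation (it omits the forgetting factor alpha, for instance), and the class names are hypothetical; the second instance simply receives the negated sample so that a decrease looks like an increase.

```python
class OneSidedPageHinkley:
    """Textbook Page-Hinkley test that only reacts to an increase in the mean."""

    def __init__(self, delta=0.005, threshold=50.0, min_instances=30):
        self.delta = delta
        self.threshold = threshold
        self.min_instances = min_instances
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0       # cumulative deviation from the running mean
        self.min_cum = 0.0   # smallest cumulative deviation seen so far

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.n >= self.min_instances and self.cum - self.min_cum > self.threshold


class TwoSidedPageHinkley:
    """Two one-sided instances: one fed with x, the other with -x."""

    def __init__(self, **kwargs):
        self.up = OneSidedPageHinkley(**kwargs)
        self.down = OneSidedPageHinkley(**kwargs)

    def update(self, x):
        increased = self.up.update(x)
        decreased = self.down.update(-x)   # preprocessing: negate the sample
        return increased or decreased
```

Because two sets of statistics are kept per monitored feature, the memory footprint of this arrangement is roughly twice that of a single instance.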

B. REAL-WORLD DATASETS
The real-world benchmark is composed of three datasets obtained from the River package and one from the NEA. From River, the datasets credit_card, electricity and phishing are selected. From NEA, the only real-world dataset provided, keystroke, is selected.
• credit_card is an anonymized and preprocessed dataset with data from credit card transactions. The goal of this dataset is to detect fraud. It has 28 PCA-extracted features, a time index feature, the value of the transaction and the target feature. The time index is not used since it is a monotonically increasing feature and does not represent concepts directly.
• electricity is a dataset from an Australian electricity market. The task of this dataset is to determine whether the electricity cost will increase or decrease. It has 8 features besides the target feature, two of which are time related and, therefore, not used in the experiment.
• keystroke is a subset of a dataset collected from volunteers typing a password. It has 10 features and a 4-class target feature. This version of the dataset was obtained from NEA.
• phishing is a dataset composed of data from websites, which are classified as phishing or not. It has 9 features and a two-class target feature.
Table 4 presents the number of classes, features and samples of each dataset.
In this experiment, the pre-deploy data is composed of 5% of the samples of each dataset. The parameters used for TEDA-CDD are: m = 3.3, α = 0.9666 and JT = 0.85. The other methods use the default parameters from the library.
Since they are real-world datasets, it is unreasonable to assume a known periodicity or known concept drift instants. In this context, the evaluation based on the classifier performance, as used in the synthetic scenario, is valid. Table 5 lists the final accuracy of each method for each real-world dataset. Table 6 lists the mean memory usage in bytes for the real-world dataset experiment. Again, TEDA-CDD achieves similar accuracy with lower memory usage.
Using data from all experiments, we constructed one critical difference diagram for accuracy and one for memory use, shown in Figures 10 and 11, respectively. They show that the performance of TEDA-CDD is equivalent to that of Page-Hinkley and KSWIN in terms of accuracy and that TEDA-CDD is the isolated best when considering memory usage.
A straightforward experiment illustrates the sensitivity to the parameters. Table 7 lists the sets of values tested for each parameter. All datasets were processed and the overall accuracy was measured for every parameter combination. The outcomes were used to construct a parallel coordinates graph. The findings reveal a discernible level of parameter tolerance: most results show an accuracy around 0.82, with infrequent values below 0.8. Importantly, no distinct parameter range was identified as causing a decline in performance.
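A sweep over every parameter combination can be organized as in the sketch below. The helper name and the `evaluate` callback are hypothetical, and any grid passed to it is a placeholder; the values of Table 7 are not reproduced here.

```python
from itertools import product


def parameter_sweep(grid, evaluate):
    """Call evaluate(**params) for every combination in grid and collect the results.

    grid maps a parameter name (for example m, alpha, jt) to a list of candidate values.
    """
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, evaluate(**params)))
    return results
```

Each `(params, score)` pair corresponds to one line of a parallel coordinates graph, with one axis per parameter and one for the resulting accuracy.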
Finally, the accuracy and memory usage results indicate that TEDA-CDD is efficient in terms of memory and accuracy for synthetic and real-world datasets. TEDA-CDD presents the same level of accuracy as KSWIN with much less memory consumption. Although Page-Hinkley performs closer to TEDA-CDD in terms of both memory and accuracy, its implementation complexity (managing two instances with data sample preprocessing) and the lack of generality in the base algorithm subjectively place TEDA-CDD in a better position.

V. CONCLUSION
Our assumptions on data stream processing bring our simulations closer to real machine learning applications. In this scenario, unsupervised concept drift detection enables early detection and safer handling. TEDA-CDD presents state-of-the-art performance comparable to a consistent method such as KSWIN while having lower complexity (in terms of parameters and theoretical concepts) and lower memory usage. Therefore, TEDA-CDD is a competitive approach to concept drift detection.
The known limitations of TEDA-CDD are related to the use of Euclidean distance as a metric of similarity and to the fully unsupervised approach. Using Euclidean distance forces spherical models, which are conceptually simple because they depend heavily on the mean and variance parameters. The fully unsupervised approach ignores the offline setup of the data stream processing algorithm and online labeling.
In future work, we plan to expand TEDA-CDD to process multidimensional data with the same performance as the one-dimensional approach. Using Mahalanobis distance makes it possible to create ellipsoidal models instead of spherical models to describe concepts. In a parallel effort, we will propose a more realistic approach in which labels are known at training time and unknown at prediction time. In our understanding, this approach is semi-supervised and enables the modeling and monitoring of known concepts in an unsupervised data stream. Also, we intend to investigate the effects of dynamically adjusting TEDA-CDD parameters in response to data stream changes.
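The difference between spherical and ellipsoidal models can be seen in a small two-dimensional example. The functions below are an illustrative sketch with hypothetical names and are not part of TEDA-CDD; the Mahalanobis distance is computed with the closed-form inverse of a 2x2 covariance matrix.

```python
import math


def euclidean(x, mean):
    """Euclidean distance: every direction is weighted equally (spherical contours)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, mean)))


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance for 2-D data: directions are scaled by the covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form with the inverse covariance (1/det) * [[d, -b], [-c, a]].
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)
```

With a covariance of [[4, 0], [0, 1]] and a zero mean, the points (2, 0) and (0, 2) are equally far from the center in Euclidean terms, but the first is at Mahalanobis distance 1 and the second at distance 2, reflecting the larger spread along the first axis.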

FIGURE 10. Critical difference diagram for accuracy.

TABLE 2. Final accuracy of model.

TABLE 3. Mean memory usage in B (bytes).

TABLE 5. Final model accuracy on real-world datasets.

TABLE 6. Mean memory usage in B (bytes) on real-world datasets.