Meta-iCVI: Ensemble Validity Metrics for Concise Labeling of Correct, Under- or Over-Partitioning in Streaming Clustering

Understanding the performance and validity of clustering algorithms is both challenging and crucial, particularly when clustering must be done online. Until recently, most validation methods have relied on batch calculation and have required considerable human expertise in their interpretation. Improving real-time performance and interpretability of cluster validation, therefore, continues to be an important theme in unsupervised learning. Building upon previous work on incremental cluster validity indices (iCVIs), this paper introduces the Meta-iCVI as a tool for explainable and concise labeling of partition quality in online clustering. Leveraging a time-series classifier and data-fusion techniques, the Meta-iCVI combines the outputs of multiple iCVIs to produce a streaming label of either "over", "under", or "correctly" partitioned. Experiments were conducted on generalized synthetic and real-world data sets to demonstrate the efficacy and application of this method. Results of 100% accuracy were achieved in labeling partition quality on real-world data sets including MNIST and FLIR ADAS, demonstrating that the Meta-iCVI is a powerful and efficient tool for classifying partition quality in a variety of conditions. Its introduction should empower new and more efficient streaming clustering techniques. Additionally, we believe this to be the first implementation of an ensemble iCVI metric and the first time iCVI validation performance has been evaluated on randomized sample presentation.


I. INTRODUCTION
Cluster validation is a fundamental topic in unsupervised learning [1], [2]. When no class labels are given, partition quality must be assessed by the practitioner, often offline. Different algorithms, sample presentations, or hyperparameters can lead to different partition results. These varying results may be disambiguated with the use of Cluster Validity Indices (CVIs), a class of objective metrics for partition quality. Among CVIs there are two primary types: external CVIs, which compare the test partition with some reference partition (typically the ground truth), and internal CVIs, which evaluate the partition using information contained in the data and the structure of the partition. Algorithms to compute CVIs have usually relied on batch calculation. Recently, incremental CVIs (iCVIs) have been developed to allow online computation [3]. These algorithms are based on batch CVIs but rely on caching mechanisms to eliminate unnecessary recalculation between sample presentations. This class of algorithms was extended in [4] by introducing six new iCVIs based on existing CVI algorithms. In that work, we investigated the behavior of each existing and newly introduced iCVI and then manually evaluated whether the behavior was informative in determining partition quality in the over- and under-partitioned cases. The step of manual evaluation is, however, far from precise. A subjective interpretation must be performed by an expert in order to understand or make use of these metrics. Furthermore, each individual iCVI was shown to behave differently when exposed to under-, over-, or correct partitioning, with varying degrees of sensitivity to each [4]. This requires a practitioner to be familiar with the quirks and biases of the specific iCVI independent of the data they are analyzing. It is this design challenge of CVIs, particularly from the perspective of the uninitiated, that motivated the current work.
Here we focus on providing a simple and objective interpretation of these iCVIs, presenting a novel algorithm based on the fusion of multiple iCVIs with a trained time-series classification element. This classification step conducts an objective evaluation of the fused iCVIs and predicts the partition quality in an explainable manner. Each of the top-performing iCVIs is evaluated for its performance in this fusion ensemble and its subsequent amenability to classification. To the best of the authors' knowledge, this is the first implementation of an ensemble iCVI, with or without a classification step.

II. BACKGROUND AND RELATED WORK
A. CLUSTERING
Clustering is an unsupervised learning technique wherein a data set X = {x_i}, i = 1, …, N, is partitioned into a set Ω = {ω_i}, i = 1, …, k, of k clusters ω_i depending on the features of each sample x_i. When clustering is done incrementally rather than in a batch, it is sometimes also known as streaming clustering or online clustering. Clustering itself is a broad field of machine learning with a myriad of techniques, ranging from hierarchical and density-based methods [2] to ensemble variants thereof; [5] and [6] demonstrate methods of taking advantage of information sharing in ensemble streaming clustering, especially in multi-view clustering applications, while mitigating the computational complexity introduced by running many simultaneous clustering processes.
The greatest question, however, arises once a clustering solution is obtained, and it is a consequence of the unsupervised clustering problem statement: in the absence of ground-truth cluster labels, how does one know how well, or how "correctly," a clustering algorithm has performed? Ideally, samples are more similar to those within the same cluster than to samples in other clusters. While clustering algorithms provide a framework for partitioning data, there often exists little intrinsic reference to the quality of such partitions. It is this purpose that Cluster Validity Indices (CVIs) serve, allowing the practitioner to validate the result of a clustering algorithm. By far the most common and most developed techniques for cluster validation are those for batch cluster validation. An extensive background on these can be found in [7] and [8] and in Chapter 10 of [2]. Batch CVIs have been used extensively for post-processing and monitoring, as well as for fitness functions in optimization problems. References [9] and [10] utilized batch CVIs in particle swarm optimization problems. Reference [11] developed a CVI that uses a correlation metric for determining cluster similarity, and [12] proposed a CVI for non-negative spaces which performs well in big-data applications.
An approach developed for waveform clustering concluded that similarity metrics can be informative for time-series clustering, echoing our own method of utilizing correlation [13] on CVI waveforms. The same study also leveraged visual observation to identify the usefulness of various similarity metrics, similar to the approach taken in [14].
Recently, a family of CVIs has been introduced for incremental use, known as iCVIs [3], [15], [16], [17]. References [3] and [15] introduced and evaluated the incremental Davies-Bouldin (iDB) and Xie-Beni (iXB) indices for their applicability to accurately and poorly partitioned data sets, while [16] and [17] investigated the behavior of iDB in cases in which the clustering methods accurately detected structures present in the data. Reference [4] extended this family of iCVIs to include six additional versions. In that work, it was found that certain iCVIs are visually more effective at identifying certain partition conditions. For instance, the incremental versions of the Calinski-Harabasz (iCH), Silhouette (iSIL), iXB, Generalized Dunn's (iGD), and Partition Separation (iPS) indices were found to be effective at identifying the under-partitioned condition. Similarly, the iSIL, iXB, and iDB indices were found to be effective at identifying the over-partitioned condition. When considering real-world data in particular, iPS was most effective for identifying the under-partitioned condition, while iXB and iDB were most effective for identifying the over-partitioned condition. These results support the notion that no single iCVI (or CVI) can be effective in all situations [18]. Indeed, methods have been proposed which focus solely on the proper selection of a CVI for optimal results on a particular data set [19]. However, an extension of [4] found that a single iCVI can serve as a vigilance mechanism for Fuzzy ART [20] as well as for TopoARTMAP [21]. The former study investigated iCH, iWB, iXB, iDB, the incremental Pakhira-Bandyopadhyay-Maulik (iPBM) index, and the incremental Negentropy Increment (iNI). It was found that, for the purpose of an ART vigilance mechanism, iNI functioned best.

FIGURE 1. Schematic of the Meta-iCVI. On the upper left, streaming data is processed one sample at a time and, on the upper right, streaming labels are generated by the user's preferred clustering method. Both the streaming samples and labels are processed simultaneously by two separate iCVIs, iCVI A and iCVI B. The windowed correlation of the iCVIs is then calculated and appended to a correlation history. The correlation history is then labeled by a trained time-series classifier into the labels "over", "under", and "correct" (partitioned).
We also emphasize that, while this work does explore an ensemble metric for clustering, it is in no way an ensemble clustering method. Similarly, our method contains a time-series classification step, but it is not a time-series classification method. The metric is agnostic to the base clustering method. While further exposition on these topics is beyond our scope, we refer the interested reader to [22] and [23].

III. METHODOLOGY
A. SELECTION OF ICVIS
Our previous work [4] showed the effectiveness of several iCVIs at visually indicating the two primary partition states (over/under). These iCVIs include iCH, iSIL, iXB, iGD, iPS, and iDB. Under proper conditions, each of these iCVIs demonstrated human-distinguishable graphical behavior that differentiates either the under- or over-partitioned condition from the correctly-partitioned condition in a variety of synthetic and real-world data sets. As expected, no single iCVI was found to be superior at identifying both the over- and under-partitioned conditions, particularly for real-world data. This paper investigates the potential for each of these iCVIs to be used in pairings whereby the two iCVIs are fused and then evaluated for behavior indicating the partition state. With six iCVIs, this implies 15 candidate pairs to be evaluated for fusion.

B. FUSION OF ICVIS
This method involves the fusion of two iCVIs into a single metric. This is accomplished by calculating Spearman's correlation coefficient within a sliding window over the two iCVIs. Correlation was chosen as it provides a means of examining discrepancies in the behavior of the two iCVIs. Additionally, calculating the correlation bounds the value between −1 and 1, which is important in later steps.
Spearman's method computes the correlation between the rank values of the two iCVI series. When compared to a linear correlation such as Pearson's, Spearman's correlation provides more flexibility for the varying ranges of values that the iCVIs exhibit. Spearman's correlation is capable of finding non-linear correlations on various scales, as it is computed by finding the Pearson correlation coefficient of the ranked values of the data, thus capturing monotonic rather than strictly linear relationships. The formulation for calculating Spearman's correlation in a window of size p at time-step t is given in Equation 1,

r_s(t) = cov(R(x), R(y)) / (σ_R(x) · σ_R(y)),  (1)

where x and y are the values of the two iCVIs within the window ending at time-step t and R denotes their ranks; the coefficient is the covariance of the ranks divided by the product of the standard deviations of the individual ranks. The correlation is computed within a sliding window in order to maintain the incremental behavior of the iCVIs and to localize the correlation coefficient to more recent behavior. Our experiments revealed that a window of approximately 2/3 of the mean cluster size or larger produced the most consistent results for the fusion technique. Further details of this experiment can be found in Section V-B2. Larger windows increase the duration of the initialization period before the Meta-iCVI can be evaluated. On the other hand, shorter windows may limit the scope of the window too much to capture larger iCVI behavioral patterns.
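As an illustration, the fusion step of Equation 1 can be sketched in plain Python. This is a minimal sketch rather than the released implementation; ties receive average ranks, and a zero-variance window is mapped to a correlation of 0.

```python
import math
from typing import Sequence

def rank(values: Sequence[float]) -> list[float]:
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's coefficient: Pearson correlation of the rank transforms."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def windowed_spearman(a: Sequence[float], b: Sequence[float], p: int) -> list[float]:
    """Fuse two iCVI series of length o into one series of length o - p + 1."""
    return [spearman(a[t - p + 1:t + 1], b[t - p + 1:t + 1])
            for t in range(p - 1, len(a))]
```

Note that the two-feature series of length o collapses into a single correlation series of length o − p + 1, matching the dimensionality reduction described below.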
C. CLASSIFICATION
Following the fusion of two iCVIs, a classification step is taken to predict the partition quality. When the iCVI values are calculated after each sample presentation, an effective iCVI time-series is generated describing the clustering performance after each presentation. Prior to fusion, the time-series has two features, one for each iCVI, with a total length of o. However, when the correlation between the two iCVI values is computed in the sliding window of size p, the time-series reduces to a single feature with length o − p + 1. This new, one-dimensional time-series is composed of the incremental windowed-correlation values, where each element of the series is the correlation of the two iCVIs within a recent time window. With this new time-series, one can now apply a linear classifier to identify the state of partition quality. The classifier used in this work is the ROCKET method paired with a Ridge regression classifier [24], [25], [26]. ROCKET is a method for classifying time series by first transforming the data with a set of random kernels and then classifying with a linear classifier, in the vein of random basis methods [27]. The ROCKET method was selected for the exceptionally fast and accurate results demonstrated by its creators, a result which we also observed in this work.
Additionally, an open-source version of this classifier can be found in the current sktime Python package, lending itself to ease of implementation and result verification. While several other methods were evaluated, including convolutional neural networks, spectral methods, and shapelets, ROCKET was not only the best performing but also by far the fastest. Nonetheless, any classifier fitting the needs of the practitioner may be substituted for this step.
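For intuition, a stripped-down, stdlib-only sketch of the ROCKET idea follows: random dilated kernels with mean-centred Gaussian weights and a random bias, followed by proportion-of-positive-values (PPV) and max pooling per kernel. The experiments in this paper used the sktime implementation; the function names and parameter choices here are illustrative, and the resulting features would feed a ridge classifier in a real pipeline.

```python
import math
import random

def make_kernels(q: int, input_len: int, seed: int = 0):
    """Draw q random kernels in the spirit of ROCKET: random length,
    mean-centred Gaussian weights, random bias, and random dilation."""
    rng = random.Random(seed)
    kernels = []
    for _ in range(q):
        length = rng.choice([7, 9, 11])
        weights = [rng.gauss(0, 1) for _ in range(length)]
        mean = sum(weights) / length
        weights = [w - mean for w in weights]
        bias = rng.uniform(-1, 1)
        max_exp = math.log2((input_len - 1) / (length - 1)) if input_len > length else 0.0
        dilation = int(2 ** rng.uniform(0, max(max_exp, 0.0)))
        kernels.append((weights, bias, dilation))
    return kernels

def transform(series, kernels):
    """Return [PPV, max] features per kernel for one 1-D series."""
    feats = []
    for weights, bias, dilation in kernels:
        span = (len(weights) - 1) * dilation
        outputs = []
        for start in range(len(series) - span):
            s = bias + sum(w * series[start + i * dilation]
                           for i, w in enumerate(weights))
            outputs.append(s)
        if not outputs:  # series shorter than the dilated kernel
            outputs = [bias]
        ppv = sum(1 for v in outputs if v > 0) / len(outputs)
        feats.extend([ppv, max(outputs)])
    return feats
```

The transform is training-free; only the linear classifier on top of the features is fitted, which is what makes the overall method so fast.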
This classification approach translates the stream of fused iCVI values into simple, explainable categories: "over," "under," and "correctly" partitioned. The classifier is a supervised learning system and thus must be trained prior to use. In this paper, we assessed multiple approaches to classifier training, including using training sets unrelated to the testing sets as well as the more traditional approach of using training/testing splits on the same data set. Our findings indicate that either method is acceptable and show that the iCVI classifier can be trained not only on a subset of the same data but also on related data. Therefore, the classifier is well-suited for online use cases.
Classification can be performed during or after data partitioning. This allows for an incremental approach where parameters can be adjusted during clustering, leading to improved final clustering results. Alternatively, classification can be performed after clustering is complete to gauge the accuracy of the clustering method. Due to the computational cost of calculating Spearman's coefficient, O(p log p) when considering sorting, where p is the width of the correlation sliding window, the practitioner may choose to postpone calculation of the coefficient and the classification step until time allows. A detailed discussion of incremental implementation and cost can be found in Section VI.

D. STEP-BY-STEP
This subsection will briefly walk through all the steps outlined above for clarity.
We begin with an arbitrary stream of input data. This data is clustered by some arbitrary clustering method, resulting in a stream of sample-prediction pairs. These sample-prediction pairs are then processed by two different iCVIs.
The particular iCVIs used are not important for the scope of this process outline, so they will simply be called iCVI-A and iCVI-B. Each iCVI produces a single-number metric for each sample in the stream. As the iCVIs are incremental, the metric becomes more accurate with each successive sample provided. This results in a 4-D time-series composed of: the original samples, the cluster predictions, the iCVI-A metrics, and the iCVI-B metrics. At this stage, however, we are only concerned with the iCVI features of the time-series, and the rest can be discarded, resulting in a 2-D time-series. This 2-D time-series of iCVI metrics is then reduced to a 1-D time-series by computing the moving-window Spearman correlation between the two features. Finally, this derived 1-D time-series can be fed to the pre-trained ROCKET classifier for partition quality classification. The entire process can be calculated incrementally so long as the correlation window size is less than or equal to the number of samples presented so far.
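The inference loop described above can be sketched as follows. This is a hedged illustration, not the released API: it assumes each iCVI object exposes an incremental `update(sample, label) -> float` method and that `classify` wraps the pre-trained ROCKET-plus-ridge pipeline; a simple rank correlation stands in for the full Spearman implementation (ties are ignored in this sketch).

```python
def _spearman(x, y):
    # Rank-transform then Pearson correlation; ties are ignored in this sketch.
    rx = [sorted(x).index(v) for v in x]
    ry = [sorted(y).index(v) for v in y]
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) *
           sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den if den else 0.0

def meta_icvi_stream(stream, icvi_a, icvi_b, classify, p):
    """Yield a partition-quality label after each sample once the
    correlation window of width p is full."""
    a_hist, b_hist, corr_hist = [], [], []
    for sample, label in stream:
        a_hist.append(icvi_a.update(sample, label))
        b_hist.append(icvi_b.update(sample, label))
        if len(a_hist) >= p:
            corr_hist.append(_spearman(a_hist[-p:], b_hist[-p:]))
            yield classify(corr_hist)  # "over" | "under" | "correct"
```

In a real deployment, `a_hist` and `b_hist` need only retain the last p values, and `classify` may be invoked only periodically to amortize its cost.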
The pre-training of the ROCKET classifier does require that the above process be performed on a set of sample-partition pairs with ground-truth partition quality labels. Such data can be generated automatically, either by using a method like k-means to cluster training data with a value of k substantially higher or lower than the ground-truth partition size, or by otherwise splitting and merging the ground-truth partitions artificially. Our findings suggest that the classifier is robust in terms of training data so long as the training data has a similar distribution to that of the testing data.
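One way to generate such labeled training examples automatically, via the artificial split/merge strategy mentioned above, can be sketched as follows. This is an illustration only; the experiments in this paper produced over- and under-partitions with a mis-seeded k-means instead.

```python
import random

def corrupt_partition(labels, mode, seed=0):
    """Derive an over- or under-partitioned label stream from ground truth:
    'over' randomly splits each cluster into two sub-clusters, 'under'
    collapses pairs of clusters into a shared id."""
    rng = random.Random(seed)
    clusters = sorted(set(labels))
    if mode == "over":
        # Each cluster's samples are reassigned to one of two new ids.
        mapping = {c: (2 * i, 2 * i + 1) for i, c in enumerate(clusters)}
        return [rng.choice(mapping[c]) for c in labels]
    if mode == "under":
        # Pairs of adjacent clusters merge to one id.
        mapping = {c: i // 2 for i, c in enumerate(clusters)}
        return [mapping[c] for c in labels]
    return list(labels)  # "correct"
```

Running the Meta-iCVI pipeline over each corrupted label stream, tagged with its mode, yields the (correlation series, partition-quality label) pairs needed for classifier training.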

IV. DATA
A. GENERATION OF SYNTHETIC DATA
Initial experiments focused on a data set of synthetic partition examples. 1064 data sets of Gaussian blobs were generated, each containing a random number of features ranging from 1 to 20, a random number of clusters ranging from 2 to 30, and a random number of samples-per-cluster ranging from 20 to 100. This data was then partitioned using k-means [28] to generate up to 5 under-partitions and up to 5 over-partitions, which was done by seeding a number k greater or less than the "true," known number of clusters. Together with the correct partitions, these over- and under-partitioned data sets were compiled into an overarching partition data set in which the "samples" were sets of data-point and cluster-label pairs, while the labels were one of "over," "under," or "correct," referencing the partition quality. This data set was split 50/50 into training and testing sets.

TABLE 1. Selection and properties of the real-world data sets used in our analysis.
Each of these data sets exists as a typical baseline clustering data set, and none were preprocessed in any way. K-means was used to create up to 20 examples of over- and under-partitions for each data set. These incorrect partitions, as well as 10 variations of the correct partitions, were combined to form the real-world data set, which was then split into both 70/30 and 50/50 training and testing sets.

V. TRAINING AND EXPERIMENTAL PROCEDURE
A. ICVI COMPUTATION AND CLASSIFIER TRAINING
The iCVIs are individually calculated on each data-point/cluster-label pairing prior to fusion and classifier presentation. For the purposes of these experiments, the iCVIs were calculated incrementally while the fusion and classification steps were completed in batch, thus classifying the entirety of the data/label pairings. However, both steps can be computed incrementally without a difference in the final results. An explicit discussion of the incremental implementation and the associated time complexity is given in Section VI.
The ROCKET algorithm generates random kernels for feature extraction and then transforms the fused iCVI time-series of the training sets. After transformation, a Ridge regression classifier is trained to classify these features into one of three categories: "under," "over," and "correctly" partitioned. Classifier training occurs in seconds on a standard desktop machine, even for data sets such as MNIST and FLIR ADAS.

B. EXPERIMENTAL RESULTS AND DISCUSSION
We conducted a set of seven primary experiments on real and synthetic data in order to evaluate the Meta-iCVI performance.

1) EVALUATION OF OPTIMAL ICVI PAIRINGS ON SYNTHETIC DATA
Our previous work found that six iCVIs generated graphical behavior descriptive of cluster partition states [4]. These iCVIs include iCH, iSIL, iXB, iGD, iPS, and iDB, each of which was capable of identifying either the over- or under-partitioned condition in the majority of the data sets tested. In order to evaluate the optimal pairing of iCVIs to fuse, all 15 unique pairings of these six iCVIs were considered, and their performance was evaluated in the Meta-iCVI framework on synthetic data. These evaluations involved 1064 randomly generated data sets, each partitioned in either an over, under, or correct fashion. Over- and under-partitioning was accomplished with k-means. This set of partitioned data sets was then divided into a 50/50 training/testing split. The results of these evaluations are given in Table 8 at the end of this paper. Of the 15 pairings tested, the combination of iCH and iSIL performed the best, with an overall accuracy of 95.6%. We theorize that their sensitivities to different partition conditions, as seen in [4], may complement each other. All further Meta-iCVI experiments are performed with this combination of iCVIs.

2) EVALUATION OF OPTIMAL ICVI WINDOW SIZE
In order to evaluate the effect of window size, we performed a sweep of possible window sizes from 10% to 90% of the mean cluster size for the synthetic data sets. We found strong resilience to changes in window size; however, the most stable performance appears for window sizes of 60% or larger. All other experiments were performed with a window size equal to 60% of the mean cluster size of the training data.

3) EVALUATION OF META-ICVIS TRAINED ON SYNTHETIC DATA APPLIED TO REAL-WORLD DATA
The ability of Meta-iCVIs to generalize from synthetic to real-world data was also tested in an experiment where the Meta-iCVI was trained on the same synthetic data outlined above and evaluated on real-world data. The real-world data tested here included the MNIST handwritten digits, ISOLET, Wine, Leaf, Cervical, and Iris data sets. Over- and under-partitions were generated using k-means, and correct partitions were generated by shuffling the samples and the corresponding true labels. The results were poor, with an accuracy of only 36.8%. Therefore, priming the Meta-iCVI with synthetic data for use on real-world data sets is not recommended.

4) EVALUATION OF META-ICVIS ON REAL-WORLD DATA
With strong results on synthetic data, we were interested in investigating the performance of the Meta-iCVI trained on real-world data. In this experiment, each individual real-world data set is split into a training and a testing set. It should be noted that, during training, the correlation window size of the Meta-iCVI is set dynamically to 60% of the mean cluster size. For this reason, training/testing splits other than 50/50 result in sub-optimal results due to different cluster sizes in the training and testing data. To elucidate the impact, we conduct the same experiment with both a 70/30 split and a 50/50 split. The results for each real-world data set are given in Tables 2 and 3.

5) EVALUATION OF META-ICVIS TRAINED AND TESTED ON DIFFERENT REAL-WORLD DATA SETS
This experiment tested the generalizing ability of Meta-iCVIs. Similarly to the experiment in Section V-B3, the Meta-iCVI was trained on a set of data independent of the testing data, except in this case both sets were collections of real-world data. The purpose of this experiment was to determine if the Meta-iCVI can be trained with an independently sourced data set of similar distribution and still achieve competitive performance. The overall accuracy was 88.3% with a training/testing split of 50/50 drawn from a collection of 256 partitions of the real-world data sets. The confusion matrix is given in Table 4. These results confirm the efficacy of the Meta-iCVI when trained and tested on real-world data.

TABLE 3. Results of 50/50 training/testing split on real-world data. Each sub-table is a confusion matrix for a particular data set. An ideal result would be a diagonal matrix, which is seen in all tables except those for ISOLET (b) and Wine (c). The Cervical data set in sub-table (e) only has two true clusters and, because iCVIs do not return results for singleton clusters, no under-clustering examples could be provided.

TABLE 4. Combined results of training with various real-world data sets and testing on different real-world data sets. Results are presented as a confusion matrix where an ideal result would be a diagonal matrix.

6) EVALUATION OF META-ICVIS TRAINED AND TESTED ON REAL-WORLD DATA SETS WITH RANDOMIZED ORDERING
In all previous experiments, data samples were presented in a sorted ordering based on their cluster labels. The rationale behind this is to allow comparisons with published works such as [4], which only evaluated the performance of iCVIs under the condition of sorted ordering. In this experiment, the ordering of the presented samples is randomized to examine the robustness of this method to the noisier orderings that would be seen in streaming conditions. This experiment is a repeat of Section V-B4 with the exception of the presentation order being random. Results for this experiment are given in Table 5. These results demonstrate a high degree of robustness to randomized sample presentation and confirm the efficacy of Meta-iCVIs for natural or noisy data ordering.

TABLE 5. Results of 50/50 training/testing split on real-world data with randomized ordering. Each sub-table is a confusion matrix for a particular data set. An ideal result would be a diagonal matrix, which is seen only in tables (d) Leaf and (e) Cervical. The Cervical data set in sub-table (e) only has two true clusters and, because iCVIs do not return results for singleton clusters, no under-clustering examples could be provided. Note that, while results are still very good, the sorted ordering does lead to better performance in many cases.

7) EVALUATION OF META-ICVIS TRAINED AND TESTED ON DIFFERENT REAL-WORLD DATA SETS WITH RANDOMIZED ORDERING
This experiment again evaluates the resilience of Meta-iCVIs to noisy sample presentation. The experiment from Section V-B5 is repeated with the only difference being that the sample order has been randomized rather than sorted by label. Results of this experiment are given in Table 6, confirming that Meta-iCVIs are resilient to randomized sample presentation and can perform competitively even compared to the sorted-order condition. These results confirm the efficacy of Meta-iCVIs for the evaluation of naturally ordered or noisy data when primed with like-data of similar distribution.

VI. INCREMENTAL IMPLEMENTATION
While the Meta-iCVI can be readily applied in batch mode, special care should be taken when implementing it in incremental mode. As mentioned in Section III-C, the time complexity of evaluating the Meta-iCVI is limited by the rank-ordering component and is on the order of O(n_w log n_w) per evaluation. When evaluated incrementally, however, the time complexity for an entire data set becomes O(n_s n_w log n_w). As n_w, the sliding window size, is assumed to be some fraction of n_s, the length of the time-series, we can reduce the time complexity for the whole data set to O(n_s^2 log n_s). For this reason, we recommend evaluating the Meta-iCVI only occasionally and interpolating results in between.
As implied in [4], iCVI behavior is best defined for sorted-order presentations, that is, when the cluster labels are presented in a strictly monotonically increasing fashion. While our experiments show robustness to randomized presentation order, the Meta-iCVI still performs best for sorted-order presentations. Sorted ordering could be approximated by storing the sample/label pairs covered by the correlation window and dynamically sorting them according to their labels. While this requires additional memory, the time complexity remains unchanged, as a sorting method is already required for rank-ordering.
As a result of the ROCKET kernel transformation, the fused-iCVI time-series has a required length as set during training. Fixing the length of the series can be accomplished by windowing the time-series to a limited number of past events and by padding the series with zeros when the duration is not long enough. Both of these methods were used in our experiments without negative effects on the results.
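The crop-or-pad adjustment just described amounts to a few lines; this sketch keeps the most recent values and left-pads with zeros when the correlation history is still too short.

```python
def fix_length(series, required_len):
    """Crop to the most recent `required_len` values, or left-pad with
    zeros when the correlation history is shorter than required."""
    if len(series) >= required_len:
        return series[-required_len:]
    return [0.0] * (required_len - len(series)) + list(series)
```

Left-padding (rather than right-padding) keeps the most recent correlation values at the same end of the series that the classifier saw during training.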
While our results did show significant resilience to multiple window sizes, the best results were found when the window size was between 30% and 90% of the mean cluster size. Should a data set be particularly unbalanced, dynamic under/over sampling may be required in order to optimize results.

TABLE 7. Software packages produced for reproducibility. Packages implementing all CVIs listed here were developed in both Python and Julia, which were then released on PyPI and JuliaHub, respectively. Corresponding packages were also developed upon these two that implement the Meta-iCVI method outlined in this paper.
As a result, the authors recommend storing n_w samples and labels in memory, dynamically re-sampling as needed, padding with zeros as needed, and evaluating the Meta-iCVI only periodically in applications with poorly conditioned data sets.
For a demonstration of the incremental application, Figure 3 gives three examples of the Meta-iCVI evaluated incrementally for the over-, under-, and correctly-partitioned cases of the Iris data set. The red line is the result of the windowed correlation between the two iCVIs, and the green line shows the prediction of the Meta-iCVI as −1, 0, or +1 for under-, correctly-, and over-partitioned, respectively. It can be seen from these plots that the Meta-iCVI does show initial inconsistencies when fewer classes or samples have been presented, but the prediction quickly becomes steady and accurate as the number of samples increases.

TABLE 8. Results of testing all pairings of the selected iCVIs on synthetic data. Each sub-table is a confusion matrix corresponding to one pairing of iCVIs, where an ideal result is a diagonal matrix. Sub-table 8o is closest to a diagonal, and thus the iSIL/iCH pairing is the best performing.

VII. REPRODUCIBILITY
Through the development of this investigation, several software implementations in different programming languages have been made available that implement iCVIs and the Meta-iCVI method. A Python implementation of batch and incremental CVIs is published on PyPI as cvi [38], and the Python Meta-iCVI method is available on the Missouri S&T self-hosted GitLab instance [39]. The Meta-iCVI method is also implemented in a Julia package [40], which is built atop a registered JuliaHub package for all batch and incremental CVI variants listed in this article [41]. Table 7 lists each implementation of the CVIs and the Meta-iCVI method in both programming languages.

VIII. CONCLUSION
This paper underscores that there is no one-size-fits-all solution when selecting iCVIs; however, our experimental results show that Meta-iCVIs are a powerful tool for providing insight into clustering performance when they are tuned properly. On several real-world data sets, including the MNIST handwritten digits and FLIR ADAS data sets, 100% accuracy was achieved in classifying a partition as "under," "over," or "correctly" partitioned. The simplicity of this labeling contrasts sharply with existing iCVI and CVI approaches, which merely provide a relative indication of partition quality compared to other partitions of the same data. While this method does require supervised training and a labeled data set to initialize, the presented experiments have shown resilience not only to training on data independent of that to which the Meta-iCVI is applied but also to randomized sample presentation. Such results grant a high degree of flexibility to the practitioner in implementation. Unfortunately, the labeled data is required to share similar characteristics with the target data and should not be synthetically generated. This may complicate extremely novel or specific applications. If a suitably similar data set can be found with which to initialize the Meta-iCVI, the actual data set of interest need not be labeled prior to evaluation. This method provides a novel tool for practitioners to evaluate the performance of their clustering methods and may also serve as a base upon which future streaming clustering algorithms can be built.
The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of these agencies.

Algorithm 1
Incremental Meta-iCVI algorithm. This algorithm outlines the inference mode for computing the Meta-iCVI upon an existing numerical clustering solution with original samples X and prescribed cluster labels Ω. The pre-training stage follows the inference procedure up to the computing of the ROCKET features of the windowed iCVI correlations; these features are then used to train the Ridge regression classifier weights in the normal manner for use during inference.
Data: samples X ∈ R^(m×n), cluster labels Ω ∈ N^n
Result: partition qualities Y ∈ {over, under, correct}
/* Hyperparameters */ m: sample feature dimension. n: number of samples. o: width of CVI FIFO window. p: width of correlation FIFO window. q: number of ROCKET kernels.
/* Initialization */ FIFO

FIGURE 2. Accuracy of the classifier as a function of window size. Window size is relative to the mean cluster size. Experiments were conducted on synthetic data.

FIGURE 3. Incremental evaluations on the Iris data set showing incremental predictions for partition quality over sample number. iCVI windowed correlation is plotted in red alongside classifier prediction in green. Classifier prediction is given as either −1, 0, or +1, which indicate the "under", "correct", and "over" partitioned conditions, respectively. Partition breaks are indicated by blue vertical dashed lines. Initial uncertainty can be seen in the jumping between prediction states as the first samples are considered; however, the prediction settles as additional samples are presented.

TABLE 2. Results of 70/30 training/testing split on real-world data. Each sub-table is a confusion matrix for a particular data set. An ideal result would be a diagonal matrix. The Cervical data set in sub-table (e) only has two true clusters and, because iCVIs do not return results for singleton clusters, no under-clustering examples could be provided.

TABLE 6. Combined results of training with various real-world data sets with randomized sample presentation and testing on different real-world data sets, also with randomized sample presentation. Results are presented as a confusion matrix where an ideal result would be a diagonal matrix.