Extending Contrastive Learning to Unsupervised Coreset Selection

Self-supervised contrastive learning offers a means of learning informative features from a pool of unlabeled data. In this paper, we investigate another useful approach: an entirely unsupervised coreset selection method. Contrastive learning, one of several self-supervised methods, was recently proposed and has consistently delivered the highest performance, which prompted us to choose two leading contrastive learning methods: the simple framework for contrastive learning of visual representations (SimCLR) and the momentum contrastive (MoCo) learning framework. We calculated the cosine similarity of each example once per epoch throughout contrastive learning and accumulated these values to obtain the coreset score. Our assumption was that a sample with low similarity would likely behave as a coreset. Compared with existing coreset selection methods that require labels, our approach reduces the cost associated with human annotation. In this study, the unsupervised method implemented for coreset selection achieved improvements of 1.25% (for CIFAR10), 0.82% (for SVHN), and 0.19% (for QMNIST) over a randomly selected subset with a size of 30%. Furthermore, our results are comparable to those of existing supervised coreset selection methods. The differences between the proposed method and the above-mentioned supervised coreset selection method (forgetting events) were 0.81% on the CIFAR10 dataset, −2.08% on the SVHN dataset (the proposed method outperformed the existing method), and 0.01% on the QMNIST dataset at a subset size of 30%. In addition, our proposed approach exhibited robustness even when the coreset selection model and target model were not identical (e.g., using ResNet18 as the selection model and ResNet101 as the target model).
Lastly, we obtained more concrete proof that our coreset examples are highly informative by showing the performance gap between the coreset and non-coreset samples in the coreset cross test experiment. We observed performance pairs ((testing: non-coreset, training: coreset), (testing: coreset, training: non-coreset)) of (94.27%, 67.39%) for CIFAR10, (98.24%, 83.30%) for SVHN, and (99.89%, 93.07%) for QMNIST with a subset size of 30%.


Introduction
Deep learning-based methods have been highly effective in performing computer vision tasks such as image classification [19], object detection [16], and semantic segmentation [8]. However, these methods generally require large amounts of data to produce accurate results; in particular, human annotation, an essential part of supervised learning, can be considerably time consuming and costly to implement. To address this problem, we select a subset of the entire training set: given a training set D, we select a subset D′ ⊂ D. This process requires identifying useful samples that can collectively represent the entire dataset. In this case, the underlying assumption is that not all samples contribute equally to a given task. In other words, each sample contributes to the task at a different level; high-contribution examples are mostly referred to in the literature as the coreset. Under this assumption, a carefully selected subset should improve over a randomly selected one (this topic has been thoroughly reviewed [27], [4]). Existing results can be divided into two groups: supervised and unsupervised coresets (described in detail in Section 2).

Figure 1. Our coreset based on SimCLR performance on three datasets (CIFAR10, SVHN, QMNIST). The horizontal axis depicts the stride as a percentage of the training set size. The dotted and solid lines represent the test accuracy of the classification network trained with a randomly chosen subset and our coreset, respectively. The size of both subsets is 30% of each dataset, i.e., 15,000, 22,500, and 18,000 samples for CIFAR10, SVHN, and QMNIST, respectively. At a stride of 0%, our coreset achieves high performance over a random subset. As the stride moves to the right, the resulting accuracy decreases, signifying that the corresponding examples exhibit a low coreset score. We use the ResNet18 architecture for both coreset selection and classification.
One of the limitations of employing supervised coresets is that they do not reduce the annotation cost, because they require a fully labeled dataset to execute successfully. Nonetheless, supervised coresets have been extensively studied, whereas unsupervised coresets have not received sufficient attention. Therefore, we aimed to select a coreset without human intervention, as is often the case when labels are not available. To the best of our knowledge, only Valvano et al. [30] have addressed the problem of unsupervised coreset selection. However, their method does not function desirably with large datasets. Moreover, their results were not robust; further research would be required to prove the usability of their method. Through this study, we aimed to perform unsupervised coreset selection using a deep learning approach. It must be noted that a coreset provides more information about the parent dataset than a non-coreset does for completing the target task (e.g., classification). Therefore, it is reasonable to suppose that an information-related metric and a target task are required to enable coreset selection. Because it is impractical to build a target task in the absence of labels, we adopted an alternative method of building a task, namely, self-supervised learning.
Self-supervised learning has emerged as a means of processing unlabeled data. With the objective of learning useful features from a pool of unlabeled data, self-supervised learning generally demands the definition of a pretext task within the loss function; good performance depends on the definition of the pretext task. Hence, several researchers have attempted to devise pretext tasks that enhance downstream task performance. However, it was recently reported that contrastive learning offers improved performance over existing pretext tasks. As its name implies, intermediate representations of the deep model are created to maximize the cossim (cosine similarity) between different views of the same example. As it is a state-of-the-art method, we decided to focus on contrastive learning in our study. Moreover, with respect to information-related metrics, we conjectured that the cossim would be a valid metric for coreset selection, because we assumed that the coreset would exhibit a low cossim value (Section 3). As depicted in Fig.
2, we stored the negative accumulated cossim values for each example at the end of every epoch. We assigned the saved value as the coreset score and sorted the examples according to their scores. Finally, we chose a final coreset of a specified size. Subsequently, we compared the classification accuracy of models trained separately on our coreset examples and on randomly chosen examples, assigning equal sizes to both subsets. The empirical results showed that our coreset yielded higher accuracy than the random subsets (Section 4). Notably, a low cossim value (i.e., a high coreset score) means that an example is harder for contrastive learning to handle than other examples with large cossim values. Therefore, we inferred that the coreset is characterized by low cossim values. This empirical evidence is in line with forgetting events [27], events in which a sample is wrongly classified during training. In a previous study, it was discovered that an example with a large number of forgetting events had a high level of contribution in supervised learning; the opposite is also true, i.e., the lower the number of forgetting events, the smaller the contribution of the corresponding example. Our work bridges the gap between contrastive learning and coreset selection and provides a new direction for further research. In this paper, we demonstrate the possibility of transferring an abstract space built from contrastive learning to identifying coreset examples. Our contributions to the literature are summarized as follows:
• To the best of our knowledge, our study is the first of its kind to substantiate that self-supervised learning is a highly suitable method for selecting a subset to generalize deep neural networks.
• To measure the information of an example, we establish a coreset score as a function of cossim.
Our work is expected to provide guidance on the use of unsupervised coresets. Our code can be accessed from the following website1.

Related Work
In our study, we bridged the gap between contrastive learning and unsupervised coreset selection.To establish context, we provide a brief history of self-supervision and revisit previous studies on coresets.
Self-supervised learning. Self-supervised learning aims to define a pretext task to enable a model to learn useful features without labeled data. In this case, downstream tasks (classification, detection, etc.) serve as a tool to demonstrate the significance of the pretext task. In the last decade, numerous researchers have reported that their pretext tasks were effective in downstream tasks. The major works are as follows: context prediction [5] assigns a task to a model to predict the position of a patch relative to that of the center-cropped patch. Solving a jigsaw puzzle [24] recasts a self-supervised problem as learning the permutation index of cropped patches. The counting task [25] is based on the concept that the sum of visual primitives from each patch must be equal to that of the entire image.
The above-mentioned studies were mainly based on the use of image patches. Unlike this stream, rotation [7], PIRL [21], MoCo v1 [12], SimCLR [2], and MoCo v2 [3] use an entire image as input. Predicting the angle at which an image is rotated is technically sound and straightforward [7]. Defining positive and negative pairs and maximizing the cossim value of positive pairs are key factors in contrastive learning, as mentioned in the MoCo v1-2 [12] [3] and SimCLR [2] papers. Both SimCLR and MoCo define a positive pair as two augmented versions of the same example. SimCLR realizes contrastive learning via a multilayer perceptron (MLP) projection network, heavy data augmentation, and the layer-wise adaptive rate scaling (LARS) optimizer [35]. In contrast, MoCo mainly relies on building a dictionary equipped with a momentum encoder.
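As a concrete reference point, the similarity used throughout this paper is the plain cosine similarity between the projected features of two augmented views of the same example. A minimal NumPy sketch (the function name is ours, not from the SimCLR or MoCo code):

```python
import numpy as np

def cossim(z_i, z_j):
    """Cosine similarity between the projections of two augmented
    views of the same example (a positive pair)."""
    z_i = np.asarray(z_i, dtype=float)
    z_j = np.asarray(z_j, dtype=float)
    return float(np.dot(z_i, z_j) / (np.linalg.norm(z_i) * np.linalg.norm(z_j)))
```

A value near 1 means the model maps the two views close together in the embedding space; values well below 1 indicate the positive pair remains hard to align.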
Coreset Selection. Studies on coresets pursue data-processing efficiency given labels (i.e., faster learning and lower storage space). Here, the term coreset refers to the most informative set of examples identified. To aid the reader's understanding, we categorize existing studies into four groups depending on the feature type and the presence or absence of labels: i) hand-crafted feature + label: Bayesian inference [1], SVM [28], Bayesian logistic regression [15], submodular functions [29] [32] [31], k-means and k-median clustering [11] [10], Gaussian mixture models [20], optimization frameworks [6]; ii) hand-crafted feature + no label: submodular optimization for speech data [23]; iii) learnable feature + label: forgetting events [27], selection via proxy (SVP) [4]; iv) learnable feature + no label: unsupervised data selection [30], the category into which our work falls.
In particular, few attempts focusing on the application of deep neural networks (DNNs) were made until Toneva et al. [27] discovered a correlation between coresets and forgetting events. Furthermore, Coleman et al. [4] extended this work by employing a proxy model for selection: the target model was not necessarily identical to the model used for selecting the coreset; rather, a smaller network could perform the selection. Valvano et al. [30] considered unsupervised coreset selection; their work was based on a convolutional variational autoencoder (CVAE) used to develop an embedding space in which the distance between successive features was calculated. Their work was limited to a small amount of data; furthermore, the computation became intractable as larger amounts of data were presented.

Unsupervised Coreset Selection
The objective of unsupervised coreset selection is to identify coreset examples in the absence of labels. Recall that a coreset provides more information than a non-coreset; in other words, an information-related metric must be established to measure the coreset score. Because contrastive learning is the fundamental idea of our work, the cossim can be readily obtained. Therefore, we set out to show that the cossim is a valid metric to represent the coreset. Our hypothesis is that the cossim plays a major role in measuring the coreset score. This hypothesis was made based on visual inspection, which is presented in Section 5.3. First, to confirm our hypothesis, we present our observations.
Observations. During contrastive learning, we calculated the coreset score as described in Alg. 1. The resulting array A corresponds to a list of example indices in the training set. Provided that our score represents the coreset adequately, performance should increase with the coreset score. To provide more concrete proof, we built a coreset score based on SimCLR and trained a classification network with the cross-entropy loss on a subset of the training set, as described in Eq. 1. The stride s ranges from 0% to 70% of the size of the training set, and the subset size L is set to 30% of the size of the training set.
Here, x_k and y_k represent the k-th example and its target, and ℓ(·) is the cross-entropy loss. For example, in the case of CIFAR10, the sizes of the training and testing sets are 50,000 and 10,000, respectively. We chose 30% of the training set (15,000 examples) according to the coreset score and stride. In addition, we set the stride s as follows: s ∈ {0, 5000, 10000, 15000, 20000, 25000, 30000, 35000}. Following this, we trained the classification network with these subset samples and evaluated the performance on the testing set.
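The stride-based sweep described above can be sketched as follows (a hypothetical helper, with CIFAR10's numbers as given in the text):

```python
def stride_subset(sorted_indices, stride, subset_size):
    """Given example indices sorted by coreset score (best first), return
    the window of `subset_size` indices starting `stride` positions in.
    stride = 0 yields the coreset itself; larger strides yield
    progressively lower-scoring subsets."""
    return sorted_indices[stride:stride + subset_size]

# CIFAR10: 50,000 training examples, subset size 30% = 15,000 examples,
# strides from 0 to 35,000 in steps of 5,000 (0% to 70% of the set)
n, L = 50_000, 15_000
strides = range(0, 40_000, 5_000)
```

Each window is then used to train the classifier, and the test accuracy is plotted against the stride as in Fig. 1.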
Fig. 1 illustrates the performance trend for the three datasets (CIFAR10, SVHN, and QMNIST). A clear correlation can be observed between the test accuracy and the stride.
As the stride increases (and the coreset score decreases), the performance of our coreset tends to degrade almost linearly across the datasets, and at approximately stride = 20%, it descends below that of the randomly generated subset. This observation emphasizes that examples providing more information than others exhibit high coreset scores, and vice versa (examples that are not very informative have low coreset scores). Additionally, we observed that the cossim adequately serves as the information-related metric required to measure the score. Hence, it can be inferred that our score effectively sorts the examples according to the coreset. In a failure case, by contrast, our coreset performance would exhibit a uniform trend across strides.
Tracking cossim. It is worth noting that averaging the cossim (cossim accumulation) rather than using a single cossim value is an effective strategy. After SimCLR learning on CIFAR10 was completed, we selected the two examples whose mean cossim values were the lowest and highest, respectively. We then tracked their cossim values for each epoch during learning, as shown in Fig. 3, where P1 and P2 correspond to the lowest- and highest-mean examples, respectively. The P2 example maintained a value close to 1 nearly from the beginning of learning. However, the value of P1 hardly approached 1 even at the end of learning. In addition, the cossim values of both examples changed abruptly over the entire learning process. Thus, considering this noise-like behavior of the cossim, averaging rather than choosing a single cossim value at a certain epoch in the middle of learning is a reasonable approach.
Identifying the coreset. As mentioned above, we calculated the coreset score by averaging the cossim values, which allowed us to select the coreset based on the score. Specifically, our coreset score was calculated as follows: throughout the contrastive learning process, the negative cossim value is accumulated for each example. Following this, we sorted the examples according to their scores, and the final coreset of the specified size was chosen from this ordering, as displayed in Alg. 1, where [n] = {1, ..., N} and N is the size of the training set. This approach, although simple, has proven effective in identifying the coreset, as demonstrated in Section 4.3. This owes to the fact that in contrastive learning, strong augmentation plays a crucial role in realizing representation learning. For instance, consider two cases in which random cropping, a form of augmentation, is applied to two different types of T-shirts (Fig. 4): for the abnormal T-shirt, we can assume that the two randomly cropped regions include dissimilar contents, even though both shirts belong to an identical class. Consequently, the network experiences difficulty in maximizing the cossim for abnormal T-shirts, resulting in higher coreset scores than those of normal T-shirts. Thus, we conclude that the abnormal T-shirt possibly provides more information about the concept of a T-shirt than the normal one does. We verify this claim in Section 4.3. Intuitively, the most informative example would include the maximum amount of content possible. The bottom line of this claim is that an example belonging to a coreset would exhibit an abnormal or peculiar appearance.
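A minimal NumPy sketch of this scoring-and-selection step, under our reading of Alg. 1: the score M[k] is the accumulated negative cossim, and the examples with the lowest accumulated cossim (i.e., the highest scores) are taken as the coreset:

```python
import numpy as np

def select_coreset(cossim_per_epoch, subset_size):
    """cossim_per_epoch: array of shape (num_epochs, n); entry [e, k] is
    the positive-pair cosine similarity of example k at epoch e.
    Accumulates the negative cossim as the coreset score M[k] and returns
    the indices of the `subset_size` highest-scoring examples."""
    M = -np.asarray(cossim_per_epoch).sum(axis=0)  # coreset score per example
    order = np.argsort(-M)                         # highest score (lowest cossim) first
    return order[:subset_size]
```

In practice the per-epoch similarities need not be stored: as in Alg. 1, M can be updated in place during training, and only the final argsort is needed.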
Assessing the coreset. Any approach for obtaining a reliable coreset must meet the following requirements: i) consistency: across random seeds for the selection model, the ranking of the resulting coreset scores must be as stable as possible; ii) model-agnosticism: once the coreset is chosen, it should outperform a randomly chosen set regardless of the model used for the target task. We set up an evaluation protocol and describe the results in Sections 5.1 and 5.2. Furthermore, the absence of labels implies that interclass data imbalance is an inherent problem for unsupervised coreset selection. We therefore also examine the class imbalance of our coreset compared with a randomly chosen subset in Section 5.4.

Experiments
In this section, we show that our coreset selection is a valid approach for classification tasks. Our coreset improves over random subsets on three different datasets (CIFAR10, SVHN, and QMNIST). For comparison with supervised coreset selection (SCS), we conducted SCS experiments using greedy k-centers [26] and forgetting events [27], which are also referred to in the literature [4]. In most cases, the performance of SCS, which is label-hungry by nature, surpasses that of our method. However, SCS is arguably an invalid approach for cost reduction because it requires fully annotated data. This highlights the importance of our study: our proposed method is designed to directly process unlabeled data, although it is outperformed by SCS. In addition, as mentioned in Section 3 along with Fig. 4, we present another experiment, the coreset cross test, designed to provide evidence that the set of examples chosen by our coreset score offers highly informative content.

Implementation Details
Datasets. To vary the example type across gray-scale, RGB, digits, and objects, we chose three different datasets (CIFAR10 [17], SVHN [22], and QMNIST [34]), each consisting of 10 classes. CIFAR10 is intended to support object classification and consists of 50,000 training and 10,000 testing examples. Each example has a size of 32×32; the examples include center-cropped objects and RGB channels. QMNIST consists of digits generated from NIST Special Database 19 [9] with the purpose of maintaining MNIST [18] preprocessing. QMNIST is divided into a training set of 60,000 images and a testing set of 10,000 images; these are gray-scale images with a resolution of 28×28. In contrast to QMNIST, SVHN comprises RGB channels. We used 73,257 examples for training and 26,032 for testing; these are cropped images of digits with a size of 32×32. We omitted the extra images from this dataset (531,131 examples). For both SVHN and QMNIST, the annotated labels directly correspond to digits, i.e., label '1' is the digit 1, etc.

Models.
We deployed ResNet18 [13] for both the contrastive and classification tasks. In SimCLR learning, our coreset was constructed on the basis of the code available on the website2. We set the batch size to 1024, the number of epochs to 1,000, and the dimension of the projection to 128, and employed the LARS optimizer [35] across the datasets. In MoCo learning, we shared all hyperparameters, such as the number of epochs (600) and batch size (512), across the datasets, and used cosine learning-rate scheduling based on the code available on the website3. For both types of contrastive learning, we deployed the following augmentation techniques across datasets: randomly resized crop, color jittering, random horizontal flip, and random gray-scale. In particular, because QMNIST consists of gray-scale images, we modified the first layer of ResNet18 to contain only a single channel.

[Displaced figure caption: In most cases, our coreset was more accurate than the random subset. Despite the absence of labels, our results were comparable to those of supervised coreset selection (SCS). Notably, SCS (with forgetting events) was less accurate than other methods in the case of SVHN. For QMNIST, we additionally examined the performance at a 20% subset size to verify the saturation point; above a 20% subset size, our SimCLR model and SCS became saturated at top performance, whereas our MoCo model and the random subset were still improving.]
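For reference, the shared augmentation pipeline described above might be expressed in torchvision as follows (the jitter strengths and application probabilities are illustrative placeholders, not the paper's exact settings):

```python
from torchvision import transforms

# Shared SimCLR/MoCo augmentations: randomly resized crop, color
# jittering, random horizontal flip, and random gray-scale.
contrastive_augment = transforms.Compose([
    transforms.RandomResizedCrop(32),  # CIFAR10/SVHN input size
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply(
        [transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
])
```

Two independent draws from this pipeline applied to the same image produce the positive pair whose cossim is tracked during training.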

Coreset Selection Task
For training SCS with forgetting events and greedy k-centers, we utilized the entirely labeled training set and selected subsets according to their coreset scores. For SVHN, whose training set size is 73,257, we set the subset sizes (30% ∼ 70%) as follows: {22500, 30000, 37500, 45000, 52500}. The results of our classification task are shown in Fig. 5. We calculated the mean and standard deviation of the test accuracy over five runs. Our coreset proved to exhibit higher accuracy than the randomly chosen subset, implying that in contrastive learning, the cossim serves as a useful metric for selecting the coreset. In most cases, SCS outperforms our coreset, whereas for the SVHN dataset, our results are comparable despite being label-free. As QMNIST contains gray-scale images and is relatively easy compared with the other datasets, our improvement was marginal; however, it outperformed the random subset at all subset sizes. Across datasets, our performance lay between SCS and the random subset, highlighting the importance of our study: despite having no labels, we were able to increase performance to a level close to that of SCS.

Coreset Cross Test
To verify whether the examples chosen by our coreset score are highly informative, we designed a cross test as follows: once the coreset score based on SimCLR was established, we split the original training set into two subsets, taking the top 30% ∼ 70% as the training set and the bottom 30% as the testing set (referred to as coreset → non-coreset, or C → N). Similarly, we set the bottom 30% ∼ 70% as the training set and the top 30% as the testing set (referred to as non-coreset → coreset, or N → C), and repeated each test for five runs. As shown in Table 1, in all cases, the C → N cross test was superior to N → C. This result supports our claim that samples with high coreset scores provide more information than those with low scores. Furthermore, their appearances are significantly different (elaborated in Section 5.3).
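The split used in the cross test can be sketched as follows (a hypothetical helper; the indices are assumed sorted by coreset score, best first):

```python
def cross_test_split(sorted_indices, train_frac, test_frac=0.30):
    """C -> N split: train on the top `train_frac` of the score-sorted
    indices and test on the bottom `test_frac`. Reversing the roles of
    the two slices gives the N -> C direction."""
    n = len(sorted_indices)
    train = sorted_indices[: int(train_frac * n)]
    test = sorted_indices[n - int(test_frac * n):]
    return train, test
```

With train_frac up to 0.70 and test_frac fixed at 0.30, the two slices never overlap, so the test set is always disjoint from the training set.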
Table 1. Cross-test results. Average (± std) obtained from five runs on the three datasets. C and N represent the coreset and non-coreset, respectively. Across datasets and subset fractions, the C → N test achieved significantly higher accuracy than the N → C test. Cross tests were conducted using the coreset score based on SimCLR.

Ablation Study
In this section, we provide a detailed description of the experiments we conducted to characterize our coreset. First, we attempted to verify that our coreset produces consistent examples agnostic to random seeds. In addition, by varying the model for contrastive learning and target-task learning, we confirmed that our coreset delivers consistent performance agnostic to the model. Second, we visually inspected examples to examine the appearance of images belonging to the coreset and non-coreset. Finally, because class imbalance inevitably occurs owing to the absence of labels, we plotted the fraction of examples for each class according to the size of the subset.

Consistent Coreset
As mentioned earlier, an effective coreset algorithm should produce consistent coreset elements across random seeds. This property can be easily verified by calculating the intersection ratio between the multiple coresets resulting from multiple executions. These results are presented in Table 2, where each intersection ratio is calculated by dividing the number of intersecting elements by the size of the subset. Notably, the intersection ratio of our coreset was significantly greater than that of the random subset across the datasets.
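The consistency metric just described can be computed as follows (a sketch; `coresets` holds the index lists, all of equal subset size, from repeated runs):

```python
def intersection_ratio(coresets):
    """Number of examples common to all runs, divided by the subset size
    (the consistency metric reported in Table 2)."""
    common = set(coresets[0])
    for c in coresets[1:]:
        common &= set(c)
    return len(common) / len(coresets[0])
```

A ratio near 1 indicates the selection is nearly deterministic across seeds, whereas a random subset of the same size would yield a ratio close to the subset fraction itself.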

Model-Agnostic Coreset Selection
It is apparent that coreset consistency is beneficial in avoiding rerunning coreset selection when the target model of coreset selection is altered. Hence, we claim that model-agnostic coreset selection is one of the essential properties that any coreset approach must provide. For our coreset experiment, all experimental settings were identical to those described in Section 4, with the target model, i.e., ResNet101, being the only difference. As depicted in Fig. 6, our coreset achieved greater performance than that of the random subset. For more extensive proof, we deployed different architectures, such as wide ResNet (WRN) [36], ResNeXt [33], and DenseNet [14], using a single coreset built with SimCLR on CIFAR10. As listed in Table 3, across subset sizes and target model architectures, our coreset consistently yields higher performance than the random subset.

Figure 7. Distribution of the fraction of examples for each class. Our coreset exhibits a class imbalance for all the datasets. However, performance degradation was not observed for subsets of 30% or larger. In addition, for large subsets, the imbalance seems to diminish.

Data Imbalance
Unless annotation is provided, it is difficult to prevent interclass imbalance. To examine the class imbalance of our coreset, we plotted the fraction of resulting examples for each class according to the size of the subset (30%, 50%, and 70%), as depicted in Fig. 7. A clear imbalance can be observed in our coreset. However, for subset sizes of 30% and above, the imbalance problem did not seem to affect the classification accuracy. Moreover, as the size of the subset increased, the imbalance tended to decrease. Notably, the SVHN training data are inherently unbalanced; hence, even a random subset does not have a uniform distribution. Moreover, although our coreset presents a data imbalance, we were able to achieve high accuracy. These facts imply that the examples found by our coreset search contain highly informative content.
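Since labels play no role in selection, the imbalance can only be measured post hoc. A sketch of the per-class fraction computation behind Fig. 7 (the helper name is ours):

```python
from collections import Counter

def class_fractions(subset_labels, num_classes=10):
    """Fraction of subset examples falling in each class; a perfectly
    balanced selection would give roughly 1/num_classes per class."""
    counts = Counter(subset_labels)
    total = len(subset_labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]
```

Comparing these fractions between the coreset and a random subset of the same size makes the imbalance, and its shrinkage at larger subset sizes, directly visible.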

Discussion
Extending the contrastive learning-based score to the core-dataset. We claimed that the coreset score adequately represents examples that are more informative. To extend this result, we attempted to answer the question of whether it would be possible to distinguish more informative datasets from others. In other words, our claim encouraged us to explore the possibility of employing the contrastive learning-based score to represent the core-dataset. To validate this, we plotted the distribution of the average cossim for each dataset, as can be seen in Fig. 9. It is evident that the datasets, in decreasing order of informativeness, are CIFAR10, SVHN, and QMNIST, because they are composed of RGB + objects, RGB + digits, and gray-scale + digits, respectively. Accordingly, the mean and median of the average cossim values were in the same order. This could imply that the cossim distribution represents the level of difficulty of a dataset; for example, QMNIST is a fairly easy dataset for classification compared with CIFAR10 and SVHN; hence, the mean and median of its average cossim distribution yielded large values. We conjecture that our coreset score could possibly be used to identify core-datasets, which could be the subject of further study in the future.

Conclusion
In this study, we demonstrated that self-supervised learning can create a coreset in the absence of labels. Because the cossim is readily available from contrastive learning, we identified a metric that effectively measures the coreset score. Constructing the coreset score from the average cossim value enabled us to successfully select a coreset from a pool of unlabeled data, which increased the classification accuracy over that of a randomly chosen subset. In addition, our approach yielded results comparable to those of methods using supervised learning for coreset selection. Although the cossim is appropriate for identifying the coreset per se, the problem of establishing a new metric still remains; therefore, in future research, we hope to establish a metric better tailored to coresets. Furthermore, when a pool of unlabeled data is gathered from the web and the annotation cost must be reduced, our study can provide guidance on which examples to annotate first without sacrificing performance.

Figure 2 .
Figure 2. Overall flowchart of our coreset selection approach. During contrastive learning (top left), the cossim value is calculated (top middle), and its negative value is accumulated (top right) simultaneously for each example. N and M represent the size of the training set and the number of epochs, respectively. After learning is completed, the training data are sorted in increasing order of the coreset score (bottom). Lastly, the coreset with a specified size (e.g., eight) is selected.

Figure 3 .
Figure 3. Cossim values at each epoch. Cossim of the two examples (CIFAR10) with the highest and lowest coreset scores based on SimCLR: P1 (blue line) and P2 (red line). Note that P1 barely approaches 1 for the entire duration of the learning process.

Algorithm 1: Identify coreset

    input: subset size L
    initialize metric M[k] = 0, k ∈ [n]
    # calculate the coreset score
    while contrastive learning not done do
        for all k ∈ [n] do
            M[k] = M[k] − cossim(z_i(k), z_j(k))  # z_i, z_j are a positive pair of latent variables for the k-th example
    A = argsort(M, ascending order)
    return A[0 : L]  # resulting coreset

(a) normal T-shirt (b) abnormal T-shirt
Figure 4. Example of a normal and an abnormal T-shirt. The dashed boxes represent two randomly cropped regions. In the normal T-shirt, the contents of the two cropped regions are not significantly different. In contrast, in the abnormal T-shirt, the contents of the two cropped regions are significantly different.

Figure 5 .
Coreset selection performance on three classification datasets. The vertical bars indicate the standard deviation of the accuracy.

Figure 6 .
Model-agnostic performance on three classification datasets. Coreset selection model: ResNet18; target model: ResNet101. The vertical bars indicate the standard deviation of the accuracy. For all three datasets, our coreset was more accurate than the random subset.
Visual Inspection

We sorted the examples by a single coreset score with SimCLR, as represented in Fig. 8. For simplicity, we refer to the top 11 and bottom 11 examples as the coreset and non-coreset, respectively. As shown in the figure, the coreset examples exhibit bizarre and peculiar patterns; in contrast, the non-coreset examples primarily comprise simple structures and backgrounds. Furthermore, with respect to class imbalance, the coreset of CIFAR10 predominantly consists of images of airplanes and birds. Similarly, for both SVHN and QMNIST, most of the top 11 examples are images with the label 1. It should be noted that our coreset may be adversely affected by class imbalance; this matter is discussed in the subsequent section.

Figure 8 .
Figure 8. Visual inspection for CIFAR10, SVHN, and QMNIST. Coreset and non-coreset examples for CIFAR10 (top two rows), SVHN (middle two rows), and QMNIST (bottom two rows). Examples belonging to the coreset display a less simple structure than those of the non-coreset. In addition, the majority of non-coreset examples are concentrated in a few classes, which could lead to class imbalance.

Table 2 .
Intersection ratio over five seed runs. For all datasets, our coreset can be observed to produce consistent examples regardless of the random seed.

Table 3 .
Model-agnostic coreset selection task on CIFAR10. In all cases, our coreset exhibited higher accuracy than the random subset, regardless of the target model architecture. WRN-n-k, DenseNet-n, and ResNeXt-n denote WRN with n convolution layers and widening factor k, DenseNet with n bottleneck layers, and ResNeXt with n layers, respectively.