Abstract:
Coreset selection reduces model training overhead while retaining high accuracy by selecting a representative subset of the training data. The quality of a selected coreset can be assessed by quantifying how well r-radius balls centered at each coreset element cover the entire dataset. However, existing methods are limited: they primarily optimize only the largest radius or resort to surrogate approaches derived from importance scores. In this paper, we identify a new task for the entropy-based method: optimizing all the coverage radii of the coreset elements, so that the coreset effectively represents the underlying data distribution. To this end, we propose the SubPIE algorithm. SubPIE first identifies subpatterns of neighboring samples in the feature space using discrete coordinate descent and then selects a representative sample within each restricted subpattern. Extensive experimental results show that SubPIE improves the generalization performance of coreset selection compared to 14 baseline methods. Additional experiments demonstrate the robustness of SubPIE.
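To make the notion of coverage radii concrete, the following is a minimal sketch and not the SubPIE algorithm itself. It assumes Euclidean distance in feature space and assumes that each sample is covered by its nearest coreset element, so that an element's radius is the distance to the farthest sample it covers. The function name `coverage_radii` and its parameters are hypothetical and do not come from the paper.

```python
import numpy as np

def coverage_radii(data, coreset_idx):
    """Per-element coverage radii for a candidate coreset (illustrative only).

    Assumption: every sample is assigned to its nearest coreset element
    under Euclidean distance; an element's radius is the largest distance
    to any sample assigned to it.
    """
    centers = data[coreset_idx]
    # Pairwise distances, shape (n_samples, n_coreset_elements).
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)
    nearest_dist = dists[np.arange(len(data)), nearest]
    radii = np.zeros(len(coreset_idx))
    # Unbuffered max-reduction: keep the largest assigned distance per element.
    np.maximum.at(radii, nearest, nearest_dist)
    return radii
```

Under these assumptions, `coverage_radii(data, idx).max()` is the single largest radius that the abstract says existing methods primarily optimize, while the full returned vector corresponds to the set of radii the abstract proposes to optimize jointly.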
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025