Abstract:
Dataset distillation is a method for reducing dataset sizes by learning a small number of representative synthetic samples. This has several benefits, such as speeding up model training, reducing energy consumption, and reducing required storage space. These benefits are especially crucial in settings like federated learning, where initial overhead costs are justified by the speedup they enable. However, current dataset distillation methods have two limitations: 1) each synthetic sample is assigned a single ‘hard’ label, and 2) distillation can only be applied to image data. We propose to simultaneously distill both images and their labels, thus assigning each synthetic sample a ‘soft’ label (a distribution of labels). Our algorithm increases accuracy by 2-4% for several image classification tasks. Using ‘soft’ labels also enables distilled datasets to consist of fewer samples than there are classes, as each sample can encode information for multiple classes. For example, training a LeNet model with 10 distilled images (one per class) results in over 96% accuracy on MNIST, and almost 92% accuracy when trained on just 5 distilled images. We also extend the dataset distillation algorithm to distill text data. We demonstrate that text distillation outperforms other methods across multiple datasets. For example, models attain almost their original accuracy on the IMDB sentiment analysis task using just 20 distilled sentences. Our code can be found at https://github.com/ilia10000/dataset-distillation.
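To make the soft-label idea concrete, below is a minimal sketch of distilling both synthetic samples and their label distributions. It assumes a linear classifier on flattened MNIST-like inputs and a single inner gradient step; all names and hyperparameters (soft_label_distill, distill_steps, lr_inner, lr_outer, n_distilled) are illustrative and are not taken from the authors' repository linked above.

```python
# Hypothetical sketch: jointly learn synthetic inputs and soft labels so that a
# model trained on them performs well on real data. Not the paper's implementation.
import torch
import torch.nn.functional as F

def soft_label_distill(real_loader, n_distilled=10, n_classes=10, dim=784,
                       distill_steps=400, lr_inner=0.02, lr_outer=0.01,
                       device="cpu"):
    # Learnable synthetic images and learnable soft-label logits.
    x_syn = torch.randn(n_distilled, dim, device=device, requires_grad=True)
    y_syn = torch.randn(n_distilled, n_classes, device=device, requires_grad=True)
    outer_opt = torch.optim.Adam([x_syn, y_syn], lr=lr_outer)

    real_iter = iter(real_loader)
    for _ in range(distill_steps):
        # Freshly initialized linear model each outer step.
        w = torch.randn(dim, n_classes, device=device) * 0.01
        b = torch.zeros(n_classes, device=device)
        w.requires_grad_(True)
        b.requires_grad_(True)

        # Inner step: one SGD update on the synthetic data with soft labels.
        logits = x_syn @ w + b
        soft_targets = F.softmax(y_syn, dim=1)
        inner_loss = torch.sum(-soft_targets * F.log_softmax(logits, dim=1), dim=1).mean()
        gw, gb = torch.autograd.grad(inner_loss, (w, b), create_graph=True)
        w1, b1 = w - lr_inner * gw, b - lr_inner * gb

        # Outer step: evaluate the one-step-trained model on real data and
        # backpropagate through the inner update into x_syn and y_syn.
        try:
            x_real, y_real = next(real_iter)
        except StopIteration:
            real_iter = iter(real_loader)
            x_real, y_real = next(real_iter)
        x_real = x_real.view(x_real.size(0), -1).to(device)
        y_real = y_real.to(device)
        outer_loss = F.cross_entropy(x_real @ w1 + b1, y_real)

        outer_opt.zero_grad()
        outer_loss.backward()
        outer_opt.step()

    # Return the distilled images and their learned label distributions.
    return x_syn.detach(), F.softmax(y_syn, dim=1).detach()
```

Because the labels are distributions rather than one-hot vectors, a single distilled sample can carry information about several classes, which is what allows fewer distilled samples than classes in the setting described above.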
Date of Conference: 18-22 July 2021
Date Added to IEEE Xplore: 20 September 2021