1. INTRODUCTION
While deep neural networks (DNNs) have achieved remarkable success in computer vision, most well-performing models are difficult to deploy on edge devices in practical scenarios due to their high computational cost. To alleviate this, lightweight DNNs have been extensively studied. Typical approaches include parameter quantization [1], network pruning [2], and knowledge distillation (KD) [3]. Among them, KD has gained increasing popularity across various vision tasks because it can be easily integrated into other model compression pipelines.