
Improving Knowledge Distillation With a Customized Teacher


Abstract:

Knowledge distillation (KD) is a widely used approach to transfer knowledge from a cumbersome network (also known as a teacher) to a lightweight network (also known as a student). However, even when different teachers achieve similar accuracies, the accuracies of a fixed student distilled from them can differ significantly. We find that teachers with more dispersed secondary soft probabilities are better suited to play the teacher's role. Therefore, an indicator, i.e., the standard deviation σ of the secondary soft probabilities, is introduced to choose the teacher. Moreover, to make a teacher's secondary soft probabilities more dispersed, a novel method, dubbed pretraining the teacher under dual supervision (PTDS), is proposed. In addition, we put forward an asymmetrical transformation function (ATF) to further increase the dispersion of the pretrained teacher's secondary soft probabilities. The combination of PTDS and ATF is termed knowledge distillation with a customized teacher (KDCT). Extensive experiments and analyses on three computer vision tasks, namely image classification, transfer learning, and semantic segmentation, substantiate the effectiveness of KDCT.
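
The selection indicator described in the abstract, i.e., the standard deviation of a teacher's secondary soft probabilities, can be illustrated with a minimal sketch. The code below is not the authors' implementation; it assumes a temperature-softened softmax, takes "secondary" to mean all classes except the top-1, and averages the per-example standard deviation over a batch (all of these are assumptions for illustration only).

```python
import numpy as np

def secondary_softprob_std(logits, temperature=4.0):
    """Sketch of the teacher-selection indicator: the standard deviation
    of the secondary soft probabilities (all classes except the top-1),
    averaged over a batch. Higher values indicate more dispersed
    secondary probabilities. Temperature and averaging scheme are
    assumptions, not taken from the paper."""
    logits = np.asarray(logits, dtype=np.float64)
    scaled = logits / temperature
    # Numerically stable softmax over the class dimension.
    scaled -= scaled.max(axis=1, keepdims=True)
    probs = np.exp(scaled)
    probs /= probs.sum(axis=1, keepdims=True)

    stds = []
    for p in probs:
        secondary = np.delete(p, p.argmax())  # drop the top-1 probability
        stds.append(secondary.std())
    return float(np.mean(stds))

# Hypothetical usage: compare two candidate teachers on the same
# validation batch and prefer the one with the larger indicator.
logits_teacher_a = np.random.randn(8, 10)  # stand-in for teacher A's logits
logits_teacher_b = np.random.randn(8, 10)  # stand-in for teacher B's logits
print(secondary_softprob_std(logits_teacher_a),
      secondary_softprob_std(logits_teacher_b))
```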
Published in: IEEE Transactions on Neural Networks and Learning Systems ( Volume: 35, Issue: 2, February 2024)
Page(s): 2290 - 2299
Date of Publication: 25 July 2022

PubMed ID: 35877790
