Class Discriminative Knowledge Distillation

Abstract:

Knowledge distillation aims to transfer knowledge from a large teacher model to a lightweight student model, enabling the student to achieve performance comparable to the teacher. Existing methods explore various strategies for distillation, including soft logits, intermediate features, and even class-aware logits. Class-aware distillation, in particular, treats the columns of logit matrices as class representations, capturing potential relationships among instances within a batch. However, we argue that representing class embeddings solely as column vectors may not fully capture their inherent properties. In this study, we revisit class-aware knowledge distillation and propose that effective transfer of class-level knowledge requires two regularization strategies: separability and orthogonality. Additionally, we introduce an asymmetric architecture design to further enhance the transfer of class-level knowledge. Together, these components form a new methodology, Class Discriminative Knowledge Distillation (CD-KD). Empirical results demonstrate that CD-KD significantly outperforms several state-of-the-art logit-based and feature-based methods across diverse visual classification tasks, highlighting its effectiveness and robustness.
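The abstract does not give the loss formulation, but the described ingredients (instance-level soft-logit distillation, class embeddings taken as columns of the batch logit matrix, plus separability and orthogonality regularization) can be sketched as follows. This is a minimal PyTorch illustration of one plausible reading, not the authors' implementation; the function name `class_discriminative_kd_loss` and the weights `w_sep` and `w_orth` are hypothetical.

```python
import torch
import torch.nn.functional as F

def class_discriminative_kd_loss(student_logits, teacher_logits,
                                 tau=4.0, w_sep=1.0, w_orth=1.0):
    """Hypothetical sketch of a CD-KD-style loss.

    student_logits, teacher_logits: [B, C] batch logit matrices.
    Rows are instances; columns are treated as class representations.
    """
    # Standard instance-level soft-logit KD over the rows.
    kd = F.kl_div(F.log_softmax(student_logits / tau, dim=1),
                  F.softmax(teacher_logits / tau, dim=1),
                  reduction='batchmean') * tau ** 2

    # Class-level view: each column is an embedding of one class over the batch.
    s_cls = F.normalize(student_logits.t(), dim=1)  # [C, B]
    t_cls = F.normalize(teacher_logits.t(), dim=1)  # [C, B]

    # Separability (assumed form): match the student's pairwise class-similarity
    # structure to the teacher's.
    s_sim = s_cls @ s_cls.t()   # [C, C]
    t_sim = t_cls @ t_cls.t()
    sep = F.mse_loss(s_sim, t_sim)

    # Orthogonality (assumed form): penalise off-diagonal cosine similarities so
    # that distinct student class embeddings become mutually orthogonal.
    C = s_sim.size(0)
    eye = torch.eye(C, device=s_sim.device)
    orth = (s_sim * (1 - eye)).pow(2).sum() / (C * (C - 1))

    return kd + w_sep * sep + w_orth * orth
```

In this reading, the asymmetric architecture mentioned in the abstract would sit outside the loss (e.g., different projection heads for teacher and student before the class-level terms), so it is omitted here.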
Page(s): 1340 - 1351
Date of Publication: 29 January 2025
Electronic ISSN: 2471-285X

