Abstract:
With the increasing popularity of deep convolutional neural networks (DCNNs), in addition to achieving high accuracy, it becomes increasingly important to explain how DCNNs make their decisions. In this article, we propose a CHannel-wise disentangled InterPretation (CHIP) model that provides visual interpretations of DCNN predictions. The proposed model distills the class-discriminative importance of channels in a DCNN by utilizing sparse regularization, and we introduce network perturbation to learn the CHIP model. The proposed model can not only distill global-perspective knowledge from networks but also present class-discriminative visual interpretations of their predictions. Notably, the CHIP model can interpret different layers of a network without retraining. By combining the distilled interpretation knowledge at different layers, we further propose the Refined CHIP visual interpretation, which is both high-resolution and class-discriminative. In qualitative and quantitative experiments on different data sets and networks, the proposed model provides more promising visual interpretations for network predictions in image classification than existing visual interpretation methods. The proposed model also outperforms related approaches on the ILSVRC 2015 weakly supervised localization task.
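The abstract describes weighting a layer's channels by their distilled class-discriminative importance to form a visual interpretation. The following is a minimal PyTorch sketch of that idea, not the authors' released code: it assumes an importance vector `w` has already been learned for the target class with sparse regularization, and all names (`chip_style_saliency`, `feature_maps`, `w`) are illustrative assumptions.

import torch
import torch.nn.functional as F

def chip_style_saliency(feature_maps: torch.Tensor,
                        w: torch.Tensor,
                        out_size=(224, 224)) -> torch.Tensor:
    """Render a CHIP-style saliency map for one image.

    feature_maps: (C, H, W) activations from one DCNN layer.
    w: (C,) class-discriminative channel importances, assumed mostly
       zero under the sparsity constraint described in the abstract.
    """
    # Weight each channel by its distilled importance and sum over channels.
    saliency = (w.view(-1, 1, 1) * feature_maps).sum(dim=0)
    # Keep only evidence supporting the target class.
    saliency = F.relu(saliency)
    # Upsample the coarse map to image resolution (bilinear needs a 4-D input).
    saliency = F.interpolate(saliency[None, None], size=out_size,
                             mode="bilinear", align_corners=False)[0, 0]
    # Normalize to [0, 1] for heat-map visualization.
    return (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)

Under this reading, the Refined CHIP interpretation would combine such maps computed at different layers, trading the high resolution of shallow layers against the class discriminativeness of deep ones; the exact combination rule is defined in the paper itself.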
Published in: IEEE Transactions on Neural Networks and Learning Systems (Volume: 31, Issue: 10, October 2020)