Abstract:
As a challenging task of high-level video understanding, Weakly-supervised Temporal Action Localization (WTAL) has attracted increasing attention in recent years. However, because supervision comes only from video-level classification labels, it is difficult to accurately determine the boundaries of action instances. To address this issue, pseudo-label-based methods [Alwassel et al. (2019), Luo et al. (2020), and Zhai et al. (2020)] were proposed to generate snippet-level pseudo labels from classification results. Despite their promising performance, these methods rarely take full advantage of multiple modalities, i.e., RGB and optical flow sequences, to generate high-quality pseudo labels, and most of them ignore how to mitigate label noise, which hinders the network's ability to learn discriminative feature representations. To address these challenges, we propose a Multi-Modality Self-Distillation (MMSD) framework, which contains two single-modal streams and a fused-modal stream to perform multi-modality knowledge distillation and multi-modality self-voting. On the one hand, multi-modality knowledge distillation improves snippet-level classification performance by transferring knowledge between the single-modal streams and the fused-modal stream. On the other hand, multi-modality self-voting mitigates label noise through voting across modalities, according to the reliability and complementarity of the streams. Experimental results on the THUMOS14 and ActivityNet1.3 datasets demonstrate the effectiveness of our method and its superior performance over state-of-the-art approaches. Our code is available at https://github.com/LeonHLJ/MMSD.
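The abstract describes the two mechanisms only at a high level. The following is a minimal, hypothetical Python sketch of how they could look, assuming each stream (RGB, flow, fused) emits per-snippet class probabilities and a scalar actionness score per snippet. The function names, the KL-divergence form of the distillation loss, the 0.5 threshold, and the unanimous-vote rule are all illustrative assumptions, not the MMSD implementation; the linked repository is the authoritative source.

```python
import math
from typing import List, Sequence

# Hypothetical marker for snippets whose pseudo label is withheld (assumption).
IGNORE = -1


def kl_divergence(teacher: Sequence[float], student: Sequence[float],
                  eps: float = 1e-8) -> float:
    """KL(teacher || student) between two class distributions for one snippet."""
    return sum(t * math.log((t + eps) / (s + eps)) for t, s in zip(teacher, student))


def distillation_loss(teacher: List[List[float]], student: List[List[float]]) -> float:
    """Mean snippet-level KL divergence; the teacher is treated as a fixed target.

    In a real training loop the teacher's outputs would be detached so that
    gradients flow only into the student stream.
    """
    losses = [kl_divergence(t, s) for t, s in zip(teacher, student)]
    return sum(losses) / len(losses) if losses else 0.0


def self_vote(rgb: Sequence[float], flow: Sequence[float], fused: Sequence[float],
              threshold: float = 0.5, min_votes: int = 3) -> List[int]:
    """Hypothetical per-snippet pseudo labels from three streams' actionness scores.

    Returns 1 (action) or 0 (background) when at least `min_votes` streams agree,
    and IGNORE otherwise, so that disputed snippets contribute no supervision.
    """
    labels = []
    for scores in zip(rgb, flow, fused):
        # Each stream casts one foreground vote if its score reaches the threshold.
        fg_votes = sum(score >= threshold for score in scores)
        bg_votes = len(scores) - fg_votes
        if fg_votes >= min_votes:
            labels.append(1)
        elif bg_votes >= min_votes:
            labels.append(0)
        else:
            labels.append(IGNORE)
    return labels


if __name__ == "__main__":
    rgb = [0.9, 0.2, 0.7, 0.1]
    flow = [0.8, 0.3, 0.4, 0.2]
    fused = [0.95, 0.1, 0.6, 0.05]
    print(self_vote(rgb, flow, fused))  # [1, 0, -1, 0]
    print(distillation_loss([[0.9, 0.1]], [[0.5, 0.5]]))
```

In this sketch the third snippet is withheld because the flow stream disagrees with the other two. Requiring unanimity (`min_votes=3`) is the most conservative choice; lowering it to 2 labels every snippet but admits more noise.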
Published in: IEEE Transactions on Image Processing (Volume: 31)

Centre for Perceptual and Interactive Intelligence (CPII), Hong Kong
Multimedia Laboratory, The Chinese University of Hong Kong, Hong Kong
Linjiang Huang received the B.Eng. and M.Eng. degrees from the Huazhong University of Science and Technology in 2014 and 2017, respectively, and the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, in 2020. He is currently a Postdoctoral Fellow with the Centre for Perceptual and Interactive Intelligence and Multimedia Laboratory, The Chinese University of Hong Kong. His main research interests include action recognition, action detection, machine learning, and computer vision.

Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China
Liang Wang (Fellow, IEEE) received the B.Eng. and M.Eng. degrees from Anhui University in 1997 and 2000, respectively, and the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences (CASIA), in 2004. He is currently a Full Professor with the Hundred Talents Program, National Laboratory of Pattern Recognition, CASIA. His major research interests include machine learning, pattern recognition, and computer vision. He is an IAPR Fellow.

Centre for Perceptual and Interactive Intelligence (CPII), Hong Kong
Multimedia Laboratory, The Chinese University of Hong Kong, Hong Kong
Hongsheng Li (Member, IEEE) received the B.S. degree in automation from the East China University of Science and Technology in 2006, and the M.S. and Ph.D. degrees in computer science from Lehigh University in 2010 and 2012, respectively. He is currently an Assistant Professor with the Department of Electronic Engineering, The Chinese University of Hong Kong. His research interests include computer vision, medical image analysis, and machine learning.