Abstract:
Egocentric early action prediction, which aims to recognize an ongoing action in first-person video as early as possible, before the action is fully executed, is a new yet challenging task because only a partial video is available as input. Pioneering studies addressed this task with LSTM backbones and simply compiled the observed and unobserved video segments into a single vector. They therefore suffer from two key limitations: they lack non-sequential relation modeling within the video snippet sequence, and they lack correlation modeling between the observed and unobserved video segments. To address these two limitations, we propose a novel multimodal TransfoRmer-based duAl aCtion prEdiction (mTRACE) model for egocentric early action prediction, which consists of two key modules: the early (observed) segment action prediction module and the future (unobserved) segment action prediction module. Both modules use Transformer encoders as the backbone to encode all potential relations among the input video snippets, and include several single-modal and multi-modal classifiers for comprehensive supervision. Unlike previous work, each of the two modules outputs two multi-modal feature vectors: one encoding the current input video segment, and the other predicting the missing video segment. For optimization, we design a two-stage training scheme comprising a mutual enhancement stage and an end-to-end aggregation stage. The former alternately optimizes the two action prediction modules, modeling the correlation between the observed and unobserved video segments with a consistency regularizer, while the latter seamlessly aggregates the two modules to fully utilize their capacity. Extensive experiments demonstrate the superiority of the proposed model.
We have released the code and the corresponding parameters to benefit other researchers at https://trace72...
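The consistency regularizer mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give its form, so everything here is an assumption: the function names (`mse`, `consistency_loss`), the use of mean squared error as the distance, and the pairing of vectors (each module's prediction of the missing segment is compared with the other module's encoding of that segment). It is not the authors' implementation.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def consistency_loss(
    early_enc: Sequence[float],          # early module: encoding of the observed segment
    early_pred_future: Sequence[float],  # early module: prediction of the unobserved segment
    future_enc: Sequence[float],         # future module: encoding of the unobserved segment
    future_pred_early: Sequence[float],  # future module: prediction of the observed segment
) -> float:
    """Hypothetical consistency term (an assumption, not the paper's formula).

    Each module's prediction of the missing segment is pulled toward the
    other module's encoding of that same segment.
    """
    return mse(early_pred_future, future_enc) + mse(future_pred_early, early_enc)


# Perfectly consistent modules give a zero penalty.
zero = consistency_loss([1.0, 2.0], [0.0, 0.0], [0.0, 0.0], [1.0, 2.0])
# A mismatch of 1.0 in every dimension of one prediction gives a penalty of 1.0.
one = consistency_loss([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0])
```

In a training loop of the kind the abstract describes, a term like this would be added to the classification losses while the two modules are optimized in alternation. How it is weighted is not stated in the abstract.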
Published in: IEEE Transactions on Circuits and Systems for Video Technology (Volume: 33, Issue: 9, September 2023)

Department of Data Science and Artificial Intelligence, Monash University, Clayton, VIC, Australia
Peng Cheng Laboratory, AI Research Center, Shenzhen, China
Weili Guan (Member, IEEE) received the master’s degree from the National University of Singapore. She is currently pursuing the Ph.D. degree with the Faculty of Information Technology, Monash University (Clayton Campus), Australia. She is also an Intern with the Peng Cheng Laboratory. After her master’s degree, she joined Hewlett Packard Enterprise, Singapore, as a Software Engineer, where she worked for around five years. She has published many papers in first-tier conferences and journals, such as ACM MM, SIGIR, and IEEE Transactions on Image Processing. Her research interests are multimedia computing and information retrieval.

School of Computer Science and Technology, Shandong University, Tsingtao, China
Xuemeng Song (Senior Member, IEEE) received the B.E. degree from the University of Science and Technology of China in 2012, and the Ph.D. degree from the School of Computing, National University of Singapore, in 2016. She is currently an Associate Professor with Shandong University, China. She has published several papers in top venues, such as ACM SIGIR, ACM MM, and ACM TOIS. Her research interests include information retrieval and social network analysis. She has served as a reviewer for many top conferences and journals. She is also an AE of IET Image Processing.

School of Computer Science and Technology, Shandong University, Tsingtao, China
Kejie Wang is currently pursuing the B.Eng. degree in computer science with Shandong University. His research interests include multimedia computing.

School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
Haokun Wen received the B.E. degree from the Ocean University of China, in 2019, and the master’s degree from the School of Computer Science and Technology, Shandong University, in 2022. He is currently pursuing the Ph.D. degree with the Department of Computer Science and Technology, Harbin Institute of Technology (Shenzhen). He has published several papers in top venues, such as ACM SIGIR, ACM MM, IEEE Transactions on Image Processing, and ACM TOMM. His research interests include multimedia computing and information retrieval.

School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
Hongda Ni is currently pursuing the B.Eng. degree in computer science with the Harbin Institute of Technology (Shenzhen). His research interests include multimedia computing and natural language processing.

Peng Cheng Laboratory, AI Research Center, Shenzhen, China
Yaowei Wang (Member, IEEE) received the Ph.D. degree in computer science from the University of Chinese Academy of Sciences in 2005. He worked with the Department of Electronics Engineering, Beijing Institute of Technology, from 2005 to 2019. Currently, he is a Professor with the Peng Cheng Laboratory, Shenzhen, China. He has coauthored more than 120 technical articles in international journals and conferences, including IEEE Transactions on Image Processing, CVPR, and ICCV. His research interests include machine learning and multimedia content analysis and understanding. He is a member of CIE, CCF, and CSIG. He was a recipient of the second prize of the National Technology Invention in 2017 and the first prize of the CIE Technology Invention in 2015. He serves as the Chair for the IEEE Digital Retina Systems Working Group. He has promoted digital retina technology and worked to establish system standards for it. He has trained a vision model named “Pengcheng Dasheng” with one billion parameters, achieving a performance gain of over 10% in detection and recognition tasks in more than 20 application scenarios. He led the development of the first digital retina verification systems, which have been applied to urban traffic management in over 30 large and medium-sized cities in China.

Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Xiaojun Chang (Senior Member, IEEE) is a Professor with the Faculty of Engineering and Information Technology, Australian Artificial Intelligence Institute, University of Technology Sydney (UTS). He is also the Director of the ReLER Laboratory and an Honorary Professor with the School of Computing Technologies, RMIT University, Australia, where he was an Associate Professor before joining UTS. After graduation, he worked as a Post-Doctoral Research Fellow with the School of Computer Science, Carnegie Mellon University, and as a Lecturer and a Senior Lecturer with the Faculty of Information Technology, Monash University, Australia. His research focuses on exploring multiple signals (visual, acoustic, and textual) for automatic content analysis in unconstrained or surveillance videos. His team has won multiple prizes in international grand challenges that hosted competitive teams from MIT, University of Maryland, Facebook AI Research (FAIR), and Baidu VIS, and that aim to advance visual understanding using deep learning. For example, he won first place in the TrecVID 2019–Activity Extended Video (ActEV) Challenge, which was held by the National Institute of Standards and Technology, USA.