Abstract:
Collective activity recognition, which identifies the activity a group of people is performing, is a cutting-edge research topic in computer vision. Unlike actions performed by individuals, collective activities require modeling the complex interactions among different people. However, most previous works require exhaustive annotations, such as accurate labels of individual actions, pairwise interactions, and poses, which are not easily available in practice. Moreover, most of them treat human detection as a decoupled step that precedes collective activity recognition, and they use all detected persons. This not only ignores the mutual relation between the two tasks, which makes it hard to filter out irrelevant people, but also likely increases the computational burden of reasoning about collective activities. In this paper, we propose a fast, weakly supervised deep learning architecture for collective activity recognition. For fast inference, we make actor detection and weakly supervised collective activity reasoning collaborate in an end-to-end framework by sharing convolutional layers between them. The joint learning unites the two tasks so that they reinforce each other, which makes it more effective to filter out outliers who are not involved in the activity. For weakly supervised learning, we propose a latent embedding scheme that mines person-group interactive relationships, removing the need for pairwise relations between people and for individual action labels. Experimental results show that the proposed framework achieves performance comparable to or better than the state of the art on three datasets. Our joint model reasons about collective activities at 22.65 fps, which is the fastest speed known to date and brings collective activity recognition substantially closer to real-time applications.
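To make the idea of sharing features between actor detection and group-level reasoning concrete, the following is a minimal NumPy sketch. It is not the authors' architecture: the linear "backbone", the sigmoid actor score, the softmax person-group relevance, and all names (`shared_backbone`, `forward`, `w_shared`, `w_det`, `w_rel`, `w_act`) and sizes are assumptions chosen for illustration only. It shows features being computed once and reused by both heads, with low-relevance persons down-weighted before the activity is classified.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def shared_backbone(regions, w_shared):
    # Stand-in for shared convolutional layers (assumption): linear map + ReLU.
    return np.maximum(regions @ w_shared, 0.0)

def forward(regions, params):
    # Features are computed once and reused by both heads.
    feats = shared_backbone(regions, params["w_shared"])       # (N, D)

    # Head 1: per-candidate actor score (hypothetical detection head).
    actor_prob = 1.0 / (1.0 + np.exp(-(feats @ params["w_det"])))  # (N,)

    # Head 2: person-group relevance without individual action labels
    # (a simple softmax weighting, used here as an illustrative stand-in).
    relevance = softmax(feats @ params["w_rel"])               # (N,)

    # Combine both cues so unlikely or irrelevant persons contribute little.
    weights = actor_prob * relevance
    weights = weights / weights.sum()

    group_feat = weights @ feats                               # (D,)
    activity_prob = softmax(group_feat @ params["w_act"])      # (C,)
    return actor_prob, weights, activity_prob

# Toy sizes (assumptions): 5 candidate regions, 8-dim inputs, 16-dim features, 4 activities.
N, F, D, C = 5, 8, 16, 4
params = {
    "w_shared": 0.1 * rng.normal(size=(F, D)),
    "w_det": 0.1 * rng.normal(size=D),
    "w_rel": 0.1 * rng.normal(size=D),
    "w_act": 0.1 * rng.normal(size=(D, C)),
}
regions = rng.normal(size=(N, F))
actor_prob, weights, activity_prob = forward(regions, params)
```

In this sketch only the group-level activity label would be needed to train the relevance and activity heads, which mirrors the weakly supervised setting described above in spirit only.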
Published in: IEEE Transactions on Image Processing (Volume: 29)

School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
Peizhen Zhang received the B.S. and M.S. degrees from the School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China, in 2016 and 2019, respectively. His research interests include object detection, collective activity recognition, and AutoML.

Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University), Ministry of Education, China
Yongyi Tang received the B.S. degree from the South China University of Technology in 2015 and the M.S. degree from Sun Yat-sen University, Guangzhou, China, in 2018. His research interests include collective activity recognition and video analysis.

School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
Jian-Fang Hu received the B.S. and Ph.D. degrees from the School of Mathematics, Sun Yat-sen University, Guangzhou, China, in 2010 and 2016, respectively. He has published several scientific papers in international conferences and journals, including ICCV, CVPR, ECCV, IEEE TPAMI, IEEE TCSVT, and PR. His research interests include human-object interaction modeling, 3D face modeling, and RGB-D action recognition.

School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
Wei-Shi Zheng received the Ph.D. degree in applied mathematics from Sun Yat-sen University in 2008. He is currently a Professor with Sun Yat-sen University. He has published more than 110 papers, including more than 90 publications in main journals (TPAMI, TNN/TNNLS, TIP, TSMC-B, and PR) and top conferences (ICCV, CVPR, IJCAI, and AAAI). His research interests include person/object association and activity understanding in visual surveillance, and related large-scale machine learning algorithms. He has joined the Microsoft Research Asia Young Faculty Visiting Programme. He has served as a Senior PC/Area Chair/Associate Editor for AVSS 2012, ICPR 2018, IJCAI 2019, AAAI 2020, and BMVC from 2018 to 2019. He is an Associate Editor of Pattern Recognition. He was a recipient of the Excellent Young Scientists Fund of the National Natural Science Foundation of China and the Royal Society-Newton Advanced Fellowship, U.K.