Abstract:
Human pose estimation has long been motivated by its applications in human behavior understanding and activity recognition. Despite recent advances in multi-person pose estimation, crowded scenes remain challenging for existing solutions, especially classroom scenarios where students heavily overlap and adopt different poses. In this paper, we focus on improving human pose estimation in crowded classrooms from the perspectives of crowd detection and pose refinement. Specifically, we first follow a top-down strategy: we detect persons in a multi-instance prediction manner and perform single-person pose estimation on each detected human region. Then, the estimated poses are refined with Transformer blocks that capture the interactions among multiple persons in the image. Importantly, we replace the self-attention in the Transformer with a lightweight attention mechanism to reduce computational complexity. Quantitative and qualitative experiments demonstrate that our method outperforms previous methods by a clear margin on both standard benchmarks and self-collected classroom images.
Published in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 04-10 June 2023
Date Added to IEEE Xplore: 05 May 2023