Abstract:
Recent studies have achieved remarkable results in skeleton-based action recognition by utilizing graph convolutional models. Traditional approaches typically aggregate local spatio-temporal information bottom-up into a single global spatio-temporal representation. However, this approach may fail to achieve the multi-location perception needed to capture fine-grained actions, and may struggle to model subactions because their durations vary. To address these challenges, we design a Distributed Spatio-temporal Perception (DSP) module that treats each joint as an independent perception unit, performing joint-wise distributed multi-location perception in both the spatial and temporal dimensions. In addition, we introduce an Anchor Pose-driven Subaction Encoding (APSE) module, which enhances informative cues for subaction reasoning by identifying correlations between an anchor pose and the subactions formed by integrating the distributed spatio-temporal perception features. Building on these modules, we propose a Joint-wise Distributed Perception Graph Convolutional Network (JDP-GCN). Experiments on three widely used datasets (NTU RGB+D 60, NTU RGB+D 120, and NW-UCLA) demonstrate that our method achieves state-of-the-art performance.
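
The abstract does not include code, so the following is a minimal PyTorch sketch of the joint-wise idea it describes: each joint receives its own temporal filter bank (each joint as an independent perception unit) before a standard spatial graph aggregation. All class names, shapes, and hyperparameters here are illustrative assumptions, not the authors' JDP-GCN implementation.

# Illustrative sketch only (not the authors' JDP-GCN): per-joint temporal
# filters followed by spatial graph aggregation over the skeleton.
import torch
import torch.nn as nn


class JointWiseTemporalPerception(nn.Module):
    """Per-joint temporal convolution: groups=num_joints gives every joint
    an independent set of temporal filters."""

    def __init__(self, channels, num_joints, kernel_size=9):
        super().__init__()
        pad = (kernel_size - 1) // 2
        self.conv = nn.Conv1d(num_joints * channels, num_joints * channels,
                              kernel_size, padding=pad, groups=num_joints)

    def forward(self, x):
        n, c, t, v = x.shape                      # (batch, channels, frames, joints)
        x = x.permute(0, 3, 1, 2).reshape(n, v * c, t)
        x = self.conv(x)                          # independent filters per joint
        return x.reshape(n, v, c, t).permute(0, 2, 3, 1)


class SpatialGraphConv(nn.Module):
    """Standard spatial graph convolution over the skeleton adjacency."""

    def __init__(self, in_channels, out_channels, adjacency):
        super().__init__()
        self.register_buffer("A", adjacency)      # (V, V), assumed pre-normalized
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        x = torch.einsum("nctv,vw->nctw", x, self.A)  # aggregate neighbor joints
        return self.proj(x)


if __name__ == "__main__":
    n, c, t, v = 2, 64, 300, 25                   # NTU RGB+D skeletons have 25 joints
    x = torch.randn(n, c, t, v)
    adj = torch.eye(v)                            # placeholder; real use needs the skeleton graph
    block = nn.Sequential(JointWiseTemporalPerception(c, v),
                          SpatialGraphConv(c, c, adj))
    print(block(x).shape)                         # torch.Size([2, 64, 300, 25])

The grouped convolution is one plausible way to realize "each joint as an independent perception unit"; the paper's actual multi-location perception and the APSE module are not reproduced here.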
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025