Abstract:
Crowd attribute recognition is a challenging task in crowd video understanding because a crowd video often contains multiple attributes of various types. Traditional deep learning-based methods directly treat this recognition problem as multiple binary classification problems and represent the video by vectorizing and fusing separately learned spatial and temporal features in the fully connected layers. As a result, the correlations between these attributes may not be well captured. In this paper, a bidirectional recurrent prediction model with a semantic-aware attention mechanism is proposed to explore the spatio-temporal and semantic relations between the attributes for more accurate recognition. A ConvLSTM is introduced for feature representation to capture the spatio-temporal structure of crowd videos and to facilitate visual attention. A bidirectional recurrent attention module is proposed for sequential attribute prediction, iteratively associating each subcategory of attributes with its corresponding semantically related regions. Experiments and evaluations on the challenging WWW crowd video dataset show that our approach significantly outperforms state-of-the-art methods and verify that it effectively captures the spatio-temporal and semantic relations of the crowd attributes.
Published in: IEEE Transactions on Circuits and Systems for Video Technology (Volume: 30, Issue: 7, July 2020)
- IEEE Keywords
- Index Terms
- Predictor Of Recurrence
- Attribute Recognition
- Subcategories
- Feature Representation
- Multiple Dimensions
- Attention Mechanism
- Visual Attention
- Semantic Similarity
- Fully-connected Layer
- Recurrent Model
- Video Dataset
- Spatiotemporal Relationship
- Spatiotemporal Structure
- Bidirectional Model
- Feature Maps
- Recurrent Neural Network
- Multi-label
- Stock Market
- Temporal Information
- Hidden State
- Unidirectional Model
- Crowded Scenes
- Attention Block
- Video Features
- Appearance Features
- Score Map
- Attention Model
- Area Under Receiver Operating Characteristic Curve
- Attention Map
- Action Recognition
- Author Keywords