Abstract:
Transformer-based approaches have exhibited outstanding performance in the field of human-object interaction (HOI) detection. However, these approaches rely on underlying object detectors that have undergone large-scale pre-training on the ImageNet and MS-COCO datasets. This limits the potential of unique architectural designs and induces a learning bias, causing ineffective HOI representation learning. In this paper, we propose ScratchHOI, a transformer-based method for HOI detection that can be trained from scratch, eliminating the need for pre-trained object detectors. ScratchHOI employs dynamic and static affinity-based feature aggregation to process local and long-range visual information. Additional techniques, such as dynamic and interactive anchor refinement for objects and interactions, are also employed to improve detection performance. Experiments on the HICO-Det dataset show that ScratchHOI achieves competitive performance against other state-of-the-art approaches across a variety of evaluation measures.
Date of Conference: 08-11 October 2023
Date Added to IEEE Xplore: 11 September 2023
- IEEE Keywords
- Index Terms
- Human-object Interaction
- Interactive
- Dynamic Characteristics
- Object Detection
- Aggregation Kinetics
- Field Performance
- Interaction Field
- Learning Bias
- Long-range Information
- Improve Detection Performance
- MS COCO Dataset
- Transformer-based Methods
- Convolutional Neural Network
- Decoding
- Convolutional Layers
- Bounding Box
- Feed-forward Network
- Learnable Parameters
- Dynamic Information
- Backbone Network
- Semantic Embedding
- Transformer Decoder
- Static Information
- Multi-scale Features
- Linear Layer
- Interaction Prediction
- Transformer Encoder