Convolutional Transformer Fusion Blocks for Multi-Modal Gesture Recognition

Overall Architecture of our two-stream I3D fusion network with Convolutional Transformer Fusion Blocks (CTFB) at different levels of feature hierarchy. This picture shows...

Abstract:

Gesture recognition is an important information channel in human-computer interaction. Intuitively, combining inputs from multiple modalities improves the recognition rate. In this work, we explore multi-modal video-based gesture recognition by fusing spatio-temporal representations of relevant distinguishing features from different modalities. We present a self-attention-based transformer fusion architecture to distill knowledge from different modalities in two-stream convolutional neural networks (CNNs). To this end, we introduce convolutions into the self-attention function and design Convolutional Transformer Fusion Blocks (CTFB) for multi-modal data fusion. These fusion blocks can be easily added at different abstraction levels of the feature hierarchy in existing two-stream CNNs. In addition, information exchange between two-stream CNNs along the feature hierarchy has so far been barely explored. We propose and evaluate different architectures for multi-level fusion pathways using CTFB to gain insight into the information flow between both streams. Our method achieves state-of-the-art or competitive performance on three benchmark gesture recognition datasets: a) IsoGD, b) NVGesture, and c) IPN Hand. Extensive evaluation demonstrates the effectiveness of the proposed CTFB in terms of both recognition rate and resource efficiency.
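To make the idea of "convolutions inside self-attention" concrete, the following is a minimal pure-Python sketch of one possible fusion step, not the authors' CTFB implementation. It assumes each stream yields a short sequence of per-frame feature vectors, that queries, keys, and values come from a temporal 1D convolution (kernel size 3) shared by both streams, that attention runs jointly over the tokens of both streams, and that the result is added back to each stream as a residual. All names (`conv1d_same`, `attention`, `fusion_block`, `params`) and all of these design choices are illustrative assumptions; the abstract does not specify them.

```python
import math
import random


def conv1d_same(seq, kernel):
    """Temporal 1D convolution with zero padding, keeping the sequence length.

    seq:    list of T feature vectors, each of length C_in
    kernel: nested list of shape [K][C_in][C_out], with K odd
    """
    pad = len(kernel) // 2
    c_out = len(kernel[0][0])
    out = []
    for t in range(len(seq)):
        vec = [0.0] * c_out
        for j, tap in enumerate(kernel):
            src = t + j - pad
            if 0 <= src < len(seq):  # positions outside the clip act as zeros
                for i, x in enumerate(seq[src]):
                    for o in range(c_out):
                        vec[o] += x * tap[i][o]
        out.append(vec)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention over lists of token vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        weights = softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


def fusion_block(stream_a, stream_b, params):
    """Hypothetical fusion step (assumed design, see lead-in).

    Each stream is convolved separately so no kernel spans the stream
    boundary; the projected tokens are then concatenated for joint attention.
    The residual add requires C_out == C_in.
    """
    def project(name):
        return (conv1d_same(stream_a, params[name])
                + conv1d_same(stream_b, params[name]))

    fused = attention(project("q"), project("k"), project("v"))
    t_a = len(stream_a)
    out_a = [[x + f for x, f in zip(tok, upd)]
             for tok, upd in zip(stream_a, fused[:t_a])]
    out_b = [[x + f for x, f in zip(tok, upd)]
             for tok, upd in zip(stream_b, fused[t_a:])]
    return out_a, out_b


def random_kernel(k, c_in, c_out, rng):
    """Small random kernel of shape [k][c_in][c_out] for the toy example."""
    return [[[rng.uniform(-0.1, 0.1) for _ in range(c_out)]
             for _ in range(c_in)] for _ in range(k)]


# Toy usage: two modalities (say, RGB and depth), 4 time steps, 3 channels.
rng = random.Random(0)
rgb = [[rng.random() for _ in range(3)] for _ in range(4)]
depth = [[rng.random() for _ in range(3)] for _ in range(4)]
params = {name: random_kernel(3, 3, 3, rng) for name in ("q", "k", "v")}
rgb_fused, depth_fused = fusion_block(rgb, depth, params)
```

Because the block returns tensors of the same shape as its inputs, a step like this could in principle be placed between corresponding stages of two backbone streams, which is consistent with the abstract's statement that fusion blocks can be added at different levels of the feature hierarchy. A real implementation would operate on 3D spatio-temporal feature maps in a deep-learning framework rather than on Python lists.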

Published in: IEEE Access (Volume: 11)

Page(s): 34094 - 34103

Date of Publication: 31 March 2023

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2023.3263812
