1. INTRODUCTION
Unlike spoken communication among hearing people, sign language plays an irreplaceable role in the deaf community: it conveys meaning visually through sequences of coherent gestural movements and facial expressions. Computer vision-based continuous Sign Language Recognition (SLR) extracts visual features from the raw input video and recognizes the corresponding sign language glosses [1], [2]. Deep learning, the de facto tool for such tasks, allows multi-layer networks fed with raw or lightly preprocessed data to automatically discover the representations needed for recognition, and it has proven highly effective on images, video, speech, and audio [3]. It is therefore unsurprising that Deep Neural Networks (DNNs) have brought dramatic breakthroughs in continuous SLR [4].
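To make the pipeline sketched above concrete, the following is a minimal illustrative sketch, not the method proposed in this paper, of one common continuous SLR design from the literature: a per-frame CNN feature extractor, a bidirectional LSTM temporal model, and CTC loss for alignment-free gloss recognition. All layer sizes, the class count, and the input shapes are assumptions chosen for brevity.

```python
# A minimal sketch (not the authors' method) of a typical continuous SLR
# pipeline: 2D-CNN frame encoder -> BiLSTM temporal model -> CTC loss.
# All sizes are illustrative assumptions, not values from this paper.
import torch
import torch.nn as nn

class ContinuousSLR(nn.Module):
    def __init__(self, num_glosses: int, hidden: int = 256):
        super().__init__()
        # Per-frame visual feature extractor (stand-in for a deeper CNN).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Bidirectional LSTM models the temporal structure of the sign stream.
        self.rnn = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        # One extra output class for the CTC blank symbol (index 0).
        self.head = nn.Linear(2 * hidden, num_glosses + 1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, channels, height, width)
        b, t, c, h, w = frames.shape
        feats = self.cnn(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
        out, _ = self.rnn(feats)
        # Per-frame log-probabilities over glosses, as expected by CTC.
        return self.head(out).log_softmax(dim=-1)

# CTC aligns frame-level predictions with the unsegmented gloss sequence.
model = ContinuousSLR(num_glosses=100)
video = torch.randn(2, 16, 3, 64, 64)        # two dummy clips of 16 frames
log_probs = model(video).permute(1, 0, 2)    # (time, batch, classes) for CTC
targets = torch.randint(1, 101, (2, 5))      # dummy gloss label sequences
loss = nn.CTCLoss(blank=0)(
    log_probs,
    targets,
    torch.full((2,), 16),                    # input (frame) lengths
    torch.full((2,), 5),                     # target (gloss) lengths
)
```

Because CTC marginalizes over all monotonic alignments between frames and glosses, such models can be trained from sentence-level gloss annotations alone, without frame-level segmentation labels.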