Comparison of Large Pre-trained Models and Adaptation Methods for Japanese Dialects ASR

Abstract:

In recent years, the accuracy of automatic speech recognition (ASR) for major languages has been greatly improved by pre-training methods that use large spoken language resources. However, practical ASR technology that covers the large and rich variety of regional dialects of the Japanese language has not yet been realized. This study focuses on the adaptability of two state-of-the-art large pre-trained models for building a unified ASR model for Japanese dialects. We present results from adapting these models using a total of several dozen hours of Japanese dialect speech. We compare models optimized for each dialect region, combined with dialect region identification, against models adapted without distinguishing between dialect regions. By comparing these two learning processes, we investigate how different adaptation methods affect ASR performance for Japanese dialects.
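The abstract contrasts two ways of using the adapted models: per-region models that depend on a dialect region identification step, and a single model adapted without distinguishing regions. The Python sketch that follows illustrates only that structural difference at inference time. It is a minimal sketch under stated assumptions, not the authors' implementation: the class names, the region labels `region_a` and `region_b`, and the lambda stubs standing in for the adapted recognizers and the region identifier are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Assumed abstractions (hypothetical): audio is raw bytes, and any "model"
# is simply a callable from audio to a transcript string.
Audio = bytes
Recognizer = Callable[[Audio], str]
RegionClassifier = Callable[[Audio], str]


@dataclass
class RegionRoutedASR:
    """Per-region adapted models selected by a dialect region identifier."""
    identify_region: RegionClassifier
    region_models: Dict[str, Recognizer]

    def transcribe(self, audio: Audio) -> str:
        # Stage 1: predict the dialect region of the utterance.
        region = self.identify_region(audio)
        # Stage 2: decode with the model adapted for that region; a wrong
        # prediction here routes the audio to another region's model.
        return self.region_models[region](audio)


@dataclass
class UnifiedASR:
    """One model adapted on pooled data from all dialect regions."""
    model: Recognizer

    def transcribe(self, audio: Audio) -> str:
        # No region identification step: every utterance goes to one model.
        return self.model(audio)


# Toy stand-ins (hypothetical): region labels and outputs are placeholders,
# not the region inventory or transcripts used in the paper.
routed = RegionRoutedASR(
    identify_region=lambda a: "region_a" if a.startswith(b"A") else "region_b",
    region_models={
        "region_a": lambda a: "region_a-model output",
        "region_b": lambda a: "region_b-model output",
    },
)
unified = UnifiedASR(model=lambda a: "unified-model output")

print(routed.transcribe(b"A-utterance"))   # region_a-model output
print(routed.transcribe(b"B-utterance"))   # region_b-model output
print(unified.transcribe(b"A-utterance"))  # unified-model output
```

In this sketch the routed configuration can only be as reliable as its identifier, since a misidentified region sends the utterance to a model adapted on a different dialect; the unified configuration removes that step but relies on one model to handle every region.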

Published in: 2024 IEEE 13th Global Conference on Consumer Electronics (GCCE)

Date of Conference: 29 October 2024 - 01 November 2024

Date Added to IEEE Xplore: 28 November 2024

DOI: 10.1109/GCCE62371.2024.10760776

Conference Location: Kitakyushu, Japan

I. Introduction

In recent years, building ASR systems on large-scale pre-trained models has become mainstream. Models such as XLSR [1] and Whisper [2], trained on tens of thousands to hundreds of thousands of hours of multilingual speech, have led to rapid advances in multilingual speech processing. However, for low-resource languages and dialects that are absent from the pre-training data, or present only in small quantities, recognition accuracy often falls short of what practical use requires.