Abstract:
In recent years, the accuracy of automatic speech recognition (ASR) for major languages has improved greatly through pre-training methods that use large spoken-language resources. However, practical ASR technology that covers the large and rich variety of regional dialects of Japanese has not yet been realized. This study focuses on the adaptability of two state-of-the-art large pre-trained models for building a unified ASR model for Japanese dialects. We present results from adapting these models with a total of several dozen hours of Japanese dialect speech. We compare models optimized for each dialect region, including dialect region identification, with models adapted without distinguishing between dialect regions. By comparing these two learning processes, we investigate how the various adaptation methods affect ASR performance for Japanese dialects.
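For context, adapting such a model typically means fine-tuning it on transcribed dialect recordings. The abstract does not name the pre-trained models or the toolkit used; the sketch below assumes Whisper loaded through the Hugging Face Transformers library, purely to illustrate what one adaptation step on dialect speech might look like.

```python
# Minimal sketch of adapting a large pre-trained ASR model to dialect speech.
# The paper does not specify its models or toolkit; Whisper via the Hugging
# Face Transformers library is assumed here for illustration only.
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def adaptation_step(audio_array, sampling_rate, transcript):
    """One gradient step on a single (dialect audio, transcript) pair."""
    inputs = processor(audio_array, sampling_rate=sampling_rate,
                       return_tensors="pt")
    labels = processor.tokenizer(transcript, return_tensors="pt").input_ids
    # Cross-entropy loss over the reference transcript tokens.
    loss = model(input_features=inputs.input_features, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# Looping this step over a few dozen hours of dialect recordings, either
# pooled across all regions or split per region (optionally with a dialect
# region identifier), corresponds to the two adaptation settings compared
# in the abstract.
```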
Date of Conference: 29 October 2024 - 01 November 2024
Date Added to IEEE Xplore: 28 November 2024