Abstract:
We study the problem of building a domain-specific speech recognition model given some text from the target domain. One of the most popular approaches to this problem is shallow fusion, which incorporates a domain-specific language model built from the given text. However, shallow fusion significantly increases the model size and inference cost, which makes deployment harder. In this paper, we propose domain adaptation by data distribution matching, where a subset is selected from existing multi-domain training data to match the target-domain distribution, and a model is fine-tuned on that subset. A submodular optimization algorithm with a novel extension is employed for the subset selection. Experiments on LibriSpeech, a corpus of audiobooks in which we treat each book as a domain, show that the proposed distribution-matching approach achieves word error rates (WERs) equivalent to those of the conventional shallow-fusion approach, without any increase in model size or inference cost.
Date of Conference: 16-20 December 2023
Date Added to IEEE Xplore: 19 January 2024