1. Introduction
Domain adaptation [1, 2, 3] is the process of building a domain-specific model from a base model and some target-domain data. It is one of the most important capabilities in areas of artificial intelligence such as speech and language processing, since different applications have different domains of interest (e.g. finance, sports, travel), and it is often desirable to deploy a customized, domain-specific model for each application. In this work, we study domain adaptation of speech recognition models where some text from the target domain is available. We chose this scenario because “paired” data (i.e. speech and its transcription) is unavailable in many practical use cases, owing to the cost of manually transcribing speech, whereas text-only data is relatively easy to prepare [4, Section 6].