This paper presents a feature-transform based approach to unsupervised task adaptation and personalization for speech recognition. Given task-specific speech data collected from a deployed service, an “acoustic sniffing” module is first built using the i-vector technique, with a number of acoustic conditions identified via i-vector clustering. Unsupervised maximum likelihood training is then performed to estimate a task-dependent feature transform for each acoustic condition, while the pre-trained HMM parameters of the acoustic models are kept unchanged. Given an unknown utterance, an appropriate feature transform is selected via “acoustic sniffing” and used to transform the feature vectors of the utterance for decoding. The effectiveness of the proposed method is confirmed in a task adaptation scenario from a conversational telephone speech transcription task to a short message dictation task. The same method is expected to work for personalization as well.
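The decoding-time pipeline described above (cluster i-vectors into acoustic conditions, then pick the matching condition's feature transform for an incoming utterance) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy 2-D "i-vectors", the k-means clustering, and the hand-set affine transforms are all stand-ins; in the paper the i-vectors come from an i-vector extractor and the per-condition transforms are estimated by unsupervised maximum likelihood training with the HMM parameters held fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Acoustic sniffing" training side: cluster utterance-level i-vectors
# into acoustic conditions. Toy 2-D "i-vectors" drawn from three
# synthetic conditions (hypothetical data).
ivectors = np.vstack([
    rng.normal(mean, 0.3, size=(50, 2))
    for mean in ([0.0, 0.0], [3.0, 0.0], [0.0, 3.0])
])

def kmeans(x, k, iters=20):
    """Plain k-means; returns cluster centroids and point labels."""
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        centroids = np.array([
            x[labels == c].mean(0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
    return centroids, labels

centroids, labels = kmeans(ivectors, k=3)

# One affine feature transform (A, b) per condition -- a stand-in for
# the ML-estimated task-dependent transforms; the HMM acoustic model
# parameters themselves are never modified.
transforms = {c: (np.eye(2) * (1.0 + 0.1 * c), np.full(2, 0.05 * c))
              for c in range(3)}

def sniff_and_transform(utt_ivector, utt_features):
    """Select the nearest acoustic condition for the utterance's
    i-vector, then apply that condition's feature transform."""
    cond = int(((centroids - utt_ivector) ** 2).sum(-1).argmin())
    A, b = transforms[cond]
    return utt_features @ A.T + b, cond

# Decode-time usage: toy feature vectors for one unknown utterance.
feats = rng.normal(size=(10, 2))
warped, cond = sniff_and_transform(np.array([3.0, 0.1]), feats)
print(cond, warped.shape)
```

The key design point the sketch mirrors is that adaptation lives entirely in the front-end: only the features are warped per condition, so a single fixed set of acoustic models serves every condition.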