This paper examines a query-by-example approach to spoken term detection in audio files. The approach is designed for low-resource situations in which little or no in-domain training material is available and accurate word-based speech recognition is not available. Instead of using word or phone strings as search terms, the user presents the system with audio snippets of the desired search terms, which act as the queries. Query and test materials are represented as phonetic posteriorgrams obtained from a phonetic recognition system. Query matches in the test data are located using a modified dynamic time warping (DTW) search between query templates and test utterances. Experiments with this approach are presented on data from the Fisher corpus.
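To make the matching step concrete, here is a minimal sketch of a DTW search of a query posteriorgram within a longer test posteriorgram. The abstract does not say how the DTW search is modified, so this sketch makes two assumptions of its own. First, the local distance between two posterior vectors is the negative log of their inner product. Second, the search uses a standard subsequence DTW, which lets the query align to any contiguous region of the test utterance. The function names `posteriorgram_distance` and `subsequence_dtw` are hypothetical and are not taken from the paper.

```python
import numpy as np


def posteriorgram_distance(query, test, eps=1e-10):
    """Local distance matrix between query frames (M, K) and test frames (N, K).

    Each row is assumed to be a posterior distribution over K phone classes.
    ASSUMPTION: distance = -log(inner product), clipped to avoid log(0).
    """
    dots = query @ test.T
    return -np.log(np.maximum(dots, eps))


def subsequence_dtw(query, test):
    """Return (cost, start, end) of the best-aligned test region for the query.

    ASSUMPTION: standard subsequence DTW (free start and end in the test
    utterance) stands in for the paper's unspecified modification.
    Costs are accumulated without path-length normalization.
    """
    dist = posteriorgram_distance(query, test)
    m, n = dist.shape
    acc = np.full((m, n), np.inf)          # accumulated path cost
    start = np.zeros((m, n), dtype=int)    # test frame where each path began

    # The first query frame may align to any test frame (free start).
    acc[0, :] = dist[0, :]
    start[0, :] = np.arange(n)

    for i in range(1, m):
        for j in range(n):
            # Vertical step: (i-1, j) -> (i, j)
            best, origin = acc[i - 1, j], start[i - 1, j]
            if j > 0:
                # Diagonal step: (i-1, j-1) -> (i, j)
                if acc[i - 1, j - 1] < best:
                    best, origin = acc[i - 1, j - 1], start[i - 1, j - 1]
                # Horizontal step: (i, j-1) -> (i, j), already filled this row
                if acc[i, j - 1] < best:
                    best, origin = acc[i, j - 1], start[i, j - 1]
            acc[i, j] = dist[i, j] + best
            start[i, j] = origin

    # The last query frame may align to any test frame (free end).
    end = int(np.argmin(acc[m - 1, :]))
    return float(acc[m - 1, end]), int(start[m - 1, end]), end
```

For example, take a three-frame query whose frames peak on phone classes 0, 1, and 2, and search a test utterance whose frames peak on classes 2, 2, 0, 1, 2, 0. The sketch should locate the match at test frames 2 through 4. A real detector would also need a decision threshold on `cost`, and probably length normalization; both are outside what the abstract specifies.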