Recognition and retrieval are typically treated as two independent, cascaded modules in spoken term detection (STD): retrieval techniques are applied on top of automatic speech recognition (ASR) output, so performance depends on ASR accuracy. We propose a framework that integrates recognition and retrieval, considering them jointly to yield better STD performance. This can be achieved either by adjusting the acoustic model parameters (model-based) or by considering detected examples (example-based), using relevance information provided by the user (user relevance feedback) or inferred by the system (pseudo-relevance feedback), either for a given query (short-term context) or by taking many previous queries into account (long-term context). Such relevance feedback approaches have long been used in text information retrieval, but they are rarely considered for spoken content retrieval and cannot be directly applied to it. The proposed relevance feedback approaches are specific to spoken content and hence differ substantially from those developed for text retrieval, which operate only on text symbols. We present not only these relevance feedback scenarios and approaches for STD, but also a framework that integrates them all. Preliminary experiments showed significant improvements in each case.
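The pseudo-relevance feedback idea summarized above can be sketched in a minimal form: treat the top-ranked first-pass detections as pseudo-relevant examples (without asking the user), then rescore every hypothesis by its similarity to those examples. All names, the interpolation weight, and the fixed-length feature vectors here are illustrative assumptions; a real STD system would compare frame sequences of acoustic features (e.g. with DTW) rather than single vectors.

```python
# Minimal sketch of pseudo-relevance feedback (PRF) rescoring for spoken
# term detection.  Hypothetical interface: each hypothesis is a pair
# (detection_score, feature_vector); a real system would use sequence
# similarity over acoustic features instead of cosine on fixed vectors.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def prf_rescore(hypotheses, top_k=2, alpha=0.5):
    """Take the top_k highest-scoring detections as pseudo-relevant
    examples, then rescore each hypothesis by interpolating its original
    score with its mean similarity to those examples (alpha is an
    assumed interpolation weight, not a value from the paper)."""
    ranked = sorted(hypotheses, key=lambda h: h[0], reverse=True)
    pseudo = [feat for _, feat in ranked[:top_k]]
    rescored = []
    for score, feat in hypotheses:
        sim = sum(cosine(feat, p) for p in pseudo) / len(pseudo)
        rescored.append((alpha * score + (1 - alpha) * sim, feat))
    return rescored
```

With user relevance feedback, the labeled detections would simply replace the `pseudo` set; the long-term-context variant described in the abstract would additionally accumulate examples across previous queries.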
Audio, Speech, and Language Processing, IEEE Transactions on (Volume:20 , Issue: 7 )
Date of Publication: Sept. 2012