This paper proposes discriminative modeling in a high-dimensional feature space for spoken document retrieval (SDR). Estimating the parameters of a high-dimensional model reliably requires a large quantity of training data, but no sufficiently large corpus exists for document retrieval. This paper employs two approaches to overcome this problem. The first is reranking: a baseline system assigns each document a score, and a high-dimensional model then adjusts that score. The second is automatic query generation: a large number of queries are generated automatically and used for parameter estimation. Our experimental results show that the proposed method can greatly improve SDR performance.
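The two approaches can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the baseline system, the feature set, the model form, or the training criterion. The sketch assumes a term-count baseline, sparse (query term, document term) indicator features, an additive linear correction to the baseline score, a pairwise perceptron update, and a query generator that samples terms from a document and treats that document as relevant. All function names (`baseline_score`, `features`, `rerank`, `generate_queries`, `train`) are hypothetical.

```python
import random
from collections import Counter


def baseline_score(query, doc):
    """Stand-in baseline (assumption): total count of query terms in the document."""
    counts = Counter(doc)
    return float(sum(counts[t] for t in query))


def features(query, doc):
    """Sparse high-dimensional features (assumption): one indicator per
    (query term, document term) pair."""
    doc_terms = set(doc)
    return {(q, d): 1.0 for q in set(query) for d in doc_terms}


def correction(weights, feats):
    """Linear model output that is added to the baseline score."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def final_score(query, doc, weights):
    return baseline_score(query, doc) + correction(weights, features(query, doc))


def rerank(query, docs, weights, top_n=10):
    """Rank all documents by baseline score, then re-order only the top_n
    candidates by baseline score plus the model's correction."""
    ranked = sorted(range(len(docs)), key=lambda i: -baseline_score(query, docs[i]))
    head = sorted(ranked[:top_n], key=lambda i: -final_score(query, docs[i], weights))
    return head + ranked[top_n:]


def generate_queries(docs, n_per_doc=5, query_len=2, seed=0):
    """Hypothetical query generator: sample terms from each document and
    label that document as the relevant one for the sampled query."""
    rng = random.Random(seed)
    queries = []
    for idx, doc in enumerate(docs):
        vocab = sorted(set(doc))
        for _ in range(n_per_doc):
            queries.append((rng.sample(vocab, min(query_len, len(vocab))), idx))
    return queries


def train(docs, queries, epochs=5):
    """Pairwise perceptron (assumption): update the weights whenever a
    non-relevant document scores at least as high as the relevant one."""
    weights = {}
    for _ in range(epochs):
        for query, rel in queries:
            f_rel = features(query, docs[rel])
            for j, doc in enumerate(docs):
                if j == rel:
                    continue
                if final_score(query, doc, weights) >= final_score(query, docs[rel], weights):
                    for k, v in f_rel.items():
                        weights[k] = weights.get(k, 0.0) + v
                    for k, v in features(query, doc).items():
                        weights[k] = weights.get(k, 0.0) - v
    return weights
```

A typical call would be `weights = train(docs, generate_queries(docs))` followed by `rerank(query, docs, weights)`, where each document is a list of terms (for SDR, a recognition transcript). Because the model only corrects the baseline score and only re-orders the top candidates, an untrained model (empty `weights`) reproduces the baseline ranking exactly.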