This study presents a novel approach to spoken document retrieval based on multilevel knowledge indexing and semantic verification. Multilevel knowledge indexing considers three information sources, namely transcription data, keywords extracted from spoken documents, and hypernyms of the extracted keywords. A semantic network with forward-backward propagation is presented for semantic verification of the retrieved documents. In the forward step for semantic verification, a bag of keywords is chosen based on word significance measures. Semantic relations are estimated and adopted for verification in the backward procedure. The verification score is then utilized to weight and rerank the retrieved documents to obtain the final results. Experiments are performed on 40 h of anchor speech extracted from 198 h of collected broadcast news. Experimental results indicate that multilevel knowledge indexing and semantic verification achieve better retrieval results than other indexing schemes.
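As a rough illustration of the pipeline described above, the following Python sketch indexes each document at three levels (transcript words, keywords, hypernyms) and reranks first-pass results using a verification score. The abstract does not specify the scoring functions, the word significance measure, or the semantic network, so everything here is a hypothetical stand-in rather than the authors' method: the `HYPERNYMS` table, the `LEVEL_WEIGHTS` values, the frequency-based keyword selection, and the overlap-based `verification_score` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical hypernym table; a real system would use a lexical resource.
HYPERNYMS = {"typhoon": "storm", "earthquake": "disaster", "storm": "disaster"}

# Assumed weights for the three index levels (not taken from the paper).
LEVEL_WEIGHTS = {"transcript": 0.5, "keyword": 0.3, "hypernym": 0.2}


def build_index(doc_text, num_keywords=3):
    """Index one document at three levels: transcript, keywords, hypernyms."""
    words = doc_text.lower().split()
    counts = Counter(words)
    # Stand-in for a word significance measure: frequent words longer than 3 chars.
    keywords = [w for w, _ in counts.most_common() if len(w) > 3][:num_keywords]
    hypernyms = {HYPERNYMS[k] for k in keywords if k in HYPERNYMS}
    return {"transcript": set(words), "keyword": set(keywords), "hypernym": hypernyms}


def retrieval_score(query_terms, index):
    """First-pass score: weighted fraction of query terms matched at each level."""
    score = 0.0
    for level, weight in LEVEL_WEIGHTS.items():
        matched = sum(1 for t in query_terms if t in index[level])
        score += weight * matched / len(query_terms)
    return score


def expand(terms):
    """Add the hypernyms of the given terms, where the table has them."""
    return set(terms) | {HYPERNYMS[t] for t in terms if t in HYPERNYMS}


def verification_score(query_terms, index):
    """Placeholder for semantic verification: Jaccard overlap between the
    expanded query and the document's keywords plus hypernyms."""
    q = expand(query_terms)
    d = index["keyword"] | index["hypernym"]
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def search(query, docs):
    """Retrieve, then weight each first-pass score by the verification score."""
    query_terms = query.lower().split()
    if not query_terms:
        return []
    results = []
    for doc_id, text in docs.items():
        index = build_index(text)
        first_pass = retrieval_score(query_terms, index)
        final = first_pass * (1.0 + verification_score(query_terms, index))
        results.append((doc_id, final))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Calling `search("typhoon warning", docs)` on a small dictionary of transcripts returns `(doc_id, score)` pairs ordered by the reweighted score. The multiplicative reweighting is one simple choice among many; the abstract states only that the verification score is used to weight and rerank the retrieved documents.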