A new scheme is presented for detecting a large number of keywords in voice-controlled switchboard tasks. The scheme has two stages. In the first stage, an acoustic recognizer generates N-best syllable candidates together with their acoustic scores. In the second stage, a semantic-model-based parser determines the optimum keywords by searching the lattice of N-best candidates. The experimental results show that the parser maintains high performance even when the spoken input deviates from the predefined syntactic constraints. For comparison, the most common approach, which incorporates the syntactic knowledge of the task directly into the acoustic recognizer as a finite-state network, is also investigated. Furthermore, to address the sparse-data problem, out-of-domain data in the form of newspaper text are used to obtain a more robust combined semantic model. The experiments show that the combined semantic model improves the keyword detection rate from 90.07% to 92.91% when 80 ungrammatical sentences that do not conform to the task grammar are used as test material.
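The two-stage idea can be illustrated with a minimal sketch. It is not the authors' parser: a plain bigram table stands in for the semantic model, linear interpolation is an assumed way of combining in-domain and out-of-domain statistics, and the names `interpolate`, `best_path`, `lam`, and `floor` are hypothetical. The sketch only shows how a second-stage search over N-best candidates can override the top acoustic hypothesis.

```python
import math


def interpolate(p_in, p_out, lam=0.7):
    """Combine two bigram probability tables by linear interpolation.

    Assumption: the abstract does not say how the combined model is built;
    interpolation with weight `lam` on the in-domain table is a stand-in.
    """
    keys = set(p_in) | set(p_out)
    return {k: lam * p_in.get(k, 0.0) + (1.0 - lam) * p_out.get(k, 0.0)
            for k in keys}


def best_path(lattice, bigram, floor=1e-6):
    """Viterbi search through a lattice of N-best candidates.

    lattice: list of positions; each position is a list of
             (unit, acoustic_log_score) pairs with distinct units.
    bigram:  dict mapping (previous_unit, unit) to a probability;
             unseen pairs fall back to `floor`.
    Returns (total_log_score, list_of_units) for the best path.
    """
    # best[u] holds the best (score, path) ending in unit u at this position.
    best = {u: (s, [u]) for u, s in lattice[0]}
    for candidates in lattice[1:]:
        nxt = {}
        for u, s in candidates:
            score, path = max(
                ((prev_score + math.log(bigram.get((p, u), floor)) + s,
                  prev_path)
                 for p, (prev_score, prev_path) in best.items()),
                key=lambda t: t[0],
            )
            nxt[u] = (score, path + [u])
        best = nxt
    return max(best.values(), key=lambda t: t[0])


# Toy data (invented for illustration): "a" and "c" score best acoustically,
# but the combined model strongly prefers the sequence "b" then "d".
in_domain = {("b", "d"): 0.9, ("a", "c"): 0.01}
out_of_domain = {("b", "d"): 0.5, ("a", "c"): 0.05, ("a", "d"): 0.1}
combined = interpolate(in_domain, out_of_domain, lam=0.7)

lattice = [
    [("a", -1.0), ("b", -1.2)],
    [("c", -1.0), ("d", -1.1)],
]
score, path = best_path(lattice, combined)
print(path)
```

In this toy lattice the search selects the second-ranked acoustic candidate at both positions, because the model score outweighs the small acoustic difference. The out-of-domain table also supplies a probability for a pair the in-domain table never saw, which is the kind of gap the sparse-data remark refers to.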