By Topic

An Extremely Large Vocabulary Approach to Named Entity Extraction from Speech

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
T. Hori ; NTT Communication Science Laboratories, NTT Corporation, 2-4, Hikaridai, Seika-cho, Soraku-gun, Kyoto, Japan. ; A. Nakamura

This paper describes an approach to named entity (NE) extraction from speech data, in which an extremely large vocabulary lexicon including all NEs occurring in a large text corpus is used for automatic speech recognition (ASR). Accordingly, NEs appear in the recognition results just as they are. Our approach is implemented by the following steps: (1) run an NE-tagger for a whole text corpus and make an NE-tagged corpus in which each NE is padded with its category, (2) construct a lexicon and a language model for ASR using the tagged corpus where each NE is considered as a regular word, and (3) run the speech recognizer in one pass. Although a very large vocabulary is necessary to ensure a high coverage of NEs, that is no longer a major problem since we recently achieved real-time extremely large vocabulary ASR using a WEST framework. In experiments on NE extraction from spoken queries for an open-domain question-answering system, our approach yielded higher F-measure values than a conventional approach

Published in:

2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings  (Volume:1 )

Date of Conference:

14-19 May 2006