Audio-visual large vocabulary continuous speech recognition in the broadcast domain
Basu, S.
Neti, C.
Rajput, N.
Senior, A.
Subramaniam, L.
Verma, A.
IBM Thomas J. Watson Research Center, Yorktown Heights, NY
This paper appears in: Multimedia Signal Processing, 1999 IEEE 3rd Workshop on
Publication Date: 1999
On page(s): 475-481
Meeting Date: 09/13/1999 - 09/15/1999
Location: Copenhagen, Denmark
ISBN: 0-7803-5610-1
References Cited: 15
INSPEC Accession Number: 6497360
Digital Object Identifier: 10.1109/MMSP.1999.793893
Current Version Published: 2002-08-06
Abstract
Considers the problem of combining visual cues with audio signals
for the purpose of improved automatic machine recognition of speech.
Although significant progress has been made in the machine transcription
of large-vocabulary continuous speech (LVCSR) over the last few years,
the technology to date is most effective only under controlled
conditions, such as low noise, speaker-dependent recognition, read
speech (as opposed to conversational speech), etc. On the other hand,
while augmenting the recognition of speech utterances with visual cues
has attracted the attention of researchers over the last couple of
years, most efforts in this domain can be considered to be only
preliminary in the sense that, unlike LVCSR efforts, tasks have been
limited to small vocabularies (e.g. commands, digits) and often to
speaker-dependent training or isolated word speech, where word
boundaries are artificially well-defined.