This contribution describes the design of our automatic audio-visual transcription system for TV broadcast news. The aim is to extend our existing Czech transcription system so that it also uses information from the visual signal of TV news video recordings. We describe the subsystems for visual signal segmentation, visual speaker identification, and visual voice activity detection. These subsystems are intended to support the development of the complete automatic audio-visual transcription system.
ELMAR, 2011 Proceedings
Date of Conference: 14-16 Sept. 2011