A typical news story contains a brief report by the anchor(s) in the studio, as well as news footage from the field. Our investigation shows that our recognizer performs better when indexing audio from the studio than audio from the field. To automatically extract the "reliable" audio segments for speech retrieval, we attempt to detect studio-to-field transitions by means of video parsing. Our research is based on 146 news stories collected from the Hong Kong TVB Jade station. Retrieval using the entire audio track gave an average inverse rank (AIR) of 0.759, while retrieval restricted to the studio recordings, as identified by video parsing, gave AIR = 0.765.
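The abstract does not define how AIR is computed. As a minimal sketch, assuming the common definition (the mean over queries of 1/rank of the relevant story, with 0 for a query whose story is not retrieved), the metric could be calculated as follows. The function name and the example ranks are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def average_inverse_rank(ranks: Sequence[Optional[int]]) -> float:
    """Mean of 1/rank over all queries (assumed definition of AIR).

    Each entry is the 1-based rank at which the relevant story was
    retrieved for one query, or None if it was not retrieved at all.
    """
    if not ranks:
        raise ValueError("at least one query is required")
    total = 0.0
    for rank in ranks:
        if rank is None:
            continue  # a missed story contributes 0 to the sum
        if rank < 1:
            raise ValueError("ranks are 1-based")
        total += 1.0 / rank
    return total / len(ranks)


# Hypothetical ranks for four queries, not data from the paper.
example_ranks = [1, 2, 1, None]
print(average_inverse_rank(example_ranks))  # (1 + 0.5 + 1 + 0) / 4 = 0.625
```

Under this definition an AIR of 1.0 means every query returned its relevant story at rank 1, so the reported values of 0.759 and 0.765 can be read on a 0 to 1 scale.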
2002 6th International Conference on Signal Processing (Volume 2)
Date of Conference: 26-30 Aug. 2002