This paper describes a complex system developed for processing, indexing and accessing data collected in large audio and audio-visual archives that form an important part of Czech cultural heritage. The system is currently being applied to the Czech Radio archive, namely to its oral history segment, which contains more than 200,000 individual recordings covering almost ninety years of broadcasting in the Czech Republic and former Czechoslovakia. The ultimate goals are a) to transcribe a significant portion of the archive with the support of speech, speaker and language recognition technology, b) to index the transcriptions, and c) to make the audio and text files fully searchable. So far, the system has processed and indexed over 75,000 spoken documents. Most of them come from the last two decades, but the current demo collection also includes a series of presidential speeches from 1934 onward. Full coverage of the archive should be available by the end of 2014.