Skip to Main Content
With the advent of essentially unlimited data storage capabilities and with the proliferation of the use of the Internet, it becomes reasonable to imagine a world in which it would be possible to access any of the stored information at will with a few keystrokes or voice commands. Since much of this data will be in the form of speech from various sources, it becomes important to develop the technologies necessary for indexing and browsing such audio data. This paper describes some of the requisite speech and language technologies that would be required and introduces an effort aimed at integrating these technologies into a system, called Rough 'n' Ready, which indexes speech data, creates a structural summarization, and provides tools for browsing the stored data. The technologies highlighted in the paper include speaker-independent continuous speech recognition, speaker segmentation and identification, name spotting, topic classification, story segmentation, and information retrieval. The system automatically segments the continuous audio input stream by speaker, clusters audio segments from the same speaker, identifies speakers known to the system, and transcribes the spoken words. It also segments the input stream into stories, based on their topic content, and locates the names of persons, places, and organizations. These structural features are stored in a database and are used to construct highly selective search queries for retrieving specific content from large audio archives.