This paper is concerned with the question of how to automatically obtain initial estimates describing speech segments as specified by a phonetic transcription or script. An algorithm has been developed that performs a completely automatic segmentation of the speech signal, with no human interaction, using only the script. The segment labels of the script can represent any type of phonetically based unit embedded in continuous speech, such as whole words, demisyllables, and phoneme-like or subphonetic segments. The quantitative formulation of the problem is chosen in such a way that a sort of closed-form solution can be obtained by dynamic programming. The resulting algorithm finds the segmentation that provides the best compromise between the acoustic observations and the requirements of the script. The algorithm was applied to two tasks. First, it was used to determine the word boundaries in word strings with no delimited reference patterns. Second, it was used to provide an initial segmentation of speech into phoneme-like and subphonetic segments.
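As an illustration of how a script-constrained segmentation can be found by dynamic programming, the following is a minimal sketch and not the formulation used in this paper. It assumes that each script label comes with a single prototype feature vector, that the local cost is the squared Euclidean distance between a frame and a prototype, and that every script segment must cover at least one frame. The function name `align_script` and both of its arguments are hypothetical.

```python
import math


def align_script(frames, prototypes):
    """Segment `frames` into len(prototypes) contiguous, ordered segments.

    frames:     list of T feature vectors (lists of floats)
    prototypes: list of K feature vectors, one per script label, in script
                order (assumed stand-in for whatever acoustic model is used)
    Returns K (start, end) frame ranges, end exclusive, with minimum total cost.
    """
    T, K = len(frames), len(prototypes)
    if K == 0 or T < K:
        raise ValueError("need at least one frame per script segment")

    def local_cost(t, k):
        # Assumed local distance: squared Euclidean distance.
        return sum((a - b) ** 2 for a, b in zip(frames[t], prototypes[k]))

    # best[t][k]: minimum cost of explaining frames 0..t with frame t in segment k.
    best = [[math.inf] * K for _ in range(T)]
    # stay[t][k]: True if the best path reached (t, k) from (t - 1, k).
    stay = [[False] * K for _ in range(T)]
    best[0][0] = local_cost(0, 0)

    for t in range(1, T):
        for k in range(min(t + 1, K)):
            same = best[t - 1][k]                             # remain in segment k
            prev = best[t - 1][k - 1] if k > 0 else math.inf  # enter from k - 1
            stay[t][k] = same <= prev
            best[t][k] = min(same, prev) + local_cost(t, k)

    # Backtrace from the last frame of the last segment to recover boundaries.
    bounds = [T]
    k = K - 1
    for t in range(T - 1, 0, -1):
        if not stay[t][k]:
            bounds.append(t)  # frame t is the first frame of segment k
            k -= 1
    bounds.append(0)
    bounds.reverse()
    return [(bounds[i], bounds[i + 1]) for i in range(K)]


# Toy usage: six one-dimensional frames and a three-label script.
frames = [[0.0], [0.0], [0.0], [5.0], [5.0], [9.0]]
prototypes = [[0.0], [5.0], [9.0]]
print(align_script(frames, prototypes))
```

The script fixes the order and number of segments, so the search reduces to choosing K - 1 boundaries. The recursion examines each frame and label pair once, which gives a cost on the order of T times K local distance evaluations.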