Abstract:
Time-scale modification (TSM) is important in streaming services, including over-the-top (OTT) platforms, audiobooks, and online lectures. Although TSM modifies the speed of audio while maintaining other audio attributes such as the pitch and timbre of the speaker, it unnaturally distorts audio signals and makes spoken content difficult to understand. This study proposes an adaptive time-scale modification (ATSM) algorithm that varies the speaking rate for each phoneme cluster of speech to improve speech intelligibility. The proposed algorithm performs forced alignment using the Montreal Forced Aligner and time-scale reconstruction using an adaptive speaking rate based on dynamic time warping. To validate the proposed algorithm, the diagnostic rhyme test (DRT) score, comparison mean opinion score (CMOS), and fast dynamic time warping (FastDTW) score of ATSM are compared with those of conventional TSM algorithms. The results show that speech compressed with the proposed algorithm is more intelligible than speech compressed with the other algorithms.
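The abstract does not state how the per-cluster speaking rates are chosen, so the following is only a minimal sketch of the general idea of non-uniform time compression over aligned segments. It assumes that a forced aligner has already produced (label, start, end) segments, and it uses a hypothetical `adaptive_rates` function and hypothetical per-class "compressibility" weights that are not taken from the paper. Segments with a larger weight absorb more of the compression, while the total duration still matches the requested overall rate.

```python
def adaptive_rates(segments, overall_rate, weights):
    """Return one speed-up factor per aligned segment.

    segments:     list of (label, start_sec, end_sec), e.g. from a forced aligner
    overall_rate: desired global speed-up (1.5 means 1.5x faster overall)
    weights:      hypothetical compressibility per label (assumption, not from the paper)
    """
    durations = [end - start for _, start, end in segments]
    total = sum(durations)
    target = total / overall_rate  # total duration after compression

    # Weighted duration available for shrinking.
    elastic = sum(weights[label] * d for (label, _, _), d in zip(segments, durations))
    if elastic <= 0:
        raise ValueError("no compressible segments")

    # Common factor so that the removed time sums to (total - target).
    k = (total - target) / elastic

    rates = []
    for label, _, _ in segments:
        shrink = weights[label] * k  # fraction of this segment that is removed
        if shrink >= 1:
            raise ValueError("requested rate is too high for these weights")
        rates.append(1.0 / (1.0 - shrink))
    return rates


# Illustrative input: labels and weights are made up for this sketch.
segments = [("vowel", 0.0, 0.2), ("consonant", 0.2, 0.3), ("vowel", 0.3, 0.6)]
weights = {"vowel": 1.0, "consonant": 0.5}
rates = adaptive_rates(segments, overall_rate=1.5, weights=weights)
new_total = sum((end - start) / r for (_, start, end), r in zip(segments, rates))
```

In this sketch the consonant segment is compressed less than the vowel segments, yet `new_total` equals the original duration divided by the overall rate. Each per-segment rate could then be passed to any conventional TSM routine applied to that segment.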
Published in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 04-10 June 2023
Date Added to IEEE Xplore: 05 May 2023