Adaptive Time-Scale Modification for Improving Speech Intelligibility Based On Phoneme Clustering For Streaming Services | IEEE Conference Publication | IEEE Xplore


Abstract:

Time-scale modification (TSM) is important in streaming services, including over-the-top (OTT) platforms, audiobooks, and online lectures. Although TSM modifies the speed of audio while maintaining other audio attributes such as the pitch and timbre of the speaker, it unnaturally distorts audio signals and makes spoken content difficult to understand. This study proposes an adaptive time-scale modification (ATSM) algorithm that adaptively varies the speaking rate for each phoneme cluster of speech to improve speech intelligibility. The proposed algorithm performs forced alignment using the Montreal Forced Aligner and time-scale reconstruction using an adaptive speaking rate based on dynamic time warping. To validate the proposed algorithm, the diagnostic rhyme test (DRT) score, comparison mean opinion score (CMOS), and fast dynamic time warping (FastDTW) score of ATSM are compared with those of conventional TSM algorithms. The results show that speech compressed with the proposed algorithm is more intelligible than speech compressed with the other algorithms.
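
The idea of varying the speaking rate per phoneme cluster while still reaching a chosen overall speed can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify in detail. It assumes that a forced aligner has already produced phoneme segments with durations, that each segment carries a cluster label, and that each cluster has a relative "keep" weight; the names `Segment`, `adaptive_rates`, and the weight values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    phoneme: str
    cluster: str     # hypothetical cluster label, e.g. "vowel" or "consonant"
    duration: float  # seconds, assumed to come from a forced aligner


def adaptive_rates(segments, target_rate, weights):
    """Return one speed factor per segment.

    Each segment's new duration is proportional to
    duration * weights[cluster], rescaled so that the total new
    duration equals the original total divided by target_rate.
    A larger weight means the cluster is compressed less.
    """
    total = sum(s.duration for s in segments)
    target_total = total / target_rate
    weighted = [s.duration * weights[s.cluster] for s in segments]
    scale = target_total / sum(weighted)
    new_durations = [w * scale for w in weighted]
    # Speed factor = original duration / new duration.
    return [s.duration / d for s, d in zip(segments, new_durations)]


# Illustrative values only; the weights are assumptions, not results.
segments = [
    Segment("AA", "vowel", 0.20),
    Segment("T", "consonant", 0.10),
]
weights = {"vowel": 0.5, "consonant": 1.0}
rates = adaptive_rates(segments, target_rate=2.0, weights=weights)
new_total = sum(s.duration / r for s, r in zip(segments, rates))
```

With these illustrative weights the vowel is sped up more than the consonant, yet `new_total` is still half the original duration, so the overall playback speed matches the requested factor. Each per-segment factor would then be passed to whatever TSM routine stretches the corresponding audio span.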
Date of Conference: 04-10 June 2023
Date Added to IEEE Xplore: 05 May 2023
Conference Location: Rhodes Island, Greece
