Abstract:
This paper studies the ranges of acoustic and modulation frequencies of speech most relevant for identifying speakers and compares the speaker-specific information present in the temporal envelope against that present in the temporal fine structure. The study uses correlation and feature importance measures, random forest and convolutional neural network models, and reconstructed speech signals with specific acoustic and/or modulation frequencies removed to identify which of these ranges are most salient. It is shown that the range of modulation frequencies associated with the fundamental frequency is more important than the 1–16 Hz range most commonly used in automatic speech recognition, and that the 0 Hz modulation frequency band contains significant speaker information. It is also shown that the temporal envelope is more discriminative among speakers than the temporal fine structure, but that the temporal fine structure still contains useful additional information for speaker identification. This research aims to provide a timely addition to the literature by identifying specific aspects of speech relevant for speaker identification that could be used to enhance the discriminant capabilities of machine learning models.
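The abstract contrasts the temporal envelope with the temporal fine structure (TFS) and refers to modulation frequencies, including the 0 Hz band. One common way to obtain these quantities, assumed here and not stated in the abstract, is the Hilbert decomposition of a signal into envelope |z(t)| and TFS cos(angle z(t)), where z is the analytic signal; the modulation spectrum is then the spectrum of the envelope. The sketch below uses NumPy only, works on a single channel rather than per acoustic subband, and all function names are hypothetical. Removing a modulation band by zeroing FFT bins is a deliberately crude stand-in for whatever filtering the authors actually used.

```python
import numpy as np


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """Analytic signal via the FFT (same construction as scipy.signal.hilbert)."""
    n = x.size
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0  # keep DC once
    if n % 2 == 0:
        h[n // 2] = 1.0  # keep Nyquist once
        h[1 : n // 2] = 2.0  # double positive frequencies
    else:
        h[1 : (n + 1) // 2] = 2.0
    # Negative frequencies stay zeroed, giving x + j * Hilbert{x}.
    return np.fft.ifft(spectrum * h)


def envelope_and_tfs(x: np.ndarray):
    """Split x into temporal envelope and temporal fine structure (assumed Hilbert method)."""
    z = analytic_signal(x)
    envelope = np.abs(z)
    tfs = np.cos(np.angle(z))  # envelope * tfs reproduces x
    return envelope, tfs


def modulation_spectrum(envelope: np.ndarray, fs: float):
    """Magnitude spectrum of the envelope; index 0 is the 0 Hz modulation band."""
    mags = np.abs(np.fft.rfft(envelope)) / envelope.size
    freqs = np.fft.rfftfreq(envelope.size, d=1.0 / fs)
    return freqs, mags


def remove_modulation_band(envelope: np.ndarray, fs: float,
                           low_hz: float, high_hz: float) -> np.ndarray:
    """Hypothetical helper: zero envelope spectrum bins in [low_hz, high_hz]."""
    spec = np.fft.rfft(envelope)
    freqs = np.fft.rfftfreq(envelope.size, d=1.0 / fs)
    spec[(freqs >= low_hz) & (freqs <= high_hz)] = 0.0
    return np.fft.irfft(spec, n=envelope.size)
```

For example, a 1000 Hz carrier amplitude-modulated at 4 Hz yields an envelope whose modulation spectrum peaks at 4 Hz; removing the 1–16 Hz band leaves only the 0 Hz (mean) component, and multiplying that filtered envelope by the TFS gives a reconstructed signal with those modulations removed.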
Published in: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Date of Conference: 14-17 December 2021
Date Added to IEEE Xplore: 03 February 2022
Conference Location: Tokyo, Japan