Abstract:
A review of techniques to identify speakers from their voices is presented, noting strengths and weaknesses of various methods. Similar acoustic analysis has often been used for both speech and speaker recognition, despite the two tasks being quite different. Speaker biometrics from voice is far more indirect and subtle than the estimation of phoneme sequences for automatic speech recognition from periodic evaluations of the spectral envelope of the vocal tract output. Speech signals are discussed from the point of view of how to recognize their textual content versus estimating other aspects of speakers. Common speech analysis methods such as filter banks, linear prediction, and mel-frequency cepstrum are examined. Approaches such as hidden Markov models, i-vectors, and artificial neural networks are shown to be useful for multiple speech applications. The focus is on how various types of networks can accomplish automatic speaker verification (ASV). Suggestions to improve these methods are made.
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing (Volume: 32)