Abstract:
We developed a tool for creating and evaluating the transcript and alignment of an utterance. The tool displays the speech waveform and MFCC features on an HTML5 canvas, and shows the transcript and phoneme alignment obtained with PyKaldi and PyCHAIN. Maintainers of medical dictation systems can use the tool during evaluation to examine an utterance's speech waveform, MFCC features, transcription results, and phoneme alignment. PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit; PyCHAIN is a fully parallelized PyTorch implementation of end-to-end lattice-free maximum mutual information (LF-MMI) training for the chain models in Kaldi. As a user guide, this paper demonstrates a use case that walks through the tool's features, analyzing model performance by inspecting the transcript and alignment of an utterance.
Published in: 2022 9th International Conference on Information Technology, Computer, and Electrical Engineering (ICITACEE)
Date of Conference: 25-26 August 2022
Date Added to IEEE Xplore: 25 October 2022