Abstract:
This paper describes the CLARIN-PHONEREC toolset, which converts an acoustic corpus into a loadable set of Python binary arrays. The aim of the tools is to make it easy to start experimenting with automatic speech recognition using Python and neural network libraries. To make initial experiments even easier, the paper provides and describes, in addition to the programs that convert any acoustic corpus into binary form, ready-to-use datasets for speech recognition experiments in Polish. With the provided tools and datasets, researchers are spared the heavy labour of converting speech samples into training, development and test datasets. The provided dataset is created from one of the largest publicly available Polish speech corpora, CLARIN-PL. The tools presented in this paper can also be used to prepare phoneme recognition training and testing data in other languages, e.g. from the popular TIMIT dataset. In this way, experimental results for Polish can be compared with analogous results obtained for English.
Date of Conference: 04-06 July 2018
Date Added to IEEE Xplore: 13 August 2018