This paper focuses on the integration of non-word acoustic events into LVCSR (Large Vocabulary Continuous Speech Recognition). Non-word events can play an important role in cognitive, paraverbal infocommunication; however, they are often not modeled explicitly due to computational difficulties. In our experiments, a serial and a loopback WFST (Weighted Finite State Transducer) architecture were built to recognize and/or emit certain non-word events in the output. A Hungarian Broadcast News corpus was used to evaluate the results. No degradation in normal word recognition accuracy was observed compared to the baseline, in which no non-word event modeling was applied. The non-word event recognition accuracy, however, was lower than expected; one likely reason is that the manual transcription of non-word events is less consistent than that of normal words. Nonetheless, some of the non-word events were recognized correctly in most cases. The loopback architecture has a higher memory requirement, but yields significantly better non-word event accuracies with no increase in recognition time.
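To illustrate the loopback idea described above, here is a minimal toy sketch, not the authors' actual WFST implementation: non-word event labels (the event names below are hypothetical) are attached as self-loops to a state of a simple word-loop grammar, so an event can be consumed between any two words without lengthening the serial recognition path.

```python
# Toy loopback grammar sketch (assumption: simplified stand-in for the
# paper's WFST architecture, without weights or acoustic scores).

WORDS = ["hello", "world"]
EVENTS = ["[cough]", "[breath]"]  # hypothetical non-word event labels

def build_loopback_grammar(words, events):
    """Return transitions {state: [(label, next_state), ...]} for a
    one-state word loop where the state also carries self-loops that
    accept non-word event labels (the "loopback")."""
    transitions = {0: []}
    for w in words:
        transitions[0].append((w, 0))   # word loop
    for e in events:
        transitions[0].append((e, 0))   # loopback self-loop for events
    return transitions

def accepts(transitions, tokens, state=0):
    """Check whether a token sequence is a valid path through the grammar."""
    for t in tokens:
        nexts = [s for (lab, s) in transitions.get(state, []) if lab == t]
        if not nexts:
            return False
        state = nexts[0]
    return True

g = build_loopback_grammar(WORDS, EVENTS)
print(accepts(g, ["hello", "[cough]", "world"]))  # event between words is accepted
```

In a real system the same loopback would be realized as weighted self-loop arcs in the composed decoding transducer, which is why it raises memory requirements while leaving the word paths, and hence recognition time, unchanged.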