Large vocabulary speech recognition of Slovenian language using morphological models

4 Author(s)
Maucec, M. ; Electr. Eng. & Comput. Sci. Fac., Maribor Univ., Slovenia ; Rotovnik, T. ; Kacic, Z. ; Horvat, B.

This paper concerns the development of an automatic speech recognition system for the Slovenian language. The primary cause of performance degradation is identified as the large number of distinct word forms in inflected languages. The paper focuses on statistical language models and examines a novel variation of n-gram modelling in which the modelling units are stems and endings rather than whole words. Words are decomposed into stems and endings automatically, using only data-driven algorithms. Modelling Slovenian with stems and endings significantly reduces the out-of-vocabulary (OOV) rate. Corpus-based, topic-adapted language models are also discussed, because language models are most often used in a topically homogeneous environment. In a highly inflected language, topic detection is complicated by the appearance of several word forms derived from the same lemma. This problem is solved by using data-driven algorithms to group words of the same lemma into classes.
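The abstract does not specify the decomposition algorithm. The sketch below is therefore only a rough illustration of why stem and ending units lower the OOV rate. It assumes a simple suffix-frequency heuristic, which is *not* the authors' data-driven method. Any word-final string of length 1 to `max_len` that ends at least `min_count` distinct training words is treated as a candidate ending, and each word is split at the longest such ending. Grouping words by shared stem stands in crudely for the lemma classes used for topic detection. All function names, thresholds, and the toy Slovenian word list are hypothetical.

```python
from collections import Counter, defaultdict


def learn_endings(vocab, max_len=2, min_count=2):
    """Hypothetical heuristic: every word-final string of length
    1..max_len that ends at least min_count distinct vocabulary words
    becomes a candidate ending."""
    counts = Counter()
    for word in sorted(set(vocab)):
        for k in range(1, max_len + 1):
            if len(word) > k:  # always leave a non-empty stem
                counts[word[-k:]] += 1
    return {suffix for suffix, n in counts.items() if n >= min_count}


def split_word(word, endings, min_stem=3):
    """Split at the longest candidate ending that leaves at least
    min_stem characters; otherwise keep the word whole with an
    empty ending."""
    for k in range(len(word) - min_stem, 0, -1):
        if word[-k:] in endings:
            return word[:-k], word[-k:]
    return word, ""


def oov_rates(train_tokens, test_tokens, max_len=2, min_count=2, min_stem=3):
    """Return (word-level OOV rate, stem+ending OOV rate) on test_tokens.
    A test word is covered by the stem+ending vocabulary when both its
    stem and its (possibly empty) ending were seen in training."""
    words = set(train_tokens)
    endings = learn_endings(words, max_len, min_count)
    stems, seen_endings = set(), {""}
    for w in words:
        stem, ending = split_word(w, endings, min_stem)
        stems.add(stem)
        seen_endings.add(ending)

    def covered(word):
        stem, ending = split_word(word, endings, min_stem)
        return stem in stems and ending in seen_endings

    n = len(test_tokens)
    word_oov = sum(t not in words for t in test_tokens) / n
    unit_oov = sum(not covered(t) for t in test_tokens) / n
    return word_oov, unit_oov


def group_by_stem(vocab, endings, min_stem=3):
    """Crude stand-in for lemma classes: words that share a stem."""
    classes = defaultdict(list)
    for w in sorted(set(vocab)):
        classes[split_word(w, endings, min_stem)[0]].append(w)
    return dict(classes)


# Toy inflected forms of miza (table), knjiga (book), lipa (linden).
train = "miza mize mizi knjiga knjigo knjigi lipa lipe lipo".split()
test = "mizo knjige lipi miza".split()
print(oov_rates(train, test))  # (0.75, 0.0)
print(group_by_stem(train, learn_endings(train)))
# {'knjig': ['knjiga', 'knjigi', 'knjigo'], 'lip': ['lipa', 'lipe', 'lipo'], 'miz': ['miza', 'mize', 'mizi']}
```

On this toy data, three of the four test word forms never occur in training. Every one of them, however, combines a known stem with a known ending, so the stem+ending OOV rate drops to zero. An n-gram model would then be trained over the resulting stem and ending sequence instead of over whole words.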

Published in:

EUROCON 2003. Computer as a Tool. The IEEE Region 8 (Volume 2)

Date of Conference:

22-24 Sept. 2003