A hybrid approach for Thai word segmentation with crowdsourcing feedback system | IEEE Conference Publication | IEEE Xplore

Abstract:

This paper proposes a new hybrid method for Thai word segmentation that combines a crowdsourced dictionary with a word bigram model. The main dictionary is split into a basic word dictionary and a compound word dictionary to improve the performance of the dictionary-based algorithm. Segmentation begins with a heuristic exhaustive matching algorithm that uses the basic word dictionary to generate every possible sequence of basic words for an input string. The word bigram model then selects the best candidate, resolving segmentation ambiguity. Finally, the selected sequence of basic words is merged into compound words using the compound word dictionary. The other part of this work applies the crowdsourcing paradigm: we implemented a web application that trains the bigram model and updates the dictionary from user feedback, so the platform's lexical knowledge improves over time. The algorithm was evaluated on two corpora. On the InterBEST 2009 corpus, the proposed algorithm achieves an average precision, recall, and F-measure of 97.52%, 97.70%, and 97.63%, respectively. On a social network corpus, it achieves an average precision, recall, and F-measure of 98.47%, 98.59%, and 98.54%, respectively.
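
The pipeline described above (exhaustive matching, bigram selection, compound merging, feedback) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not describe the matcher's heuristics, so the sketch enumerates every full segmentation; add-one smoothing for the bigram model, greedy longest-match compound merging, and the feedback hook are all assumptions; and every function and class name is hypothetical. Romanized toy strings stand in for Thai script, using the classic ambiguity ตากลม, read either as ตา|กลม (ta|klom, "round eyes") or ตาก|ลม (tak|lom, "exposed to the wind").

```python
import math
from collections import Counter


def exhaustive_match(text, basic_dict):
    """Enumerate every split of `text` into words from `basic_dict`.

    Plain exhaustive search; the paper's pruning heuristics are not reproduced.
    """
    max_len = max(map(len, basic_dict))
    candidates = []

    def walk(start, path):
        if start == len(text):
            candidates.append(list(path))
            return
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word in basic_dict:
                path.append(word)
                walk(end, path)
                path.pop()

    walk(0, [])
    return candidates


class BigramModel:
    """Word bigram model with add-one smoothing (the smoothing choice is assumed)."""

    def __init__(self):
        self.context = Counter()  # how often each word is followed by another
        self.bigrams = Counter()
        self.vocab = {"<s>"}

    def train(self, words):
        seq = ["<s>"] + list(words)
        self.context.update(seq[:-1])
        self.bigrams.update(zip(seq, seq[1:]))
        self.vocab.update(words)

    def score(self, words):
        """Log-probability of a word sequence; higher is better."""
        seq = ["<s>"] + list(words)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen words
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.context[a] + v))
            for a, b in zip(seq, seq[1:])
        )


def merge_compounds(words, compound_dict, max_parts=4):
    """Greedily join the longest run (2+ words) whose concatenation is a compound."""
    out, i = [], 0
    while i < len(words):
        for j in range(min(len(words), i + max_parts), i + 1, -1):
            joined = "".join(words[i:j])
            if joined in compound_dict:
                out.append(joined)
                i = j
                break
        else:
            out.append(words[i])
            i += 1
    return out


def segment(text, basic_dict, compound_dict, model):
    candidates = exhaustive_match(text, basic_dict)
    if not candidates:
        return [text]  # hypothetical fallback: leave unsegmentable input whole
    best = max(candidates, key=model.score)  # bigram model resolves ambiguity
    return merge_compounds(best, compound_dict)


def apply_feedback(model, basic_dict, corrected_words):
    """Hypothetical crowdsourcing hook: learn from a user-corrected segmentation."""
    basic_dict.update(corrected_words)  # new basic words extend the dictionary
    model.train(corrected_words)        # corrected sequence updates bigram counts


if __name__ == "__main__":
    basic = {"ta", "klom", "tak", "lom", "hong", "nam"}
    compound = {"hongnam"}
    model = BigramModel()
    model.train(["ta", "klom"])

    print(segment("taklom", basic, compound, model))   # ['ta', 'klom']
    print(segment("hongnam", basic, compound, model))  # ['hongnam']

    for _ in range(2):  # two users submit the other reading
        apply_feedback(model, basic, ["tak", "lom"])
    print(segment("taklom", basic, compound, model))   # ['tak', 'lom']
```

The last call shows the crowdsourcing effect in miniature: after feedback shifts the bigram counts, the same ambiguous input resolves to the other reading. Full enumeration grows exponentially with highly ambiguous input, which is presumably one reason the paper's matcher is heuristic; a lattice with Viterbi decoding over the bigram scores would be a common alternative.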
Date of Conference: 28 June 2016 - 01 July 2016
Date Added to IEEE Xplore: 08 September 2016
Conference Location: Chiang Mai, Thailand