This is a non-technical paper describing how and why we organized BEST 2009, the first contest in the series “benchmark for enhancing the standard of Thai language processing”, which is expected to help accelerate the progress of natural language processing technology in Thailand by bringing together three essential components: common standards, resources, and researchers. The BEST 2009: Thai word segmentation software contest is the first shared task on Thai NLP to exercise this combination. It aimed to find the best algorithms for correctly dividing unsegmented Thai script into words according to guidelines previously prepared by experts from several research institutes and universities. Word-segmented Thai corpora of 5 million words were developed as a training set, and another 600 K words as a test set. An evaluation procedure and protocol were designed. The process and the results of the contest are reported.