Abstract:
To speed up treebank construction, an integrated annotation tool is needed that includes a word segmenter, a sentence parser for initial tree suggestion, a tree visualizer, a tree-structure editor, and collaborative functions. Existing tools have not offered an integrated platform that provides preprocessing, an automated or semi-automated mechanism for parse tree suggestion, and tagged corpus data management. This paper presents CF Planter, a toolset for semi-automatic Thai treebank construction that consists of a word segmenter, a part-of-speech tagger, a statistical parser, and a web-based GUI for syntactic tree refinement and management. Given an input sentence, its most likely syntactic tree is automatically suggested and visualized for an annotator to correct manually before it is added to the treebank repository. Whenever a new syntactic tree is appended to the treebank, the repository is iteratively refined by computing a revised set of grammar rules based on updated probabilities. The toolset is illustrated with several examples, along with grammar rule frequencies. The toolset makes it easy for annotators to tag the tree structure of an input sentence. Finally, the automatic suggestion of syntactic trees is evaluated.
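The refinement step described above, recomputing grammar rule probabilities each time a corrected tree is appended, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the probabilities are estimated, so the sketch assumes relative-frequency estimation over context-free rules. The `Treebank` class, the `productions` helper, the nested-tuple tree format, and the English toy sentences are all hypothetical and are not taken from CF Planter.

```python
from collections import Counter


def productions(tree):
    """Yield (lhs, rhs) rules from a tree written as (label, child, ...).

    Leaves are plain strings; internal nodes are tuples (assumed format).
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


class Treebank:
    """Hypothetical repository that keeps rule counts up to date."""

    def __init__(self):
        self.trees = []
        self.rule_counts = Counter()  # count of each (lhs, rhs) rule
        self.lhs_counts = Counter()   # count of each left-hand side

    def add_tree(self, tree):
        """Append a corrected tree and update the rule counts."""
        self.trees.append(tree)
        for lhs, rhs in productions(tree):
            self.rule_counts[(lhs, rhs)] += 1
            self.lhs_counts[lhs] += 1

    def rule_probabilities(self):
        """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
        return {
            rule: n / self.lhs_counts[rule[0]]
            for rule, n in self.rule_counts.items()
        }


# Toy usage: two hand-made trees stand in for annotator-corrected parses.
bank = Treebank()
bank.add_tree(("S", ("NP", "I"), ("VP", ("V", "sleep"))))
bank.add_tree(("S", ("NP", "dogs"), ("VP", ("V", "chase"), ("NP", "cats"))))
probs = bank.rule_probabilities()
```

After the second tree is added, the two VP expansions seen so far share the probability mass for VP equally, which is the kind of revised estimate a statistical parser could use for its next suggestion.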
Date of Conference: 07-09 May 2018
Date Added to IEEE Xplore: 23 August 2018