Abstract:
Research on Natural Language Processing (NLP) for Indonesian is still limited, and so are the published results that can serve as a basis for further research. In an NLP pipeline, the initial step is parsing a sentence according to the grammar of its language, which helps in understanding the meaning of the sentence. This research aims to produce a simulation of an Indonesian parser by adapting the process used in the Collins algorithm. The three main stages are: 1) preprocessing to generate the corpus and events files, 2) lexical analysis to convert the corpus into tokens, and 3) syntax analysis to build the parse tree, which uses the events file to calculate grammar probabilities by counting occurrence frequencies and thereby determine the best sentence tree. The parser was evaluated on 30 simple sentences and was able to generate a corpus file, an events file, parse trees, and probability calculations. Nevertheless, some sentences could not be parsed entirely correctly because of the limitations of the Indonesian treebank file. Future work includes developing complete and valid treebank and lexicon files.
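The frequency-based probability step in stage 3 can be illustrated with a minimal sketch. It is not the authors' implementation and not the full head-driven Collins model: it assumes a simplified events format in which each event is one (parent, children) rule occurrence, and it estimates each rule's probability by relative frequency. The event data, the function names, and the candidate trees are all hypothetical.

```python
from collections import Counter
from math import prod

# Hypothetical events: one (parent, children) pair per rule occurrence
# observed in a treebank. The real events file format is not given in the source.
EVENTS = [
    ("S", ("NP", "VP")),
    ("S", ("NP", "VP")),
    ("NP", ("NN",)),
    ("NP", ("NN",)),
    ("NP", ("NN", "NN")),
    ("VP", ("VB", "NP")),
    ("VP", ("VB",)),
]


def estimate_rule_probs(events):
    """Relative-frequency estimate: count(parent -> children) / count(parent)."""
    rule_counts = Counter(events)
    parent_counts = Counter(parent for parent, _ in events)
    return {
        rule: count / parent_counts[rule[0]]
        for rule, count in rule_counts.items()
    }


def tree_prob(rules, probs):
    """Probability of a tree as the product of its rule probabilities.

    A rule never seen in the events gets probability 0.0 (no smoothing).
    """
    return prod(probs.get(rule, 0.0) for rule in rules)


def best_tree(candidates, probs):
    """Return the candidate tree (a list of rules) with the highest probability."""
    return max(candidates, key=lambda rules: tree_prob(rules, probs))


PROBS = estimate_rule_probs(EVENTS)

# Two hypothetical candidate analyses, each written as the list of rules it uses.
TREE_A = [("S", ("NP", "VP")), ("NP", ("NN",)), ("VP", ("VB", "NP")), ("NP", ("NN",))]
TREE_B = [("S", ("NP", "VP")), ("NP", ("NN", "NN")), ("VP", ("VB",))]

BEST = best_tree([TREE_A, TREE_B], PROBS)
```

Because unseen rules receive zero probability in this sketch, any sentence that needs a rule missing from the events cannot be scored, which is one plausible way a small treebank limits parsing accuracy.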
Date of Conference: 03-05 December 2013
Date Added to IEEE Xplore: 06 March 2014
Electronic ISBN: 978-0-7695-5096-1