Abstract:
Mining information from web pages to automatically identify acronyms and their expansions is an important and challenging task. It is difficult because acronym writing rules and forms vary widely across languages. The task consists of several important steps: generating candidate acronym and expansion pairs, creating their numerical features, and determining the correct acronyms and expansions using a machine learning algorithm. In this work, we evaluate and compare the speed of generating candidate acronym and expansion pairs and creating their numerical features when the processes run on a Hadoop cluster with different numbers of data nodes and on a single machine. The results show that the Hadoop cluster outperforms a single machine when generating almost 52 million candidate acronym and expansion pairs and their numerical features. When the Hadoop cluster was scaled from two to three data nodes, performance improved by 65.81% on average.
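The first two steps named in the abstract, candidate generation and numerical feature creation, can be illustrated with a minimal Python sketch. The abstract does not give the actual candidate rules or the feature set, so everything here is an assumption made for illustration: the uppercase-token pattern, the word window before each acronym, the four features, and the hypothetical function names `generate_candidates` and `make_features`.

```python
import re

# Assumption: an acronym is a run of 2 to 10 uppercase ASCII letters.
ACRONYM_RE = re.compile(r"\b[A-Z]{2,10}\b")


def generate_candidates(text, max_words=None):
    """Return (acronym, expansion) candidates built from the words before each acronym."""
    candidates = []
    for match in ACRONYM_RE.finditer(text):
        acronym = match.group()
        # Words that appear before the acronym in the text.
        words = re.findall(r"[A-Za-z]+", text[:match.start()])
        # Assumed window size: acronym length plus two extra words.
        limit = max_words or len(acronym) + 2
        window = words[-limit:]
        # Every suffix of the window is one candidate expansion.
        for start in range(len(window)):
            candidates.append((acronym, " ".join(window[start:])))
    return candidates


def make_features(acronym, expansion):
    """Turn one candidate pair into a small numeric feature vector (illustrative features)."""
    words = expansion.split()
    initials = "".join(word[0].upper() for word in words)
    # Count positions where the acronym letter equals the word initial.
    matched = sum(1 for a, b in zip(acronym, initials) if a == b)
    return [
        len(acronym),                # acronym length
        len(words),                  # number of words in the expansion
        matched / len(acronym),      # share of acronym letters matched by initials
        float(initials == acronym),  # 1.0 when the initials spell the acronym exactly
    ]


sentence = "We store data in the Hadoop Distributed File System (HDFS) cluster."
for acronym, expansion in generate_candidates(sentence):
    print(acronym, "|", expansion, "|", make_features(acronym, expansion))
```

Because each page can be scanned independently of the others, a step of this shape would fit naturally into a map phase on a Hadoop cluster, with the resulting feature vectors passed on to the classifier; that mapping is a plausible reading of the abstract, not a description of the authors' implementation.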
Published in: 2018 3rd International Conference on Information Technology, Information System and Electrical Engineering (ICITISEE)
Date of Conference: 13-14 November 2018
Date Added to IEEE Xplore: 23 May 2019