Performance Analysis of Apache Hadoop for Generating Candidates of Acronym and Expansion Pairs and Their Numerical Features | IEEE Conference Publication | IEEE Xplore

Abstract:

Mining information from web pages to automatically identify acronyms and their expansions is an important and challenging task. It is difficult because the rules and forms for writing acronyms vary widely across languages. The task consists of several important steps, namely generating candidates of an acronym and its expansion, creating their numerical features, and determining the correct acronyms and expansions using a machine learning algorithm. In this work, we evaluate and compare the performance, in terms of speed, of generating the candidates of acronym and expansion pairs and creating their numerical features when the processes run on a Hadoop cluster with different numbers of data nodes and on a single machine. The results show that the Hadoop cluster outperforms a single machine when generating almost 52 million candidates of acronym and expansion pairs and their numerical features. When the Hadoop cluster was scaled from two to three data nodes, the performance improved on average by 65.81%.
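
The first two steps named in the abstract, candidate generation and numerical feature creation, can be illustrated with a minimal Python sketch. The abstract does not state the extraction rules or the feature set used in the paper, so everything here is an assumption made for illustration: the acronym pattern (2 to 6 capital letters), the word-window heuristic, the `max_extra_words` parameter, and the three feature names are all hypothetical. The functions are written per sentence so that they could act as the map step of a MapReduce-style job.

```python
import re
from typing import Dict, Iterator, Tuple

# Assumption: an acronym is a token of 2 to 6 capital letters.
ACRONYM_RE = re.compile(r"[A-Z]{2,6}")


def candidate_pairs(sentence: str, max_extra_words: int = 2) -> Iterator[Tuple[str, str]]:
    """Yield hypothetical (acronym, expansion) candidates from one sentence."""
    words = re.findall(r"[A-Za-z]+", sentence)
    for i, word in enumerate(words):
        if not ACRONYM_RE.fullmatch(word):
            continue
        # Candidate expansions are word windows ending right before the acronym.
        longest = min(i, len(word) + max_extra_words)
        for size in range(1, longest + 1):
            yield word, " ".join(words[i - size:i])


def numerical_features(acronym: str, expansion: str) -> Dict[str, float]:
    """Compute illustrative numerical features for one candidate pair."""
    exp_words = expansion.split()
    initials = "".join(w[0].upper() for w in exp_words)
    # Count acronym letters that appear, in order, among the word initials.
    pos = 0
    matched = 0
    for ch in acronym:
        found = initials.find(ch, pos)
        if found != -1:
            matched += 1
            pos = found + 1
    return {
        "acronym_length": float(len(acronym)),
        "expansion_words": float(len(exp_words)),
        "initial_match_ratio": matched / len(acronym),
    }


if __name__ == "__main__":
    text = "The Hadoop Distributed File System HDFS stores blocks"
    for acronym, expansion in candidate_pairs(text):
        print(acronym, "|", expansion, "|", numerical_features(acronym, expansion))
```

Because each sentence is processed independently, this kind of work can be split across data nodes, which is the property the speed comparison in the abstract relies on. The third step, selecting the correct pairs, would feed these feature vectors to a machine learning classifier and is not sketched here.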
Date of Conference: 13-14 November 2018
Date Added to IEEE Xplore: 23 May 2019
Conference Location: Yogyakarta, Indonesia
