Abstract:
The advancements in digital technologies have led to the generation of huge amounts of data. This data can take the form of natural language text, which is easy for humans to interpret but difficult for machines. Extracting information from such text in a machine-understandable form is therefore important for processing these large volumes of data. In this paper, we present a computationally more efficient approach to extracting and classifying named entities. The approach combines rule-based learning with regular expressions and pattern matching to extract entities from the input text. Once extraction is complete, the named entities are classified into three classes: person, place, and organization. To make this classification faster and more accurate, we apply an enhanced version of the K-Means algorithm to the entities extracted in the previous stage. After the clusters have been formed, we assign each cluster a label according to the properties of its members. This clustering helps improve the speed and accuracy of named entity recognition (NER) tagging.
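The abstract describes the pipeline only at a high level, so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. It shows the three stages in miniature: regex-based candidate extraction, clustering of per-entity feature vectors, and labeling each cluster from the properties of its members. The regular expression, the cue lists (`TITLES`, `ORG_SUFFIXES`, `PLACE_PREPS`), and the three binary features are hypothetical. The clustering step is plain Lloyd's K-Means with a deterministic "first distinct points" initialization, not the paper's enhanced K-Means, whose details are not given here.

```python
import re

# Hypothetical cue lists for illustration only; the abstract does not list the paper's rules.
TITLES = ("Mr", "Mrs", "Ms", "Dr", "Prof")
ORG_SUFFIXES = {"Inc", "Corp", "Ltd", "University", "Bank"}
PLACE_PREPS = {"in", "at", "to", "near", "from"}
LABELS = ("PERSON", "ORGANIZATION", "PLACE")  # same order as the feature vector

TITLE_ALT = "|".join(TITLES)
# Optional preceding word (a title or a lowercase word), then a run of
# capitalized words that does not itself start with a title.
CANDIDATE_RE = re.compile(
    rf"(?:\b({TITLE_ALT}|[a-z]+)\.?\s+)?"
    rf"\b(?!(?:{TITLE_ALT})\b)([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\b"
)


def extract_candidates(text):
    """Return (preceding_word, entity) pairs matched by the regex."""
    return [(m.group(1) or "", m.group(2)) for m in CANDIDATE_RE.finditer(text)]


def features(prev_word, entity):
    """Hypothetical binary cues: [after a title, has org suffix, after a place preposition]."""
    return [
        float(prev_word in TITLES),
        float(entity.split()[-1] in ORG_SUFFIXES),
        float(prev_word in PLACE_PREPS),
    ]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Standard Lloyd's K-Means; initial centroids are the first k distinct points (assumption)."""
    centroids = []
    for p in points:
        if p not in centroids:
            centroids.append(p)
        if len(centroids) == k:
            break
    assignment = []
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


def classify(text, k=3):
    candidates = extract_candidates(text)
    vectors = [features(prev, ent) for prev, ent in candidates]
    assignment, centroids = kmeans(vectors, k)
    # Label each cluster by the cue with the largest mean value among its members.
    cluster_label = {
        c: LABELS[max(range(len(LABELS)), key=lambda d: centroid[d])]
        for c, centroid in enumerate(centroids)
    }
    return {ent: cluster_label[a] for (_, ent), a in zip(candidates, assignment)}


if __name__ == "__main__":
    sample = ("Dr. Alice Smith met Mr. Bob Jones at Acme Corp in Paris, "
              "then flew to Berlin with Ms. Carol White of Globex Inc.")
    print(classify(sample))
```

On the sample sentence, the sketch places the titled names in one cluster, "Acme Corp" and "Globex Inc" in a second (their shared org-suffix cue outweighs the preposition cue on "Acme Corp"), and "Paris" and "Berlin" in a third. A regex this simple will mis-handle sentence-initial capitalized words, which is one reason the paper pairs pattern matching with a learned, rule-based stage.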
Published in: 2018 Second International Conference on Advances in Computing, Control and Communication Technology (IAC3T)
Date of Conference: 21-23 September 2018
Date Added to IEEE Xplore: 28 March 2019