Stopword Identification and Removal Techniques on TC and IR applications: A Survey | IEEE Conference Publication | IEEE Xplore


Abstract:

The concept of the “stopword” was first introduced by H.P. Luhn in 1958. In Natural Language Processing (NLP), a stopword is a common word that is neither indexed nor searchable in a computer search engine. Examples of stopwords are 'a', 'the', and 'is'. Stopword removal is a preprocessing step in the majority of NLP applications, including Information Retrieval (IR) and Text Classification (TC). Its benefits include a 35-45% decrease in corpus size and improved efficiency and accuracy of text mining applications, which in turn reduces the time and space complexity of the overall application. In this paper, we discuss the major stopword identification techniques used by researchers over the last few decades for Indian and non-Indian languages. We also present a survey of methods used for stopword list generation, along with their characteristics, and describe the effect of various stopword removal techniques on the TC and IR application domains. A comprehensive list of publicly available resources for static stopwords in various languages is also given for quick reference.
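
As a rough illustration of the ideas named in the abstract, the following Python sketch shows static-list stopword removal, a simple frequency-based way to propose stopword candidates, and a measure of how much a token stream shrinks. It is not taken from the paper: the tiny `STATIC_STOPWORDS` set, the whitespace tokenizer, and all function names are assumptions made for this example, and the reduction it reports on a toy sentence says nothing about the 35-45% figure quoted above.

```python
from collections import Counter

# Hypothetical static stopword list, kept tiny for illustration only.
STATIC_STOPWORDS = {"a", "the", "is", "of", "in", "and", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def remove_stopwords(tokens, stopwords=STATIC_STOPWORDS):
    """Static-list removal: drop every token found in the stopword set."""
    return [tok for tok in tokens if tok not in stopwords]


def frequent_terms(documents, top_k):
    """Frequency-based candidates: the top_k most common tokens in the corpus."""
    counts = Counter(tok for doc in documents for tok in tokenize(doc))
    return {term for term, _ in counts.most_common(top_k)}


def reduction_ratio(tokens, filtered):
    """Fraction of tokens removed by filtering (0.0 for empty input)."""
    return 1 - len(filtered) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    tokens = tokenize("The cat is on a mat.")
    filtered = remove_stopwords(tokens)
    print(filtered, reduction_ratio(tokens, filtered))
    print(frequent_terms(["the cat the dog", "the bird"], top_k=1))
```

In practice a frequency-based candidate list would be reviewed or combined with other signals before use, since frequent terms in a narrow corpus can also be meaningful domain words.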
Date of Conference: 06-07 March 2020
Date Added to IEEE Xplore: 23 April 2020
Conference Location: Coimbatore, India
