The goal of this research is to evaluate the use of English stop word lists in Latent Semantic Indexing (LSI)-based Information Retrieval (IR) systems with large text datasets. The literature claims that using such lists improves retrieval performance. Here, three different lists are compared: two compiled by IR groups at the University of Glasgow and the University of Tennessee, and one of our own, developed at the University of Northern British Columbia. We also examine the case where stop words are not removed from the input dataset. We find that using tailored stop word lists improves retrieval performance. In contrast, using arbitrary (non-tailored) lists, or no list at all, reduces the retrieval performance of LSI-based IR systems with large text datasets.