By Topic

Latent semantic indexing and large dataset: Study of term-weighting schemes

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
A N K Zaman ; Computer Science Program, University of Northern British Columbia (UNBC), Prince George, BC, Canada ; Charles Grant Brown

The primary purpose of an information retrieval (IR) system is to retrieve all the relevant documents, which are relevant to the user query. Latent Semantic Indexing/Analysis (LSI/LSA) based ad hoc document retrieval task investigates the performance of retrieval systems that search a static set of documents using new questions. Performance of LSI has been tested by others for several smaller datasets (e.g. MED, CISI abstracts) however, LSI has not been tested for a large dataset. So, we decided to test LSI for a very large dataset. We used TREC-8 LA Times dataset for our experimentation. We applied three different term weighting schemes and our own stop word list to judge the performance. Recall-precision graph and Coefficient of Variation (CV) were used to evaluate the retrieval performance of LSI based retrieval system. We found tf-idf term weighting scheme performs better than log-entropy and raw term frequency weighting schemes when the test collection became very large.

Published in:

Digital Information Management (ICDIM), 2010 Fifth International Conference on

Date of Conference:

5-8 July 2010