The main objective of this research is to investigate whether Latent Semantic Indexing (LSI) improves retrieval effectiveness on Malay documents compared with an exact term-matching technique. LSI is a mathematical approach that uses Singular Value Decomposition (SVD) to uncover the important associations between terms and terms, terms and documents, and documents and documents. Cosine similarity is used to measure the similarity between the query and the terms as well as the documents. This research uses a Malay Language Test Collection consisting of 210 Malay documents, queries, and relevance judgments, together with a Malay stemmer to stem Malay terms. Results and analyses show that the LSI retrieval method outperformed the exact term-matching technique, despite the longer processing time it required during indexing. The best retrieval effectiveness for Malay documents in this domain, 80.2 percent, was achieved with a k-dimension of 4 and a threshold value of 0.8.
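The retrieval pipeline described above (truncated SVD, query projection, cosine similarity with a threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the five-term vocabulary, the five toy documents, the function names, and the choice of k = 2 are all assumptions made for illustration, and the query is folded in with the common formulation q^T U_k S_k^{-1}, which the abstract does not specify. Only the 0.8 threshold is taken from the text.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix, keeping k dimensions."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # U_k: terms x k, s_k: k singular values, V_k: documents x k
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in_query(q, U_k, s_k):
    """Project a query term vector into the k-dimensional space (q^T U_k S_k^-1)."""
    return (q @ U_k) / s_k

def cosine(a, B):
    """Cosine similarity between vector a and every row of B."""
    denom = np.linalg.norm(a) * np.linalg.norm(B, axis=1)
    return (B @ a) / np.where(denom == 0, 1.0, denom)

def lsi_retrieve(q, U_k, s_k, V_k, threshold):
    """Return (doc_id, similarity) pairs whose cosine similarity meets the threshold."""
    sims = cosine(fold_in_query(q, U_k, s_k), V_k)
    return [(int(i), float(sims[i])) for i in np.argsort(-sims) if sims[i] >= threshold]

def exact_match(q, term_doc):
    """Baseline: documents that contain at least one query term."""
    return {int(i) for i in np.flatnonzero(q @ term_doc > 0)}

# Hypothetical stemmed vocabulary and toy collection (illustrative only).
terms = ["kereta", "motokar", "jalan", "ikan", "laut"]
term_doc = np.array([
    # d0 d1 d2 d3 d4
    [1, 0, 1, 0, 0],   # kereta
    [0, 1, 1, 0, 0],   # motokar
    [1, 1, 1, 0, 0],   # jalan
    [0, 0, 0, 1, 1],   # ikan
    [0, 0, 0, 1, 1],   # laut
], dtype=float)

# Query containing the single term "kereta".
query = np.array([1.0 if t == "kereta" else 0.0 for t in terms])

U_k, s_k, V_k = lsi_fit(term_doc, k=2)          # k = 2 is a toy value
hits = lsi_retrieve(query, U_k, s_k, V_k, threshold=0.8)

lsi_ids = {doc_id for doc_id, _ in hits}
exact_ids = exact_match(query, term_doc)
print("LSI:", sorted(lsi_ids), "exact:", sorted(exact_ids))
```

In this toy collection, document d1 contains "motokar" and "jalan" but not "kereta", so the exact term-matching baseline misses it, while LSI retrieves it because "kereta" and "motokar" co-occur with "jalan" and collapse onto the same latent dimension. The SVD step is also where the additional indexing cost mentioned above arises.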