Discovering aliases in Thai articles is challenging. We apply a standard vector space model to explore aliases and match them with formal names or with each other. We first construct a term-by-document matrix (TDM), which contains the frequency of each term in the document collection, assuming that all terms exist in the typed named entity dictionary. Normalization techniques are used instead of standard weighting functions to reduce the gap among related terms and, conversely, to increase the gap among unrelated terms. A matrix decomposition algorithm then decomposes the term-by-document matrix to form the left singular vectors, which project term properties. Finally, we create a correlation matrix to represent term relations. The empirical results show that this technique is appropriate for discovering aliases in highly sparse matrices.
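The pipeline described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the normalization or the correlation measure, so the sketch assumes unit-length scaling of each term row and cosine similarity between projected term vectors. The toy corpus, the romanized entity names, and the function names `build_tdm`, `unit_rows`, and `term_similarity` are all hypothetical.

```python
import numpy as np


def build_tdm(docs, vocabulary):
    """Term-by-document matrix: rows are terms, columns are documents,
    entries are raw term frequencies."""
    tdm = np.zeros((len(vocabulary), len(docs)))
    for j, tokens in enumerate(docs):
        for i, term in enumerate(vocabulary):
            tdm[i, j] = tokens.count(term)
    return tdm


def unit_rows(matrix):
    """Scale every row to unit Euclidean length (all-zero rows are left as is).
    Assumption: stands in for the unspecified normalization in the abstract."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return matrix / norms


def term_similarity(tdm, k):
    """Project terms onto the top-k left singular vectors and compare them.
    Assumption: cosine similarity stands in for the correlation matrix."""
    normalized = unit_rows(tdm)
    u, s, _ = np.linalg.svd(normalized, full_matrices=False)
    term_vectors = u[:, :k] * s[:k]  # term coordinates in the latent space
    unit = unit_rows(term_vectors)
    return unit @ unit.T


# Hypothetical toy corpus: each document is a list of dictionary entities.
vocabulary = ["bangkok", "krungthep", "chao_phraya", "chiangmai", "doi_suthep"]
docs = [
    ["bangkok", "chao_phraya"],
    ["krungthep", "chao_phraya"],
    ["chiangmai", "chiangmai", "doi_suthep"],
    ["chiangmai", "doi_suthep"],
]

tdm = build_tdm(docs, vocabulary)
sim = term_similarity(tdm, k=2)

idx = {term: i for i, term in enumerate(vocabulary)}
print(round(float(sim[idx["bangkok"], idx["krungthep"]]), 3))
print(round(float(sim[idx["bangkok"], idx["chiangmai"]]), 3))
```

In this toy example "bangkok" and "krungthep" never occur in the same document, so their raw rows are orthogonal, yet both co-occur with "chao_phraya". After projection onto two latent dimensions they become nearly identical, while "bangkok" and "chiangmai" stay unrelated. The choice of `k` is a free parameter that would need tuning on real data.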