Use of latent semantic indexing to identify name variants in large data collections

Author:

Bradford, R.B.; Agilex Technol., Chantilly, VA, USA

In many intelligence and security informatics applications, named entities constitute a particularly important element of queries and analytic operations. In such applications, variations in the rendering of entity names present a pervasive problem. The problem is most frequently encountered when dealing with names of persons. For person names, a wide variety of factors may lead to variations: use of nicknames, differences in given name / surname order, misspellings, phonetic renderings, use of different transliteration systems, etc. Historically, a number of methods have been developed for generating possible name variants. Most of these have been based on phonetic similarities, edit distance, or longest common substrings. However, in general, the larger the data collection, the less effective these techniques are. This paper presents an approach to attaining both high precision and high recall for name variant identification in large text collections. The approach exploits the technique of latent semantic indexing (LSI). In this approach, the contextual information provided by LSI allows likely true variants to be selected from multiple candidate variants generated by other techniques. This significantly improves the precision of candidate name variant results. This paper describes a basic LSI-augmented approach to name variant identification, as well as a new approach that yields additional precision improvements.
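
The two-stage idea in the abstract (generate candidate variants with a string technique, then keep only those whose context agrees) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the thresholds `max_edits` and `min_cosine`, and the toy vectors are all assumptions made for the example. In particular, the vectors stand in for term vectors from an LSI space, which would in practice be derived from a truncated SVD of a term-document matrix; that step is not computed here.

```python
from math import sqrt


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def name_variants(query, vectors, max_edits=2, min_cosine=0.8):
    """Return (name, similarity) pairs that pass both stages.

    Stage 1: candidates are names within `max_edits` edits of the query.
    Stage 2: candidates are kept only if their vector is close to the
    query's vector, i.e. the names occur in similar contexts.
    Both thresholds are illustrative assumptions.
    """
    query_vec = vectors[query]
    candidates = [name for name in vectors
                  if name != query and edit_distance(query, name) <= max_edits]
    scored = [(name, cosine(query_vec, vectors[name])) for name in candidates]
    kept = [(name, score) for name, score in scored if score >= min_cosine]
    return sorted(kept, key=lambda pair: -pair[1])


# Made-up 3-dimensional vectors standing in for LSI term vectors.
TOY_VECTORS = {
    "smith": [0.9, 0.1, 0.0],
    "smyth": [0.8, 0.2, 0.1],  # close spelling, similar context
    "smite": [0.0, 0.1, 0.9],  # close spelling, unrelated context
    "jones": [0.9, 0.1, 0.0],  # similar context, distant spelling
}

if __name__ == "__main__":
    for name, score in name_variants("smith", TOY_VECTORS):
        print(f"{name}: {score:.3f}")
```

In this toy data, "smite" passes the edit-distance stage but is rejected by the context stage, while "jones" never becomes a candidate at all. That is the precision gain the abstract attributes to adding contextual information on top of string-based candidate generation.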

Published in:

Intelligence and Security Informatics (ISI), 2013 IEEE International Conference on

Date of Conference:

4-7 June 2013