By Topic

Automatic Keyword Extraction Using Linguistic Features

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Xinghua Hu ; University of California, Santa Cruz ; Bin Wu

This paper describes a novel keyword extraction algorithm position weight (PW) that utilizes linguistic features to represent the importance of the word position in a document. Topical terms and their previous-term and next-term co-occurrence collections are extracted. To measure the degree of correlation between a topical term and its co-occurrence terms, three methods are employed including term frequency inverse term frequency (TFITF), position weight inverse position weight (PWIPW), and CHI-square (chi2). The co-occurrence terms that have the highest degree of correlation and exceed a co-occurrence frequency threshold are combined together with the original topical term to form a final keyword. With the linear computational complexity of the algorithm, the vector space of documents in a large corpus or boundless Web can be quickly represented by sets of keywords, which makes it possible to retrieve large-scale information fast and effectively

Published in:

Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)

Date of Conference:

Dec. 2006