Automatic key phrase extraction is the task of automatically selecting a set of phrases that describe the content of a simple sentence. A key phrase is said to be extracted when it appears verbatim in the sentence to which it is assigned. Accurate key phrase extraction is fundamental to the success of many recent digital library applications, as well as clustering and semantic information retrieval techniques. This research presents a support vector machine (SVM) approach to Vietnamese key phrase extraction and reports a number of experiments in which performance is incrementally improved. In general, the Vietnamese key phrase extraction process consists of three steps: word segmentation, which identifies the lexical units in an input sentence; part-of-speech tagging of the resulting words; and key phrase extraction over phrases. The performance of Vietnamese key phrase extraction systems is generally measured by the precision attained, which depends strongly on the nature and size of the training set of key phrases. Most results exceed 70.30% with a training set of 9,000 Vietnamese key phrases from 2,000 sentences selected from the corpus of the Vietnamese Lexicography Center (www.vietlex.com.vn).
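As an illustration of the two notions used above, the verbatim (extractive) constraint and the precision measure, the following minimal Python sketch may help. It is not the authors' implementation: the function names `is_extractive` and `precision`, the case-insensitive and NFC-normalized matching, and the toy sentence with its predicted and gold phrases are all assumptions made for this example only.

```python
import unicodedata


def _norm(text: str) -> str:
    """Normalize to NFC and lowercase so composed and decomposed
    Vietnamese diacritics compare equal (an assumption of this sketch)."""
    return unicodedata.normalize("NFC", text).lower()


def is_extractive(phrase: str, sentence: str) -> bool:
    """Return True if the phrase occurs verbatim in the sentence."""
    return _norm(phrase) in _norm(sentence)


def precision(predicted: set, gold: set) -> float:
    """Fraction of predicted key phrases that also appear in the gold set."""
    if not predicted:
        return 0.0
    predicted_n = {_norm(p) for p in predicted}
    gold_n = {_norm(g) for g in gold}
    return len(predicted_n & gold_n) / len(predicted_n)


# Hypothetical toy data, invented for illustration (not from the corpus).
sentence = "Hà Nội là thủ đô của Việt Nam"
gold = {"Hà Nội", "thủ đô", "Việt Nam"}
predicted = {"Hà Nội", "thủ đô", "Việt Nam", "của"}

# Every predicted phrase satisfies the verbatim constraint.
all_verbatim = all(is_extractive(p, sentence) for p in predicted)

# Three of the four predicted phrases are in the gold set.
score = precision(predicted, gold)
print(all_verbatim, score)
```

In this toy case three of the four predicted phrases are correct, so the precision is 0.75. Precision alone does not penalize missed key phrases, so a system that predicts very few phrases can still score highly.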