Entropy-based indexing term for N-gram text search system

3 Author(s)
Yamamoto, H. ; Software Div., Hitachi Ltd., Osaka, Japan ; Ohmi, S. ; Tsuji, H.

N-gram indexing is a method for full-text search systems in which each index term consists of N consecutive words or characters. Systems for Japanese text use a 2-gram character index as the base in order to limit the size of the index file, and an additional higher-gram index is expected to improve search performance. This paper presents an entropy-based method for selecting the additional higher-gram index terms. The basic idea comes from the observation that Katakana words, which often share the same prefix (much like "in-" and "ex-" in English), are suitable for the incremental index.
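The abstract does not give the selection formula, so the following is only a minimal sketch of one plausible reading, not the authors' method. It builds a character 2-gram index and computes the Shannon entropy of the character that follows each 2-gram. The function names, the `min_entropy` threshold, and the rule "extend a 2-gram when its successor entropy is high" are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def build_ngram_index(docs, n=2):
    """Map each character n-gram to the (doc_id, position) pairs where it starts."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for pos in range(len(text) - n + 1):
            index[text[pos:pos + n]].append((doc_id, pos))
    return dict(index)


def successor_entropy(docs, n=2):
    """Shannon entropy (in bits) of the character following each n-gram."""
    followers = defaultdict(Counter)
    for text in docs:
        # Stop one position early so that a following character always exists.
        for pos in range(len(text) - n):
            followers[text[pos:pos + n]][text[pos + n]] += 1
    entropy = {}
    for gram, counts in followers.items():
        total = sum(counts.values())
        entropy[gram] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


def select_for_extension(docs, n=2, min_entropy=1.0):
    """Hypothetical rule: pick n-grams whose successors are varied enough
    that (n+1)-gram terms would split their posting list into smaller lists."""
    ent = successor_entropy(docs, n)
    return sorted(gram for gram, h in ent.items() if h >= min_entropy)


if __name__ == "__main__":
    docs = ["input", "index", "inside", "export"]
    print(build_ngram_index(docs)["in"])
    print(select_for_extension(docs))
```

In this toy corpus the shared prefix "in" is followed by three different characters, so its 3-gram extensions would each point to a single document, whereas "ex" has only one observed successor and gains nothing from extension. A real system would presumably also weigh term frequency and index size, which this sketch ignores.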

Published in:

Systems, Man and Cybernetics, 2003. IEEE International Conference on (Volume: 5)

Date of Conference:

5-8 Oct. 2003