The N-gram indexing method is an algorithm for full-text search systems in which each index term consists of N consecutive words or characters. Systems for Japanese text typically use a character 2-gram (bigram) index as their base to keep the index file small, but adding a higher-gram index on top of it is expected to improve search performance. This paper presents an entropy-based method for selecting which additional higher-gram index terms to build. The basic idea comes from the observation that Katakana words, which often share the same prefix (much like "in-" and "ex-" in English), are well suited to such an incremental index.
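The abstract does not state the paper's exact selection criterion, so the Python sketch below only illustrates one plausible reading under explicit assumptions. It builds a base character-bigram inverted index and then adds trigrams only for bigrams whose next-character distribution has high Shannon entropy, as a frequent Katakana prefix that branches into many different words would. The entropy rule, the threshold, and every function name (`build_index`, `successor_entropy`, `select_trigram_bigrams`, `build_selective_trigram_index`, `search3`) are hypothetical and are not the authors' method.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Yield (position, n-gram) pairs for every run of n consecutive characters."""
    for i in range(len(text) - n + 1):
        yield i, text[i:i + n]


def build_index(docs, n):
    """Inverted index: n-gram -> list of (doc_id, position) postings."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for pos, gram in char_ngrams(text, n):
            index[gram].append((doc_id, pos))
    return index


def successor_entropy(docs):
    """Shannon entropy (bits) of the character that follows each bigram."""
    followers = defaultdict(Counter)
    for text in docs:
        for i in range(len(text) - 2):
            followers[text[i:i + 2]][text[i + 2]] += 1
    entropy = {}
    for gram, counts in followers.items():
        total = sum(counts.values())
        entropy[gram] = -sum((c / total) * math.log2(c / total)
                             for c in counts.values())
    return entropy, followers


def select_trigram_bigrams(docs, threshold):
    """HYPOTHETICAL rule: extend bigrams whose successor entropy >= threshold."""
    entropy, _ = successor_entropy(docs)
    return sorted(g for g, h in entropy.items() if h >= threshold)


def build_selective_trigram_index(docs, prefixes):
    """Index only the trigrams whose first two characters are a selected bigram."""
    prefixes = set(prefixes)
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for pos, gram in char_ngrams(text, 3):
            if gram[:2] in prefixes:
                index[gram].append((doc_id, pos))
    return index


def search3(query, bigram_idx, trigram_idx):
    """Find (doc_id, position) matches of a 3-character query string."""
    if query in trigram_idx:
        # A single posting-list lookup when the trigram was indexed.
        return sorted(trigram_idx[query])
    # Fallback: bigram 'AB' at pos and bigram 'BC' at pos + 1 in the same document.
    second = set(bigram_idx.get(query[1:], []))
    return sorted((d, p) for d, p in bigram_idx.get(query[:2], [])
                  if (d, p + 1) in second)
```

For example, with the three documents コンピュータ, コンテンツ, and コンサート, the bigram コン is followed by three different characters (entropy log2 3 ≈ 1.58 bits) while every other bigram has a single successor (entropy 0), so a threshold of 1 bit selects only コン for trigram extension. The intuition behind this assumed rule is that a low-entropy bigram's trigram postings would nearly duplicate its bigram postings, whereas a high-entropy prefix gets split into much shorter, more selective lists.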