
VQ-based model design algorithms for text compression


Authors:

Kim, S.P. (Dept. of Electr. Eng., Polytechnic Univ., Brooklyn, NY, USA); Ginesta, X.

Summary form only given. We propose a new approach to text compression for applications where fast decoding matters more than fast encoding, such as information retrieval systems. For efficient compression, the high-order conditional probability information of the text is analyzed and modeled using the concept of vector quantization (VQ). VQ has generally been used for lossy compression, where the input symbol is not exactly recovered at the decoder, so it does not seem applicable to lossless text compression. However, VQ can be applied to the high-order conditional probability information itself, reducing the complexity of that information. We represent the conditional probability information of a source as a tree in which each first-level node is associated with a first-order conditional probability and each second-level node with a second-order conditional probability. Good text compression performance requires conditional probability information of fourth or higher order. It is essential that the model be simple enough to train with a training set of reasonable size. We reduce the number of conditional probability tables and also discuss a semi-adaptive operating mode in which the tree is derived through training but the actual probability information at each node is obtained adaptively from the input data. The performance of the proposed algorithm is comparable to or better than that of other methods such as prediction by partial matching (PPM), while requiring less memory.
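The idea of reducing the number of conditional probability tables can be illustrated with a small sketch. The abstract does not give the distortion measure, the codebook design procedure, or the tree construction, so everything below is an assumption made for illustration: contexts of a fixed order are clustered with a Lloyd-style (k-means-like) iteration, using Kullback-Leibler divergence as the distortion, and each cluster shares one merged probability table. The function names (`context_tables`, `quantize_tables`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def context_tables(text, order):
    """Count next-symbol frequencies for every context of length `order`."""
    tables = defaultdict(Counter)
    for i in range(order, len(text)):
        tables[text[i - order:i]][text[i]] += 1
    return tables


def normalize(counts, alphabet):
    """Turn a Counter of symbol counts into a probability vector."""
    total = sum(counts.values())
    return [counts[s] / total for s in alphabet]


def kl(p, q, eps=1e-9):
    """KL divergence D(p || q) in bits; zero entries of q are floored at eps."""
    return sum(pi * math.log2(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)


def quantize_tables(tables, alphabet, k, iters=10):
    """Assumed VQ step: map each context to one of at most k shared tables."""
    contexts = sorted(tables)
    dists = {c: normalize(tables[c], alphabet) for c in contexts}
    weight = {c: sum(tables[c].values()) for c in contexts}
    # Initialize the codebook from the k most frequent contexts.
    seeds = sorted(contexts, key=lambda c: (-weight[c], c))[:k]
    codebook = [list(dists[c]) for c in seeds]
    assign = {}
    for _ in range(iters):
        # Nearest-codeword step: pick the table that costs the fewest extra bits.
        for c in contexts:
            assign[c] = min(range(len(codebook)),
                            key=lambda j: kl(dists[c], codebook[j]))
        # Centroid step: merge the raw counts of all contexts in a cluster.
        for j in range(len(codebook)):
            merged = Counter()
            for c in contexts:
                if assign[c] == j:
                    merged.update(tables[c])
            if merged:
                codebook[j] = normalize(merged, alphabet)
    return assign, codebook


text = "abracadabra " * 20
alphabet = sorted(set(text))
tables = context_tables(text, order=2)
assign, codebook = quantize_tables(tables, alphabet, k=4)
print(len(tables), "context tables reduced to", len(codebook))
```

In this sketch, merging counts is the centroid that minimizes the count-weighted KL distortion within a cluster, which is why the shared table is rebuilt from raw counts rather than by averaging probabilities. A semi-adaptive variant in the spirit of the abstract would keep `assign` fixed after training and update the counts behind each shared table from the input data; that extension is not shown here.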

Published in:

Data Compression Conference, 1995. DCC '95. Proceedings

Date of Conference:

28-30 Mar 1995