
Compressed Context Modeling for Text Compression

Author:
Kulekci, M.O.; National Research Institute of Electronics & Cryptology, TUBITAK-BILGEM-UEKAE, Turkey

In text compression, statistical context modeling constructs a model that estimates the probability distribution of a character given its context. The order-k context of a symbol is classically defined as the string formed by its k preceding symbols. This study introduces compressed context modeling, which defines the order-k context of a character as the sequence of k bits drawn from the entropy-compressed representations of its preceding characters. When computing the compressed context of a symbol at a given position in the text, enough preceding characters are included to supply k bits of information. Thus, instead of a fixed number of characters, a fixed amount of information is taken as the context of each character, which allows every character to be predicted with a nearly uniform amount of information. Experiments compare the proposed modeling against the classical fixed-length context definitions: the files of the large Calgary corpus are modeled both with the newly introduced compressed context modeling and with classical fixed-length context modeling. On average, the statistical model built with the proposed method uses 13.76 percent less space, measured by the number of distinct contexts, while providing a 5.88 percent gain in empirical entropy, measured as information content in bits per character.
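The distinction between the two context definitions can be sketched in a few lines of Python. This is an illustrative toy, not the paper's implementation: it approximates each character's entropy-coded length as ceil(-log2 p(c)) bits estimated from the text itself, whereas the paper uses actual entropy-compressed representations. The function and variable names are invented for this sketch.

```python
import math
from collections import Counter

def code_lengths(text):
    # Approximate entropy-coded length of each character:
    # ceil(-log2 p(c)) bits, with p(c) estimated from the text itself.
    freq = Counter(text)
    total = len(text)
    return {c: max(1, math.ceil(-math.log2(freq[c] / total))) for c in freq}

def fixed_context(text, i, k):
    # Classical order-k context: the k characters preceding position i.
    return text[max(0, i - k):i]

def compressed_context(text, i, k_bits, lengths):
    # Compressed order-k context: as many preceding characters as needed
    # for their entropy-coded lengths to supply at least k_bits of information.
    # Frequent (short-coded) characters contribute few bits, so the context
    # reaches further back; rare characters saturate it quickly.
    ctx, bits, j = [], 0, i - 1
    while j >= 0 and bits < k_bits:
        ctx.append(text[j])
        bits += lengths[text[j]]
        j -= 1
    return ''.join(reversed(ctx))

text = "abracadabra abracadabra"
lengths = code_lengths(text)

# Distinct contexts under each definition, over all positions.
fixed = {fixed_context(text, i, 3) for i in range(1, len(text))}
comp = {compressed_context(text, i, 6, lengths) for i in range(1, len(text))}
print(len(fixed), len(comp))
```

In this toy, the frequent character 'a' costs 2 bits while the lone space costs 5, so a 6-bit compressed context may span anywhere from two to three characters depending on how common its symbols are; this variable reach is what lets every prediction rest on a roughly uniform amount of information.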

Published in:

Data Compression Conference (DCC), 2011

Date of Conference:

29-31 March 2011