In text compression, statistical context modeling constructs a model that estimates the probability distribution of a character from its context. Classically, the order-k context of a symbol is defined as the string formed by the k symbols preceding it. This study introduces compressed context modeling, which defines the order-k context of a character as the sequence of k bits taken from the entropy-compressed representations of its preceding characters. When the compressed context of a symbol at a given position in a text is computed, as many preceding characters are included as are needed to produce k bits of information. Thus the context of a character is a fixed amount of information rather than a fixed number of characters, and each character is predicted from a nearly uniform amount of information. Experiments compare the proposed modeling against the classical fixed-length context definition: the files of the large Calgary corpus are modeled with both compressed context modeling and classical fixed-length context modeling. On average, the statistical model built with the proposed method uses 13.76 percent less space, measured by the number of distinct contexts, while providing a 5.88 percent gain in empirical entropy, measured as information content in bits per character.
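The compressed context definition above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the entropy coder, so a static Huffman code built from order-0 counts is assumed here, and the context is assumed to be the last k bits of the concatenated codewords of the preceding characters. The function names `huffman_codes`, `compressed_context`, `fixed_context`, and `model_stats` are hypothetical.

```python
import heapq
from collections import Counter, defaultdict
from math import log2


def huffman_codes(text):
    """Build a static Huffman code (symbol -> bit string) from order-0 counts.

    Assumption: Huffman stands in for the unspecified entropy coder.
    """
    freq = Counter(text)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial codeword})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def compressed_context(text, pos, k, codes):
    """Return up to k bits: the tail of the codewords preceding text[pos].

    Preceding characters are added, nearest first, until at least k bits
    are available; near the start of the text the context may be shorter.
    """
    if k <= 0:
        return ""
    bits = ""
    i = pos - 1
    while i >= 0 and len(bits) < k:
        bits = codes[text[i]] + bits
        i -= 1
    return bits[-k:]


def fixed_context(text, pos, k):
    """Classical order-k context: the k characters preceding text[pos]."""
    return text[max(0, pos - k):pos]


def model_stats(contexts, text):
    """Return (number of distinct contexts, empirical entropy in bits/char)."""
    counts = defaultdict(Counter)
    for ctx, ch in zip(contexts, text):
        counts[ctx][ch] += 1
    entropy = 0.0
    for ctr in counts.values():
        n = sum(ctr.values())
        for c in ctr.values():
            entropy -= c * log2(c / n)
    return len(counts), entropy / len(text)


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog and the quiet cat " * 20
    codes = huffman_codes(sample)
    k_bits, k_chars = 8, 2
    comp = [compressed_context(sample, p, k_bits, codes) for p in range(len(sample))]
    fixed = [fixed_context(sample, p, k_chars) for p in range(len(sample))]
    print("compressed:", model_stats(comp, sample))
    print("fixed:     ", model_stats(fixed, sample))
```

The two statistics returned by `model_stats` correspond to the two measures named in the abstract, the number of distinct contexts and the empirical entropy in bits per character. The toy input and the choice of 8 bits against 2 characters are arbitrary, so the printed figures say nothing about the reported corpus results.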
Data Compression Conference (DCC), 2011
Date of Conference: 29-31 March 2011