The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression
Witten, I.H.; Bell, T.C.
Information Theory, IEEE Transactions on
Volume 37, Issue 4, Jul 1991 Page(s):1085 - 1094
Digital Object Identifier 10.1109/18.87000
Summary: Approaches to the zero-frequency problem in adaptive text
compression are discussed. This problem relates to the estimation of the
likelihood of a novel event occurring. Although several methods have
been used, their suitability has been based on empirical evaluation rather
than a well-founded model. The authors propose the application of a
Poisson process model of novelty. Its ability to predict novel tokens is
evaluated, and it consistently outperforms existing methods. It is
applied to a practical statistical coding scheme, where a slight
modification is required to avoid divergence. The result is a
well-founded zero-frequency model that explains observed differences in
the performance of existing methods, and offers a small improvement in
the coding efficiency of text compression over the best method
previously known.