
Symbol ranking text compressors


Abstract:

Summary form only given. In 1951 Shannon estimated the entropy of English text by giving human subjects a sample of text and asking them to guess the next letters. He found, in one example, that 79% of the attempts were correct on the first try, 8% needed two attempts and 3% needed three attempts. By regarding the number of attempts as an information source he could estimate the entropy of the language. Shannon also stated that an "identical twin" of the original predictor could recover the original text, and these ideas are developed here to provide a new taxonomy of text compressors. In all cases these compressors recode the input into "rankings" of "most probable symbol", "next most probable symbol", and so on. The rankings have a very skewed distribution (low entropy) and are processed by a conventional statistical compressor. Several "symbol ranking" compressors have appeared in the literature, though seldom under that name or with any reference to Shannon's work. The author has developed a compressor that uses constant-order contexts and is based on a set-associative cache with LRU update. A software implementation has run at about 1 Mbyte/s with an average compression of 3.6 bits/byte on the Calgary Corpus.
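
The abstract describes the symbol-ranking idea only at a high level. The sketch below illustrates one possible form of the transform, assuming order-2 contexts, a small per-context LRU list (standing in for the paper's set-associative cache), and an escape rank for symbols not yet cached for a context; the cache size, escape scheme, and all names are illustrative assumptions, not the author's implementation. The rank stream it produces is heavily skewed toward rank 0 and would normally be handed to a conventional statistical coder, as the abstract notes.

# A minimal symbol-ranking transform sketch (not the author's implementation).
# Assumptions: order-2 contexts, a per-context LRU list capped at WAYS entries
# (a stand-in for the paper's set-associative cache), and an escape rank for
# symbols not found in the list. The resulting ranks are heavily skewed toward 0.

from collections import defaultdict

WAYS = 4          # candidate symbols kept per context (assumed cache associativity)
ORDER = 2         # constant context order (assumed)
ESCAPE = WAYS     # rank emitted when the symbol is not cached for this context


def rank_encode(data: bytes):
    """Return (ranks, literals): one rank per input byte, plus escaped literals."""
    cache = defaultdict(list)          # context -> LRU-ordered candidate symbols
    ranks, literals = [], []
    context = b"\x00" * ORDER
    for b in data:
        cands = cache[context]
        if b in cands:
            r = cands.index(b)         # 0 = most recently seen = "most probable symbol"
            cands.remove(b)
        else:
            r = ESCAPE
            literals.append(b)
            if len(cands) >= WAYS:     # LRU eviction of the oldest candidate
                cands.pop()
        cands.insert(0, b)             # move-to-front = LRU update
        ranks.append(r)
        context = (context + bytes([b]))[-ORDER:]
    return ranks, literals


def rank_decode(ranks, literals):
    """Invert rank_encode by running the identical predictor in step."""
    cache = defaultdict(list)
    out = bytearray()
    lit = iter(literals)
    context = b"\x00" * ORDER
    for r in ranks:
        cands = cache[context]
        if r == ESCAPE:
            b = next(lit)
            if len(cands) >= WAYS:
                cands.pop()
        else:
            b = cands.pop(r)
        cands.insert(0, b)
        out.append(b)
        context = (context + bytes([b]))[-ORDER:]
    return bytes(out)


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog, the lazy dog"
    ranks, lits = rank_encode(text)
    assert rank_decode(ranks, lits) == text
    print("fraction of rank-0 symbols:", ranks.count(0) / len(ranks))

The decoder recovers the text by running the same predictor as the encoder, which is exactly Shannon's "identical twin" observation; in a real compressor the rank stream would then be entropy-coded rather than stored directly.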
Date of Conference: 25-27 March 1997
Date Added to IEEE Xplore: 06 August 2002
Print ISBN: 0-8186-7761-9
Print ISSN: 1068-0314
Conference Location: Snowbird, UT, USA
