Data compression using long common strings
Bentley, J.
McIlroy, D.
AT&T Bell Labs., Murray Hill, NJ
This paper appears in: Data Compression Conference, 1999. Proceedings. DCC '99
Publication Date: 29-31 Mar 1999
On page(s): 287-295
Meeting Date: 03/29/1999 - 03/31/1999
Location: Snowbird, UT, USA
ISBN: 0-7695-0096-X
References Cited: 13
INSPEC Accession Number: 6314283
Digital Object Identifier: 10.1109/DCC.1999.755678
Current Version Published: 2002-08-06
Abstract
We describe a precompression algorithm that effectively represents
any long common strings that appear in a file. The algorithm interacts
well with standard compression algorithms that represent shorter strings
that are near in the input text. Our experiments show that some real
data sets do indeed contain many long common strings. We extend the
fingerprint mechanisms of our algorithm to a program that identifies
long common strings in an input file. This program gives interesting
insights into the structure of real data files that contain long common
strings.
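
The abstract does not spell out the fingerprint mechanism, so the following is only a minimal sketch under stated assumptions: a Rabin-Karp style rolling hash serves as the fingerprint, only blocks that start at multiples of the block length are stored, and each candidate match is verified byte by byte and extended forward only. The function name, the `block` parameter, and the hash constants are hypothetical and are not taken from the paper.

```python
def find_long_repeats(data: bytes, block: int = 8):
    """Return (position, earlier_position, length) for repeated strings.

    Illustrative assumptions: fingerprints are polynomial rolling hashes,
    only blocks starting at multiples of `block` are stored, and matches
    are extended forward only.
    """
    BASE, MOD = 257, (1 << 61) - 1
    n = len(data)
    if n < block:
        return []

    top = pow(BASE, block - 1, MOD)  # weight of the byte leaving the window
    table = {}                       # fingerprint -> start of an earlier aligned block
    repeats = []

    # Fingerprint of the first window, data[0:block].
    h = 0
    for k in range(block):
        h = (h * BASE + data[k]) % MOD

    i = 0        # start of the current window
    covered = 0  # end of the most recently reported repeat
    while True:
        j = table.get(h)
        # Verify the candidate to rule out fingerprint collisions.
        if i >= covered and j is not None and data[j:j + block] == data[i:i + block]:
            length = block
            while i + length < n and data[j + length] == data[i + length]:
                length += 1
            repeats.append((i, j, length))
            covered = i + length

        # Store the fingerprint only for aligned blocks, after the lookup,
        # so a block never matches itself.
        if i % block == 0:
            table.setdefault(h, i)

        if i + block >= n:
            break
        # Slide the window one byte to the right.
        h = ((h - data[i] * top) * BASE + data[i + block]) % MOD
        i += 1

    return repeats


sample = b"the quick brown fox jumps. " + b"XYZ" + b"the quick brown fox jumps. "
print(find_long_repeats(sample, block=8))
```

Because only aligned blocks are stored, the table holds roughly one entry per `block` bytes of input, but a sketch like this can miss up to `block - 1` bytes at the start of a repeat that does not begin on a block boundary. A precompressor built on this idea would replace each reported repeat with a reference to the earlier position and length, leaving the shorter, nearby repetitions to a standard compressor.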