A classic idea for improving text compression is to replace words with references to a text dictionary, which is either external or stored together with the archive. We advocate the second option, since even for a single language (e.g., English) it is practically impossible to have one dictionary that fits the many different kinds of modern texts well. There are two main problems to solve: how to assign codewords to the individual words of the parsed text, and how to represent the dictionary compactly. The resulting data are the input to a backend compressor. Since in many scenarios texts are decompressed (read) more often than they are compressed (written), we focus on LZ77-based backend compression algorithms, in particular Deflate, used in the zip/gzip formats, whose well-known asset is very fast decompression.
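To make the overall pipeline concrete, the following is a minimal sketch of word replacement with an embedded dictionary and a Deflate backend (Python's `zlib`). It is not the scheme proposed here: the tokenizer, the one- or two-byte codeword assignment by frequency rank, the NUL-separated dictionary layout, and the names `codeword`, `pack`, and `unpack` are all illustrative assumptions.

```python
import re
import struct
import zlib
from collections import Counter

# Assumption: tokens are maximal runs of ASCII letters or of anything else,
# so the token sequence concatenates back to the original text exactly.
TOKEN = re.compile(r"[A-Za-z]+|[^A-Za-z]+")


def codeword(rank):
    """Toy codeword assignment: 1 byte for the 128 most frequent tokens,
    2 bytes (high bit set in the first byte) for the rest."""
    if rank < 128:
        return bytes([rank])
    rank -= 128
    if rank >= 1 << 15:
        raise ValueError("dictionary too large for this toy scheme")
    return bytes([0x80 | (rank >> 8), rank & 0xFF])


def pack(text):
    """Replace tokens with codewords, prepend the dictionary, then Deflate."""
    if "\0" in text:
        raise ValueError("NUL is reserved as the dictionary separator")
    tokens = TOKEN.findall(text)
    # More frequent tokens get lower ranks and hence shorter codewords.
    vocab = [t for t, _ in Counter(tokens).most_common()]
    rank = {t: i for i, t in enumerate(vocab)}
    dictionary = b"\0".join(t.encode("utf-8") for t in vocab)
    body = b"".join(codeword(rank[t]) for t in tokens)
    payload = struct.pack("<I", len(dictionary)) + dictionary + body
    return zlib.compress(payload, 9)


def unpack(blob):
    """Inflate, read the embedded dictionary, and expand the codewords."""
    raw = zlib.decompress(blob)
    (dict_len,) = struct.unpack_from("<I", raw)
    vocab = raw[4:4 + dict_len].split(b"\0")
    body = raw[4 + dict_len:]
    out, i = [], 0
    while i < len(body):
        b = body[i]
        if b < 128:
            r, i = b, i + 1
        else:
            r, i = 128 + (((b & 0x7F) << 8) | body[i + 1]), i + 2
        out.append(vocab[r])
    return b"".join(out).decode("utf-8")
```

Decoding here is a single table lookup per codeword on top of Deflate decompression, which is in keeping with the emphasis on fast decompression. Whether such a transform actually reduces the archive size depends on the text and on how the codewords and dictionary are designed.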