Systems that handle very large text collections make intensive use of distributed-memory parallel computing platforms such as clusters of PCs. This is particularly evident in Web search engines, which must resort to parallelism to deal efficiently with both high query rates and high space requirements, the latter arising from large numbers of small documents stored in secondary memory. Those documents can be stored in compressed form to reduce storage space and communication time. This paper proposes a parallel algorithm for compressing text in such a distributed-memory environment. We show that it performs efficiently compared with the usual practice of compressing the whole text on a single machine.