Grammar-based compression is a well-studied technique that constructs a small context-free grammar (CFG) uniquely deriving a given text. In this paper, we present an online algorithm for lightweight grammar-based compression. Our algorithm is based on the LCA algorithm [Sakamoto et al. 2004], which guarantees a nearly optimal compression ratio and space. LCA, however, is an offline algorithm and relies on external storage to reduce its main-memory consumption. We therefore present an online version that inherits most characteristics of the original LCA. Our algorithm guarantees an O(log² n)-approximation ratio to the optimum grammar size, and all work is carried out in main memory, in space bounded by the output size. In addition, we propose a more practical encoding based on the parentheses representation of a binary tree. Experimental results on repetitive texts demonstrate that our algorithm achieves effective compression compared with other practical compressors, and that its space consumption is smaller than the input text size.
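To illustrate what a parentheses representation of a binary tree can look like, the following is a minimal sketch of the generic balanced-parentheses encoding, not the specific encoding proposed in the paper. It assumes a full binary tree (every internal node has exactly two children), as arises from a grammar whose rules have the form X → YZ. The names `encode` and `decode`, and the choice to model leaves as strings and internal nodes as 2-tuples, are assumptions made for this example only.

```python
from typing import List, Tuple, Union

# Assumed tree model for this sketch: a leaf is a str symbol,
# an internal node is a (left, right) tuple.
Tree = Union[str, Tuple["Tree", "Tree"]]


def encode(tree: Tree) -> Tuple[str, List[str]]:
    """Return the tree shape as balanced parentheses plus the leaf
    symbols in left-to-right order."""
    shape: List[str] = []
    leaves: List[str] = []

    def walk(node: Tree) -> None:
        shape.append("(")          # open this node
        if isinstance(node, tuple):
            walk(node[0])          # left subtree
            walk(node[1])          # right subtree
        else:
            leaves.append(node)    # a leaf contributes just "()"
        shape.append(")")          # close this node

    walk(tree)
    return "".join(shape), leaves


def decode(shape: str, leaves: List[str]) -> Tree:
    """Rebuild the tree from its parentheses shape and leaf symbols."""
    leaf_iter = iter(leaves)
    pos = 0

    def parse() -> Tree:
        nonlocal pos
        assert shape[pos] == "("
        pos += 1
        if shape[pos] == ")":      # "()" is a leaf
            pos += 1
            return next(leaf_iter)
        left = parse()
        right = parse()
        assert shape[pos] == ")"
        pos += 1
        return (left, right)

    return parse()


if __name__ == "__main__":
    # Derivation tree of "abab" under S -> XX, X -> ab.
    tree = (("a", "b"), ("a", "b"))
    shape, leaves = encode(tree)
    print(shape, leaves)
    print(decode(shape, leaves) == tree)
```

The shape string costs two symbols (two bits) per node regardless of the alphabet, and the leaf symbols are stored separately. Both functions are recursive, so very deep trees would need an iterative variant.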
Date of Conference: 21-24 June 2011