The goal of this contribution is twofold: (i) to introduce a generalized Lempel-Ziv parsing scheme, and (ii) to analyze second-order properties of some compression schemes based on this parsing scheme. The generalized scheme partitions a sequence of length n into variable-length phrases (blocks) such that a new block is the longest substring seen in the past by at most b-1 phrases. The case b=1 corresponds to the original Lempel-Ziv scheme. In this paper, we investigate the size of a randomly selected phrase and the average number of phrases of a given size by analyzing the so-called b-digital search tree (b-DST) representation. For a memoryless source, we prove that the size of a typical phrase is asymptotically normally distributed. This result is new even for b=1, and the case b>1 is a non-trivial extension.
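As an illustration of the parsing rule, the following Python sketch reads it as follows: starting at the current position, extend the candidate phrase one symbol at a time until it is a string that fewer than b earlier phrases are equal to, then emit it. This reading is an assumption based on the description above; the function name `generalized_lz_parse` and the counting table are hypothetical, and the sketch uses plain substring lookups rather than the b-DST that the analysis relies on.

```python
from collections import Counter


def generalized_lz_parse(sequence, b=1):
    """Parse `sequence` into phrases under an assumed reading of the rule.

    A phrase is the shortest prefix of the unparsed remainder that has
    appeared as an earlier phrase at most b-1 times. With b=1 this is the
    classical incremental (LZ78-style) parsing.
    """
    if b < 1:
        raise ValueError("b must be a positive integer")

    seen = Counter()   # how many earlier phrases equal each string
    phrases = []
    start = 0
    n = len(sequence)

    while start < n:
        end = start + 1
        # Extend the candidate while b earlier phrases already equal it.
        while end < n and seen[sequence[start:end]] >= b:
            end += 1
        phrase = sequence[start:end]
        # The final phrase may be one that is already "full" if the
        # input runs out before a fresh candidate is found.
        seen[phrase] += 1
        phrases.append(phrase)
        start = end

    return phrases


if __name__ == "__main__":
    s = "1100101000100010011"
    print(generalized_lz_parse(s, b=1))
    print(generalized_lz_parse(s, b=2))
```

Under this reading, a larger b lets each short string be reused as a phrase up to b times before longer phrases are needed, which corresponds to a b-DST node holding up to b strings.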