Variable-to-variable (VV) codes are very attractive yet not well understood data compression schemes. In 1972, Khodak claimed to provide upper and lower bounds for the achievable redundancy rate; however, he did not offer an explicit construction of such codes. In this paper, we first present a constructive and transparent proof of Khodak's result, showing that for memoryless sources there exists a code with the average redundancy bounded by $D^{-5/3}$, where $D$ is the average delay (e.g., the average length of a dictionary entry). We also describe an algorithm that constructs a VV code with a small redundancy rate for large $D$. Then, we discuss several generalizations. We prove that the worst case redundancy does not exceed $D^{-4/3}$. Furthermore, we provide a similar upper bound for Markov sources (of order 1). Finally, we consider bounds that are valid for almost all memoryless and Markov sources, in the sense that the set of exceptional source parameters has zero measure. In particular, for all memoryless sources outside this exceptional class, we prove there exists a VV code with the average redundancy rate bounded by $D^{-1-m/3+\varepsilon}$ and the worst case redundancy rate bounded by $D^{-1-m/3+\varepsilon}$, where $m$ is the cardinality of the alphabet. We complete our analysis with a lower bound showing that for all VV codes the average and the worst case redundancy rates are at least $D^{-2m-1-\varepsilon}$ for almost all memoryless sources, again in the sense that the set of exceptional source parameters has zero measure. We prove these results using techniques of Diophantine approximations.
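To make the quantities in the abstract concrete, the following minimal sketch computes the average delay $D$ and an average redundancy rate for a toy VV code over a memoryless source. It rests on assumptions not stated in the abstract: the redundancy rate is taken to be the expected excess of the codeword length over $-\log_2 P(d)$, divided by $D$; the dictionary is a complete prefix-free set of source words; and codewords are given Shannon lengths $\lceil -\log_2 P(d) \rceil$. This is not Khodak's construction or the algorithm of the paper, and the function name `average_redundancy_rate` is hypothetical.

```python
import math


def average_redundancy_rate(probs, dictionary):
    """Return (rate, avg_delay, total_prob) for a toy VV code.

    probs      -- dict mapping each source symbol to its probability
                  (memoryless source, an assumption of this sketch)
    dictionary -- list of source words assumed to form a complete
                  prefix-free set (leaves of a parsing tree)

    Codeword lengths are Shannon lengths ceil(-log2 P(d)); this is an
    illustrative choice, not the construction analyzed in the paper.
    """
    total_prob = 0.0
    avg_delay = 0.0   # D: average length of a dictionary entry
    excess = 0.0      # expected value of length(d) + log2 P(d)
    for word in dictionary:
        p = math.prod(probs[s] for s in word)
        length = math.ceil(-math.log2(p))
        total_prob += p
        avg_delay += p * len(word)
        excess += p * (length + math.log2(p))
    return excess / avg_delay, avg_delay, total_prob


# Dyadic source: every dictionary probability is a power of 1/2,
# so the Shannon lengths are exact and the redundancy vanishes.
rate_dyadic, delay_dyadic, mass_dyadic = average_redundancy_rate(
    {"a": 0.5, "b": 0.5}, ["aa", "ab", "b"]
)

# Non-dyadic source: rounding the lengths up leaves positive redundancy.
rate_skewed, delay_skewed, mass_skewed = average_redundancy_rate(
    {"a": 0.75, "b": 0.25}, ["aa", "ab", "b"]
)
```

In the dyadic case the rate is zero, while the skewed source shows the rounding loss that a well-chosen dictionary is meant to drive down as $D$ grows.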