Cumulant Generating Function of Codeword Lengths in Variable-Length Lossy Compression Allowing Positive Excess Distortion Probability

This paper considers the problem of variable-length lossy source coding. The performance criteria are the excess distortion probability and the cumulant generating function of codeword lengths. We derive a non-asymptotic fundamental limit of the cumulant generating function of codeword lengths allowing positive excess distortion probability. It is shown that the achievability and converse bounds are characterized by a Rényi-entropy-based quantity. In the proof of the achievability result, an explicit code construction is provided. Further, we investigate an asymptotic single-letter characterization of the fundamental limit for a stationary memoryless source. A full version of this paper is accessible at: http://arxiv.org/abs/1801.02496


I. INTRODUCTION
The problem of variable-length source coding is one of the fundamental research topics in Shannon theory. For this problem, one of the criteria is the normalized cumulant generating function of codeword lengths. This criterion was first proposed by Campbell [1] as a proxy for the mean codeword length.
Several previous works investigated the fundamental limit of the normalized cumulant generating function of codeword lengths: e.g., [1] and [2] for the problem of variable-length lossless source coding; [9] for the problem of variable-length source coding allowing errors; [3] for the problem of variable-length lossy source coding.
The most relevant study to this paper is the work by Courtade and Verdú [3]. As described above, they considered the problem of variable-length lossy source coding. As a criterion of the distortion measure, they treated the excess distortion probability. They studied codes whose excess distortion probability is zero at a given distortion level D. Using the D-tilted Rényi entropy, the study [3] derived a converse bound for the fundamental limit of the normalized cumulant generating function of codeword lengths. This paper considers the problem of variable-length lossy source coding and treats the same criteria as in [3]. However, the primary differences are that 1) we evaluate codes whose excess distortion probability may be positive, and 2) we derive both achievability and converse bounds by using a novel Rényi-entropy-based quantity. To show the achievability results, we give an explicit code construction instead of using the random coding argument.
Section II formulates the problem setup. Section III describes the related work by Courtade and Verdú [3]. Sections IV and V show the main results in this paper. In Section IV, we first define a Rényi entropy-based quantity. Then, using this quantity, we show non-asymptotic upper and lower bounds of the fundamental limit. Section V investigates an asymptotic single-letter characterization of the fundamental limit for a stationary memoryless source. Proofs of main results are in Section VI. Section VII discusses the obtained results.

II. PROBLEM FORMULATION
Let X be a source alphabet and Y be a reproduction alphabet, where both are finite sets. Let X be a random variable taking a value in X and x be a realization of X. The probability distribution of X is denoted as P X . The pair of an encoder and a decoder (f, g) is defined as follows. An encoder f is defined as f : X → {0, 1} ⋆ , where {0, 1} ⋆ denotes the set of all finite-length binary strings and the empty string λ, i.e., {0, 1} ⋆ = {λ, 0, 1, 00, . . .}. An encoder f is possibly stochastic and produces a non-prefix code. For x ∈ X , the codeword length of f (x) is denoted as ℓ(f (x)). A deterministic decoder g is defined as g : {0, 1} ⋆ → Y. Variable-length lossy source coding without the prefix condition is discussed in, for example, [3] and [8]. Once we prove a result for a non-prefix code, we can easily derive a result for a prefix code; we discuss this in Section VII.
For a code (f, g), we define the excess distortion probability and the normalized cumulant generating function of codeword lengths.
Definition 1: Given D ≥ 0 and a distortion measure d : X × Y → [0, +∞), the excess distortion probability of a code (f, g) is defined as P[d(X, g(f (X))) > D].

Definition 2: Given t > 0, the normalized cumulant generating function of codeword lengths is defined as (1/t) log E[2 tℓ(f (X)) ].

Remark 1: By l'Hôpital's rule, the normalized cumulant generating function of codeword lengths tends to the mean codeword length E[ℓ(f (X))] as t ↓ 0 and to the maximum codeword length as t → ∞. Thus, it contains the mean codeword length and the maximum codeword length as its special cases.

Using these criteria, we define a (D, R, ǫ, t) code.

Definition 3: Given D, R ≥ 0, ǫ ∈ [0, 1), and t > 0, a code (f, g) whose excess distortion probability is at most ǫ and whose normalized cumulant generating function of codeword lengths is at most R is called a (D, R, ǫ, t) code. The fundamental limit that we investigate is R * (D, ǫ, t), the infimum of R such that a (D, R, ǫ, t) code exists.

When we work on the setup of blocklength n, we formulate the problem as follows. Let X n and Y n be the n-th Cartesian products of X and Y, respectively. Let X n be a random variable taking a value in X n and x n be a realization of X n . The probability distribution of X n is denoted as P X n . A distortion measure d n is defined as d n : X n × Y n → [0, +∞). An encoder f n : X n → {0, 1} ⋆ is possibly stochastic and produces a non-prefix code. A decoder g n : {0, 1} ⋆ → Y n is deterministic.
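To make Remark 1 concrete, the following sketch evaluates the normalized cumulant generating function for a toy code. The base-2 exponent and logarithm follow Campbell's convention and are an assumption here, as are the function name and the example distribution:

```python
import math

def normalized_cgf(pmf, lengths, t):
    """Normalized cumulant generating function of codeword lengths:
    (1/t) * log2( E[ 2^(t * length) ] ).  Base 2 is an assumption."""
    return (1.0 / t) * math.log2(
        sum(p * 2.0 ** (t * l) for p, l in zip(pmf, lengths)))

pmf = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]          # codeword lengths of a toy code

mean_len = sum(p * l for p, l in zip(pmf, lengths))   # 1.75 bits
max_len = max(lengths)

small_t = normalized_cgf(pmf, lengths, 1e-6)   # close to the mean length
large_t = normalized_cgf(pmf, lengths, 50.0)   # close to the maximum length
```

For small t the value is close to the mean codeword length of 1.75 bits, while for large t it approaches the maximum length of 3, illustrating the two limiting cases of Remark 1.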
Let R(D) be the rate-distortion function, i.e., R(D) = min I(X; Y ), where the minimization is over all conditional probability distributions P Y |X of Y given X satisfying E[d(X, Y )] ≤ D, and I(X; Y ) denotes the mutual information between random variables X and Y . Assume that the minimum in the rate-distortion function R(D) is achieved by P ⋆ Y |X . Further, let Y ⋆ be a random variable taking a value in Y whose distribution P Y ⋆ is the marginal of P ⋆ Y |X P X . Then, the D-tilted information of x ∈ X is defined as ȷ X (x, D) := − log E[exp(λ ⋆ D − λ ⋆ d(x, Y ⋆ ))], where the expectation is with respect to P Y ⋆ and λ ⋆ := −R ′ (D). Further, [3] defines the D-tilted Rényi entropy H α (X, D) of order α ∈ (0, 1) ∪ (1, ∞). The next theorem characterizes the converse bound on R * (D, 0, t) by the D-tilted Rényi entropy.
Theorem 1 ([3]): For any D ≥ 0 and t > 0, R * (D, 0, t) ≥ H 1/(1+t) (X, D) − log log (1 + min{|X |, |Y|}), where |X | and |Y| denote the cardinalities of X and Y, respectively.

Remark 2: The previous study [3] investigated the case where the excess distortion probability is zero (i.e., ǫ = 0 in (4)), and showed only the converse result. In contrast, our study allows positive excess distortion probability as in (4), and investigates both achievability and converse bounds.
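The rate-distortion function R(D) appearing in these bounds can be computed numerically with the classical Blahut–Arimoto algorithm. The sketch below is illustrative only (it is not machinery from [3] or from this paper), and the function name and the fixed slope parameter are assumptions:

```python
import math

def blahut_arimoto(p_x, dist, slope, iters=300):
    """Blahut-Arimoto iteration for the rate-distortion function.
    slope < 0 is a fixed slope parameter; returns one (D, R) point
    on the rate-distortion curve, with R in bits."""
    nx, ny = len(p_x), len(dist[0])
    q = [1.0 / ny] * ny                      # output distribution
    for _ in range(iters):
        # conditional P(y|x) proportional to q(y) * exp(slope * d(x, y))
        w = [[q[y] * math.exp(slope * dist[x][y]) for y in range(ny)]
             for x in range(nx)]
        cond = [[w[x][y] / sum(w[x]) for y in range(ny)] for x in range(nx)]
        q = [sum(p_x[x] * cond[x][y] for x in range(nx)) for y in range(ny)]
    D = sum(p_x[x] * cond[x][y] * dist[x][y]
            for x in range(nx) for y in range(ny))
    R = sum(p_x[x] * cond[x][y] * math.log2(cond[x][y] / q[y])
            for x in range(nx) for y in range(ny) if cond[x][y] > 0)
    return D, R

# Equiprobable binary source, Hamming distortion: R(D) = 1 - h(D).
D, R = blahut_arimoto([0.5, 0.5], [[0, 1], [1, 0]], slope=-2.0)
h = lambda u: -u * math.log2(u) - (1 - u) * math.log2(1 - u)
```

For the equiprobable binary source with Hamming distortion, the returned point satisfies R = 1 − h(D), matching the known closed form.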

A. Preliminary: Rényi Entropy-Based Quantity
For α ∈ (0, 1) ∪ (1, ∞), the Rényi entropy of order α is defined as [12] H α (X) = 1/(1 − α) log Σ x∈X P X (x) α . One of the useful properties of the Rényi entropy is Schur concavity; this property is used in the proof of the achievability result in our main theorem. To state the definition of a Schur concave function, we first review the notion of majorization.
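As a quick illustration of these notions (the function names and toy distributions below are hypothetical), the Rényi entropy and a majorization test can be sketched as:

```python
import math

def renyi_entropy(pmf, alpha):
    """Rényi entropy of order alpha (alpha != 1), in bits."""
    return math.log2(sum(p ** alpha for p in pmf if p > 0)) / (1 - alpha)

def majorizes(p, q):
    """True if pmf p majorizes pmf q: every partial sum of the
    sorted-descending entries of p dominates that of q."""
    ps, qs = sorted(p, reverse=True), sorted(q, reverse=True)
    run_p = run_q = 0.0
    for a, b in zip(ps, qs):
        run_p += a
        run_q += b
        if run_p < run_q - 1e-12:
            return False
    return True

p = [0.7, 0.2, 0.1]    # more concentrated distribution
q = [0.4, 0.35, 0.25]  # more spread-out distribution
```

Since p majorizes q, Schur concavity guarantees H_α(p) ≤ H_α(q) for every order α.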
Next, we introduce a new quantity based on the Rényi entropy. This quantity plays an important role in producing our main results.
Remark 3: For given D ≥ 0 and ǫ ∈ [0, 1), suppose that (16) holds. Then, there is no code whose excess distortion probability is less than or equal to ǫ. Conversely, if no such code exists for the given D and ǫ, then (16) holds. In this case, we define the quantity to be +∞.

B. Non-Asymptotic Coding Theorem
The next lemma shows the achievability result on R of a (D, R, ǫ, t) code.
Lemma 1: For any D ≥ 0, ǫ ∈ [0, 1), and t > 0, there exists a (D, R, ǫ, t) code such that

Proof: See Section VI-A.

Remark 4: The random coding argument is not used to prove the achievability result. Instead, an explicit code construction is given. This is similar to Feinstein's cookie-cutting argument [4].
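The flavor of such an explicit, non-random construction can be sketched as follows. This is an illustrative greedy covering in the spirit of cookie-cutting, not the paper's exact construction; the function names, the toy source, and the tie-breaking are all assumptions:

```python
from itertools import count, product

def binary_strings():
    """Yield '', '0', '1', '00', ... (shortest strings first)."""
    yield ""
    for n in count(1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def greedy_code(pmf, X, Y, d, D, eps):
    """Illustrative greedy construction: cover the most probable part of
    the source with distortion-D balls, giving shorter codewords to
    reproduction symbols that cover more probability.  At most eps of
    the source probability is left uncovered."""
    remaining = dict(zip(X, pmf))
    strings = binary_strings()
    enc, dec = {}, {}
    while sum(remaining.values()) > eps + 1e-12:
        # reproduction symbol covering the largest remaining probability
        y = max(Y, key=lambda y: sum(p for x, p in remaining.items()
                                     if d(x, y) <= D))
        w = next(strings)
        dec[w] = y
        for x in [x for x in remaining if d(x, y) <= D]:
            enc[x] = w
            del remaining[x]
    return enc, dec

# Toy example: integer alphabet with absolute-error distortion
X = Y = list(range(6))
pmf = [0.3, 0.25, 0.2, 0.15, 0.07, 0.03]
enc, dec = greedy_code(pmf, X, Y, lambda x, y: abs(x - y), D=1, eps=0.05)
```

Reproduction symbols that cover more source probability receive shorter (possibly empty) codewords, and at most ǫ of the probability is left uncovered, mirroring the role of positive excess distortion probability.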
The next lemma shows the converse bound on R of a (D, R, ǫ, t) code.

V. ASYMPTOTIC ANALYSIS FOR A STATIONARY MEMORYLESS SOURCE
This section investigates the general formula (20) when a stationary memoryless source is assumed. In particular, we consider the special case t ↓ 0 and derive a single-letter characterization of the fundamental limit R * (n, D, ǫ, 0) := lim t↓0 R * (n, D, ǫ, t).

Theorem 4 ( [8]):
We impose the following assumptions 1)–4), where the expectation is with respect to P X × P Y ⋆ . Under a stationary memoryless source and assumptions 1)–4), we have, for any ǫ ∈ [0, 1), the characterization (25), where V (D) is the rate-dispersion function [7], defined as the variance of the D-tilted information.

Proof: See Section VI-C.

Remark 5: In view of Remark 1, we observe that R * (n, D, ǫ, 0) represents the fundamental limit of the mean codeword length. This quantity was investigated by [8], and our result (25) coincides with the result in [8].

A. Proof of Lemma 1
First, some notations are defined before showing the construction of the encoder and the decoder.
Proof: First, we show the left inequality of (41). The construction of the code gives, for any i ∈ {1, 2, . . . , k * }, the stated bound. This inequality yields the left inequality of (41). Next, we show the right inequality of (41). The code construction gives the next inequality on the distribution of Ŷ . Thus, for any i ∈ {1, 2, . . . , k * }, it follows that

(Footnotes: 3. Note that we have P[X ∈ A D (y k * )] ≥ β from (35). 4. From the construction of ĝ, we can define its inverse function.)
Hence, for any i ∈ {1, 2, . . . , k * }, we have the bound in which (a) follows from (45) and (b) is due to the preceding inequality. The inequality (48) yields the right inequality of (41).

Using Lemma 3, we have (52). Thus, taking the logarithm of both sides of (52) and dividing by t > 0, we obtain (54). Finally, we evaluate the left- and right-hand sides of (54). The left-hand side of (54) is evaluated directly, while the bound on the right-hand side follows by combining the fact that the Rényi entropy is a Schur concave function with the next lemma shown in [13].

Lemma 4 ([13]): The distribution P Ŷ majorizes any P Ỹ induced by P Ỹ |X satisfying P[d(X, Ỹ ) > D] ≤ ǫ.

B. Proof of Lemma 2
Fix a (D, R, ǫ, t) code (f, g) arbitrarily and denote Y := g(f (X)). Further, without loss of generality, we assume that the decoder g is an injective mapping (footnote 5). Then, the definition of a (D, R, ǫ, t) code gives the corresponding constraints, and the assumption that g is an injective mapping yields the next inequality [2]. The key lemma in the proof of the converse result is as follows.
Lemma 5: For any t > 0, we have

Then, Hölder's inequality gives (69). Taking the logarithm of both sides of (69) and substituting (67) and (68) into (69), we obtain the claim.

(Footnote 5: Note that it is sufficient to consider the case where the decoder g is an injective mapping in the proof of the converse part; see, e.g., [3].)
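The Hölder step invoked above is the standard inequality E[UV] ≤ (E[U^p])^{1/p} (E[V^q])^{1/q} for conjugate exponents 1/p + 1/q = 1. A minimal numerical sanity check (toy values, all names hypothetical):

```python
def holder_check(pmf, u, v, p):
    """Return (lhs, rhs) of Hölder's inequality
    E[UV] <= E[U^p]^(1/p) * E[V^q]^(1/q) for nonnegative U, V,
    with q the conjugate exponent of p."""
    q = p / (p - 1)
    lhs = sum(w * a * b for w, a, b in zip(pmf, u, v))
    rhs = (sum(w * a ** p for w, a in zip(pmf, u)) ** (1 / p)
           * sum(w * b ** q for w, b in zip(pmf, v)) ** (1 / q))
    return lhs, rhs

pmf = [0.2, 0.5, 0.3]
u, v = [1.0, 3.0, 2.0], [4.0, 1.0, 2.0]
lhs, rhs = holder_check(pmf, u, v, p=1.5)
```

With p = q = 2 this reduces to the Cauchy–Schwarz inequality, and equality holds when U and V are proportional.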

C. Proof of Theorem 5
We denote G D,ǫ 1 (X n ) := lim α↑1 G D,ǫ α (X n ) and Ŷ n := ĝ n (f̂ n (X n )), where (f̂ n , ĝ n ) is the code constructed in the proof of Lemma 1. Then, we have the chain of inequalities in (74), where (a) follows from (61), (b) is due to the fact that the Rényi entropy approaches the Shannon entropy as α tends to 1, and (c) follows from Lemma 4 and the fact that the Shannon entropy is a Schur concave function (e.g., [10]). Further, the definition of H D,ǫ (X n ) gives (75). Combination of (74) and (75) yields (76).

On the other hand, we have (78), where (a) is due to the non-negativity of the conditional Shannon entropy and (b) follows from (74). Thus, combination of (76) and (78) and application of Theorem 4 establish (79).

Finally, letting t ↓ 0 in Theorem 3, using (79), and noticing that (1/n) log log (1 + min{|X n |, |Y n |}) = O((log n)/n), we obtain the desired result (25).

A. Theorem for a Deterministic Code
So far, we have treated a stochastic code. If we deal with only a deterministic code, we have the next lemma instead of Lemma 1.
Proof: See Appendix A.

Comparing Lemmas 1 and 6, we observe that the result for the deterministic code is weaker than that for the stochastic code. In the asymptotic regime, however, the restriction to deterministic codes is negligible, since the gap between the two bounds vanishes as n → ∞.

B. Theorem for a Prefix Code
So far, we have discussed codes without the prefix constraint. In this section, we discuss a result for an encoder f p : X → {0, 1} ⋆ and a decoder g p : {0, 1} ⋆ → Y when we assume that f p produces a prefix code.
As shown in (6), we have defined R * (D, ǫ, t) for a non-prefix code. Similarly, we define R * p (D, ǫ, t) as the fundamental limit on the normalized cumulant generating function of codeword lengths for a prefix code (f p , g p ). Then, a modification of the proofs of Lemmas 1 and 2 yields the next result.
Proof: See Appendix B.
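The gap between prefix and non-prefix codes can already be seen at the level of mean codeword length, the t ↓ 0 case of our criterion. The sketch below compares an optimal one-to-one (non-prefix) code with a Huffman (prefix) code on a toy distribution; the function names and example pmf are assumptions:

```python
import heapq
from itertools import count

def one_to_one_lengths(pmf):
    """Optimal non-prefix (one-to-one) code: sort symbols by probability
    and assign the strings '', '0', '1', '00', ..., so the i-th most
    probable symbol gets length floor(log2(i + 1))."""
    n = len(pmf)
    lens = [(i + 1).bit_length() - 1 for i in range(n)]
    order = sorted(range(n), key=lambda i: -pmf[i])
    out = [0] * n
    for rank, i in enumerate(order):
        out[i] = lens[rank]
    return out

def huffman_lengths(pmf):
    """Codeword lengths of an optimal binary prefix (Huffman) code."""
    tie = count()  # unique tiebreaker so tuples never compare lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(pmf)]
    heapq.heapify(heap)
    lens = [0] * len(pmf)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        for i in a + b:
            lens[i] += 1          # merged symbols gain one bit of depth
        heapq.heappush(heap, (p1 + p2, next(tie), a + b))
    return lens

pmf = [0.4, 0.3, 0.2, 0.1]
mean = lambda ls: sum(p * l for p, l in zip(pmf, ls))
```

The optimal one-to-one code is never longer on average than the optimal prefix code, which is one way to see that R * (D, ǫ, t) ≤ R * p (D, ǫ, t).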

C. Non-Asymptotics and Distortion Balls
In our non-asymptotic analysis, the distortion D-ball around y (i.e., (26)) plays a crucial role. On the other hand, in previous studies of non-asymptotics for lossy compression [3], [5], [6], [7], [8], the distortion D-ball around x, i.e., B D (x) := {y ∈ Y : d(x, y) ≤ D}, plays an important role. Investigating the relationship between these two approaches is left for future work.
First, we evaluate the excess distortion probability. From the definition of the encoder and the decoder, we obtain the stated bound on the excess distortion probability.

Next, we evaluate the normalized cumulant generating function of codeword lengths for the code (f det , ĝ det ). To this end, we denote Ŷ det := ĝ det (f det (X)). For any t > 0, we have the chain of inequalities in which (a) follows from the same discussion as in (53), (b) is due to the constructions of (f̂, ĝ) and (f det , ĝ det ), (c) and (d) follow from Taylor expansions, and (e) is due to (61). Thus, we complete the proof of Lemma 6.
To show Theorem 6, we prove the next two lemmas; once they are proved, Theorem 6 follows immediately.
[Decoder] Set ĝ p (w i ) = y i for i = 1, . . . , k * . Now, we evaluate the excess distortion probability of the code (f̂ p , ĝ p ). The same discussion as in the proof of Lemma 1 yields P[d(X, ĝ p (f̂ p (X))) > D] = ǫ. (104)

Next, we evaluate the normalized cumulant generating function of codeword lengths for the code (f̂ p , ĝ p ), where (a) follows from the construction of (f̂, ĝ) in Section VI-A and that of (f̂ p , ĝ p ), and (b) is due to (62). This completes the proof of Lemma 7.

Proof of Lemma 8:
By replacing (65) with Kraft's inequality and following the same route as in the proof of Lemma 2, we obtain Lemma 8.
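Kraft's inequality, which replaces (65) here, states that the lengths of any binary prefix code satisfy Σ 2^{−ℓ} ≤ 1; conversely, any lengths satisfying it are realizable by a prefix code. A small sketch of both directions (canonical construction; the names are assumptions):

```python
def kraft_sum(lengths):
    """Left-hand side of Kraft's inequality for binary codeword lengths."""
    return sum(2.0 ** (-l) for l in lengths)

def canonical_prefix_code(lengths):
    """Build prefix-free codewords from lengths with kraft_sum <= 1 by
    handing out binary intervals in order of increasing length."""
    code, next_val, prev_len = [], 0, 0
    for l in sorted(lengths):
        next_val <<= (l - prev_len)            # descend to the new depth
        code.append(format(next_val, "0{}b".format(l)))
        next_val += 1
        prev_len = l
    return code

lengths = [1, 2, 3, 3]
assert kraft_sum(lengths) <= 1.0
words = canonical_prefix_code(lengths)   # ['0', '10', '110', '111']
```

The generated words are mutually prefix-free, so the lengths are indeed achievable by a prefix code.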