Overflow Probability of Variable-length Codes with Codeword Cost

Lossless variable-length source coding with codeword cost is considered for general sources. The problem setting, in which unequal costs are imposed on code symbols, is called variable-length coding with codeword cost. In this problem, the infimum of the average codeword cost has been determined for general sources. On the other hand, the overflow probability, which is defined as the probability of the codeword cost exceeding a threshold, has not been considered yet. In this paper, we determine the infimum of achievable thresholds in the first-order and the second-order sense for general sources, and compute it for some special sources such as i.i.d. sources and mixed sources. A relationship between the overflow probability of variable-length coding and the error probability of fixed-length coding is also revealed. Our analysis is based on the information-spectrum methods.


I. Introduction
The lossless variable-length coding problem is quite important not only from the theoretical viewpoint but also from the viewpoint of its practical applications. To evaluate the performance of variable-length codes, several criteria have been proposed. The most fundamental criterion is the average codeword length, which was proposed by Shannon [2], and many variable-length coding problems have been studied since then.

R. Nomura is with the School of Network and Information, Senshu University, Kanagawa, Japan, e-mail: nomu@isc.senshu-u.ac.jp. The material in this paper was presented in part [1] at the 2012 IEEE International Symposium on Information Theory, Boston, USA, July 2012. This work was supported in part by JSPS Grant-in-Aid for Young Scientists (B) No. 23760346.

Kontoyiannis [15] has established the second-order source coding theorem on the codeword length for i.i.d. sources and Markov sources. In the channel coding problem, Strassen [16] (see also Csiszár and Körner [8]), Hayashi [17], and Polyanskiy, Poor and Verdú [18] have determined the second-order capacity rate. Hayashi [19] has also shown second-order achievability theorems for the fixed-length source coding problem for general sources and computed the optimal second-order achievable rates for i.i.d.
sources by using the asymptotic normality. Nomura and Han [20] have also computed the optimal second-order achievable rates in fixed-length source coding for mixed sources by using the two-peak asymptotic normality.
Analogously to these settings, we define the second-order achievable threshold on the overflow probability and derive the infimum of the second-order achievable threshold. Notice here that Nomura and Matsushima [6] have already considered the first-order and the second-order achievability with respect to the overflow probability of codeword length. One of the contributions of this paper is a generalization of their results to the case of codeword cost. Our analysis is based on the information-spectrum methods, and hence our results are valid for general sources. Furthermore, we apply our results to i.i.d. sources and mixed sources as special cases and compute the infimum of the second-order achievable threshold for these special but important sources.
Related works include those by Kontoyiannis and Verdú [21], and Kosut and Sankar [22], who have also considered quantities similar to the overflow probability. Kontoyiannis and Verdú [21] have derived the fundamental limit of this quantity without the prefix condition. Kosut and Sankar [22] have also derived an upper bound on the overflow probability in the universal setting. It should be emphasized that they have considered the overflow probability of codeword length for some special sources and derived bounds up to the third order. In this paper, on the other hand, we consider the overflow probability of codeword cost for general sources and address the fundamental limit of the achievable threshold up to the second order.

This paper is organized as follows. In Section II, we state the problem setting and define the achievability treated in this paper. In Section III, we reveal the relationship between the overflow probability of variable-length coding and the error probability of fixed-length coding. In Section IV, we prove two lemmas which play a key role in the subsequent analysis. In Section V, we determine the infimum of the first-order achievable threshold. In Section VI, we derive the infimum of the second-order achievable threshold and compute it for some special sources. In Section VII, we conclude our results.

II. Problem Formulation

A. Variable-length codes with codeword cost for general sources
The general source is defined as an infinite sequence X = {X^n = (X_1^(n), X_2^(n), ..., X_n^(n))}_{n=1}^∞ of n-dimensional random variables X^n, where each component random variable X_i^(n) takes values in a countable set X. It should be noted that each component of X^n may change depending on the block length n. This implies that even the consistency condition, which means that X_i^(n) = X_i^(m) holds for any integers n, m and i ≤ min{n, m}, may not hold.
Variable-length codes are characterized as follows. Let ϕ_n : X^n → U* and ψ_n : U* → X^n be a variable-length encoder and a decoder, respectively, where U = {1, 2, ..., K} is called the code alphabet and U* is the set of all finite-length strings over U excluding the null string.
We consider the situation in which there are unequal costs on code symbols. Let us define the cost function over U considered in this paper. Each code symbol u ∈ U is assigned the corresponding cost c(u) such that 0 < c(u) < ∞, and the additive cost c(u) of u = u_1 u_2 ··· u_k ∈ U^k is defined by c(u) = Σ_{i=1}^k c(u_i). In particular, we denote c_max = max_{u∈U} c(u) for short. This cost function is called a memoryless cost function. A generalization of this cost function is discussed in Section VII.
We only consider variable-length codes satisfying the prefix condition. It should be noted that every variable-length code with the prefix condition over unequal costs satisfies the generalized Kraft inequality

Σ_{x∈X^n} K^{-α_c c(ϕ_n(x))} ≤ 1, (1)

where α_c is called the cost capacity and is defined as the unique positive root α of the equation [8]:

Σ_{u∈U} K^{-α c(u)} = 1. (2)

Throughout this paper, the logarithm is taken to the base K.
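As a concrete illustration (ours, not from the paper), the cost capacity α_c can be computed numerically: the left-hand side of the defining equation Σ_{u∈U} K^{-α c(u)} = 1 is strictly decreasing in α, so simple bisection finds the unique positive root. The function name and example costs below are our own.

```python
def cost_capacity(costs, K=2, tol=1e-12):
    """Unique positive root alpha of sum_u K^(-alpha * c(u)) = 1.

    The left-hand side is strictly decreasing in alpha, so bisection
    applies.  `costs` lists c(u) for each code symbol u; all positive.
    """
    f = lambda a: sum(K ** (-a * c) for c in costs) - 1.0
    lo, hi = 1e-12, 1.0
    while f(hi) > 0:        # widen the bracket until the root is enclosed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

# Equal unit costs over a binary code alphabet recover alpha_c = 1,
# i.e., the ordinary Kraft inequality.
print(cost_capacity([1.0, 1.0], K=2))
# Morse-like costs c(0) = 1, c(1) = 2 give alpha_c ~ 0.694.
print(cost_capacity([1.0, 2.0], K=2))
```

For c(0) = 1, c(1) = 2 over K = 2, the root solves x + x² = 1 with x = 2^{-α}, i.e., x = (√5 − 1)/2, so α_c = −log₂ x ≈ 0.694.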

B. Overflow Probability of Codeword Cost
The overflow probability of codeword length is defined as follows:

Definition 2.1: [4] Given a threshold R, the overflow probability of a variable-length encoder ϕ_n is defined by

ε_n(ϕ_n, R) = Pr {l(ϕ_n(X^n)) > nR} , (3)

where l(·) denotes the length function.
In this paper, we generalize the above overflow probability not only to the case for unequal costs on code symbols but also for finer evaluation of the overflow probability. To this end, we consider the overflow probability of codeword cost as follows:

Definition 2.2 (Overflow Probability of Codeword Cost):
Given some sequence {η_n}_{n=1}^∞, where 0 < η_n < ∞ for each n = 1, 2, ···, the overflow probability of a variable-length encoder ϕ_n is defined by

ε_n(ϕ_n, η_n) = Pr {c(ϕ_n(X^n)) > η_n} . (4)

Remark 2.1: Nomura and Matsushima [6] have considered the overflow probability with respect to the codeword length, that is, Pr {l(ϕ_n(X^n)) > η_n}, and derived the first-order and second-order achievability. Kosut and Sankar [22] have also defined a similar probability in the case of codeword length and derived an upper bound in the universal setting.
Since {η_n}_{n=1}^∞ is an arbitrary sequence, the above definition is general. In particular, we shall consider the following two types of overflow probability in this paper: the case η_n = nR for a constant R > 0, and the case η_n = na + L√n for constants a > 0 and L.
Thus, in the case that η n = nR, the overflow probability defined by (4) means the probability that the codeword cost per symbol exceeds some constant R. This is a natural extension of the overflow probability of codeword length to the overflow probability of codeword cost defined by (3).
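To make Definition 2.2 concrete, the following is a small Monte Carlo sketch of our own (the source, code, and cost values are toy assumptions, not the paper's): a ternary i.i.d. source is encoded symbol by symbol with a fixed prefix code over U = {0, 1}, the code symbols carry unequal costs c(0) = 1, c(1) = 2, and Pr{c(ϕ_n(X^n)) > nR} is estimated empirically.

```python
import random
random.seed(0)

# Hypothetical toy setup: ternary i.i.d. source, binary code alphabet
# with unequal symbol costs c(0) = 1, c(1) = 2, and a fixed prefix code.
probs = {'a': 0.7, 'b': 0.2, 'c': 0.1}
code  = {'a': '0', 'b': '10', 'c': '11'}    # prefix-free
cost  = {'0': 1.0, '1': 2.0}

def codeword_cost(x):
    """Additive cost: sum of per-symbol costs over the codeword of block x."""
    return sum(cost[u] for s in x for u in code[s])

def overflow_probability(n, R, trials=20000):
    """Empirical Pr{ c(phi_n(X^n)) > n R } for the symbol-by-symbol code above."""
    syms, weights = zip(*probs.items())
    hits = 0
    for _ in range(trials):
        x = random.choices(syms, weights, k=n)
        if codeword_cost(x) > n * R:
            hits += 1
    return hits / trials

# Expected cost per source symbol is 0.7*1 + 0.2*3 + 0.1*4 = 1.7, so a
# threshold well above 1.7 should rarely overflow, and one below the
# minimum per-symbol cost should almost always overflow.
print(overflow_probability(50, 2.5))
print(overflow_probability(50, 1.0))
```

The two printed estimates illustrate the threshold effect around the expected cost per symbol that the first-order analysis below formalizes.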
On the other hand, in the analysis of the fixed-length coding problem, Hayashi [19] has shown the second-order asymptotics, which enables a finer evaluation of achievable rates. A coding theorem from the viewpoint of the second-order asymptotics has also been analyzed by Kontoyiannis [15].
Analogously to their results, we evaluate the overflow probability in the second-order sense. To do so, we consider the second case: η_n = na + L√n for all n = 1, 2, ···. Hereafter, if we consider the overflow probability in the case η_n = na + L√n, we call it the second-order overflow probability given a, while in the first case it is called the first-order overflow probability. The second-order overflow probability given a of a variable-length encoder ϕ_n with threshold L is written as ε_n(ϕ_n, na + L√n). It should be noted that since we assume that η_n satisfies 0 < η_n < ∞, 0 < R < ∞ must hold, while L can be a negative number.
Throughout this paper, we are interested in the following achievability:

Definition 2.3: A sequence {η_n}_{n=1}^∞ is called a sequence of ε-achievable overflow thresholds for the source if there exists a variable-length encoder ϕ_n such that lim sup_{n→∞} ε_n(ϕ_n, η_n) ≤ ε.
III. Relationship between the overflow probability of variable-length coding and the error probability of fixed-length coding

Uchida and Han [5], and Nomura, Matsushima and Hirasawa [23] have derived the relationship between the overflow probability of variable-length coding and the error probability of fixed-length coding in terms of the codeword length. Analogously, we first reveal a deep relationship between variable-length coding with codeword cost and fixed-length coding.
Let ϕ_n^f : X^n → U_{M_n}, ψ_n^f : U_{M_n} → X^n be a fixed-length encoder and a decoder, respectively, for the source X = {X^n}_{n=1}^∞, where U_{M_n} def= {1, 2, ···, M_n} denotes a code set of size M_n. The decoding error probability is given by ε_n^f def= Pr {X^n ≠ ψ_n^f(ϕ_n^f(X^n))}. Such a code is denoted by (n, M_n, ε_n^f). We then define the ε-achievability for fixed-length codes analogously to Definition 2.3. The following theorem reveals the relationship between the overflow probability of variable-length coding and the error probability of fixed-length coding.

Theorem 3.1 (Equivalence Theorem):
1) Assuming that {η_n}_{n=1}^∞ is a sequence of ε-achievable fixed-length coding rates, then {(1/α_c)η_n + c}_{n=1}^∞ is a sequence of ε-achievable overflow thresholds, where c denotes a constant term which depends on the cost function and the source.
2) Assuming that {η_n}_{n=1}^∞ is a sequence of ε-achievable overflow thresholds, then {α_c η_n}_{n=1}^∞ is a sequence of ε-achievable fixed-length coding rates.

Proof: The proof consists of two parts.

1) We first prove the first statement. Suppose that {η_n}_{n=1}^∞ is a sequence of ε-achievable fixed-length coding rates; then there exists an (n, M_n, ε_n^f) code satisfying (5) and (6). By using this (n, M_n, ε_n^f) code, we define the set T_n of correctly decodable sequences as T_n = {x ∈ X^n : ψ_n^{f*}(ϕ_n^{f*}(x)) = x}, where (ϕ_n^{f*}, ψ_n^{f*}) denotes the pair of encoding and decoding functions satisfying (5) and (6). By using this set T_n, we construct the variable-length encoder ϕ_n as follows: ϕ_n^{(1)} : T_n → U* denotes the encoding function proposed by Han and Uchida [14] for the random variable Z_n distributed uniformly on T_n, and ϕ_n^{(2)} : X^n \ T_n → U* is an arbitrary variable-length encoder. Notice here that from the property of the code, (11) holds for all n = 1, 2, ···. Now, we evaluate the overflow probability of this code. On the other hand, from (6) we have, for ∀δ > 0, log M_n < η_n + δ for sufficiently large n. Thus, from (11) and the construction of this variable-length code, for ∀x ∈ T_n the codeword cost is upper bounded for sufficiently large n.
Setting c = (log 2)/α_c + 2c_max, this means that c(ϕ_n(x)) ≤ (1/α_c)η_n + c holds for ∀x ∈ T_n and sufficiently large n.
Here, from (5) and the definition of T_n, the overflow probability is bounded by the decoding error probability. Noting that δ is arbitrarily small, this means that the first statement holds.
2) We next prove the second statement. By using the given variable-length code we define a set T_n as T_n = {x ∈ X^n : c(ϕ_n^*(x)) ≤ η_n}, where ϕ_n^* denotes the variable-length encoder satisfying (8).
Here, from (1) it holds that |T_n| ≤ K^{α_c η_n}. Now, we define the fixed-length encoder with M_n = |T_n| as the mapping that assigns a distinct index in U_{M_n} to each x ∈ T_n, and the decoder ψ_n^f : U_{M_n} → T_n as the mapping such that ψ_n^f(ϕ_n^f(x)) = x for ∀x ∈ T_n. Then, from (9) and the fact that M_n = |T_n| ≤ K^{α_c η_n}, the coding rate of this code is at most α_c η_n. On the other hand, since {η_n}_{n=1}^∞ is a sequence of ε-achievable overflow thresholds, we have lim sup_{n→∞} Pr {X^n ∈ T_n^c} ≤ ε. Here, noting that the error probability of this fixed-length code is given by ε_n^f = Pr {X^n ∈ T_n^c}, the second statement has been proved.
The definition of the ε-achievability for fixed-length codes (Definition 3.1) is very general and it includes the ordinary first-order and the second-order achievability. Hence from Theorem 3.1 and the previous results for fixed-length codes such as [19] and [24], we can obtain analogous results for the overflow probability of variable-length codes (see, Remark 5.1 in Section V and Theorem 6.3 in Section VI for example). In the following section, however, we derive several theorems from another information-spectrum approach in order to see the logic underlying the whole process of variable-length codes with codeword cost.

IV. First-order and Second-order achievability
Hereafter, we consider the first-order and the second-order achievable threshold as described in Remark 2.2. In the first-order case, we are interested in the infimum of threshold R that we can achieve. This is formalized as follows.
Also, in the analysis of the second-order overflow probability, we define the corresponding achievability. As described in the previous section, we demonstrate the infimum of ε-achievable overflow thresholds and (ε, a)-achievable overflow thresholds not via Theorem 3.1 but via another information-spectrum approach. To do so, we show two lemmas that play important roles in deriving the theorems.
Proof: We use the code proposed by Han and Uchida [14]. Then, from the property of the code, (11) holds for all n = 1, 2, ···, where ϕ_n^* denotes the encoder of the code. Furthermore, we set the decoder as the inverse mapping of ϕ_n^*, that is, ψ_n = ϕ_n^{*−1}. Please note that the code is a uniquely decodable variable-length code for general sources with a countably infinite source alphabet.
Next, we shall evaluate the overflow probability of this code. Defining the set A_n, the overflow probability is given by (12), where A^c denotes the complement of a set A.
Since (11) holds, for ∀x ∈ S_n the codeword cost is bounded accordingly. Substituting this bound into (12), we have (13). Here, from the definition of A_n, for ∀x ∈ A_n^c the corresponding bound on the self-information holds, which yields (14). Substituting (14) into (13), we obtain the claim of the lemma, since log 2 ≤ 1. Therefore, we have proved the lemma.
Proof: Let (ϕ_n, ψ_n) be the encoder and decoder of a variable-length code. Defining the set B_n and using S_n defined by (15), we have (17). On the other hand, for ∀x ∈ B_n the corresponding bound holds. Here, from (1), we have |B_n ∩ S_n^c| ≤ K^{α_c η_n}. Hence, substituting (18) into (17), we have

Pr {P_{X^n}(X^n) ≤ z_n K^{-α_c η_n}} ≤ ε_n(ϕ_n, η_n) + |B_n ∩ S_n^c| z_n K^{-α_c η_n}
≤ ε_n(ϕ_n, η_n) + K^{α_c η_n} z_n K^{-α_c η_n}
= ε_n(ϕ_n, η_n) + z_n.
Therefore, we have proved the lemma.

A. General Formula for the Infimum of ε-achievable Overflow Threshold
In this subsection, we determine R(ε|X) for general sources. Before showing the theorem, we define the function F(R) as follows:

F(R) = lim sup_{n→∞} Pr { (1/(nα_c)) log (1/P_{X^n}(X^n)) > R } .

The following theorem is one of our main results.

Theorem 5.1: For 0 ≤ ∀ε < 1, it holds that R(ε|X) = inf {R : F(R) ≤ ε}.

Proof: The proof consists of two parts.
(Direct Part)
Noting that γ > 0 is arbitrarily small, the direct part has been proved.

(Converse Part)
Assuming that R_1 satisfying R_1 < inf {R : F(R) ≤ ε} (20) is an ε-achievable overflow threshold, we shall show a contradiction.
Let η_n = nR_1. Then, from Lemma 4.2, for any sequence {z_n}_{n=1}^∞ (z_i > 0, i = 1, 2, ···) and any variable-length code, we have

ε_n(ϕ_n, nR_1) > Pr {P_{X^n}(X^n) ≤ z_n K^{-nα_c R_1}} − z_n, for n = 1, 2, ···.

Set z_n = K^{-nγ}, where γ > 0 is a small constant that satisfies (21). Since we assume that (20) holds, it is obvious that there exists γ > 0 that satisfies this inequality. Then, we have

ε_n(ϕ_n, nR_1) > Pr { (1/(nα_c)) log (1/P_{X^n}(X^n)) ≥ R_1 + γ/α_c } − K^{-nγ}.

Hence, we have

lim sup_{n→∞} ε_n(ϕ_n, nR_1) ≥ lim sup_{n→∞} Pr { (1/(nα_c)) log (1/P_{X^n}(X^n)) ≥ R_1 + γ/α_c } > ε,

where the last inequality is derived from (21) and the definition of F(R).
On the other hand, since we assume that R 1 is an ε-achievable overflow threshold, it holds that lim sup n→∞ ε n (ϕ n , nR 1 ) ≤ ε.
This is a contradiction. Therefore the proof of converse part has been completed.
Then, the following corollary holds.

Corollary 5.1: For ε = 0, it holds that R(0|X) = (1/α_c) H̄(X), where H̄(X) denotes the spectral sup-entropy rate [24].

B. Strong Converse Property
The strong converse property is one of the important properties of sources in the fixed-length source coding problem [24]. When we consider the second-order achievability, we fix an appropriate first-order term. In many cases the first-order term is determined by considering the strong converse property, and hence the strong converse property has an important meaning in the analysis of second-order achievability.
Analogously to the fixed-length coding problem, we can consider the strong converse property in the meaning of the overflow probability in variable-length codes. In this subsection, we establish the strong converse theorem on the overflow probability of variable-length coding with codeword cost. Let us begin with the definition of strong converse property treated in this paper.
Definition 5.1: Source X is said to satisfy the strong converse property, if any variable-length code (ϕ n , ψ n ) with the overflow probability ε n (ϕ n , nR), where R is an arbitrary rate satisfying R < R(0|X), necessarily yields lim n→∞ ε n (ϕ n , nR) = 1.
In order to state the strong converse theorem, we define the dual quantity of H̄(X) as

H̲(X) = p-liminf_{n→∞} (1/n) log (1/P_{X^n}(X^n)),

which is called the spectral inf-entropy rate [24]. Then, we have the following theorem on the strong converse property. The theorem reveals that the strong converse property depends only on the source X and is independent of the cost function.

Remark 5.2:
For an i.i.d. source, the following relationship holds [24]: H̄(X) = H̲(X) = H(X), where H(X) denotes the entropy of the source. Thus, any i.i.d. source satisfies the strong converse property. This means that the infimum of ε-achievable overflow thresholds R(ε|X) is constant and is independent of ε.
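As a numerical sketch of this remark (our own example), for an i.i.d. source the infimum threshold reduces to H(X)/α_c: the entropy in base-K logarithms divided by the cost capacity, independently of ε. The uniform 4-ary source and Morse-like binary costs below are our assumptions.

```python
from math import log

def entropy(probs, K=2):
    """H(X) with logarithms taken to the base K, as in the paper."""
    return -sum(p * log(p, K) for p in probs if p > 0)

def cost_capacity(costs, K=2, tol=1e-12):
    """Unique positive root of sum_u K^(-alpha * c(u)) = 1, by bisection."""
    f = lambda a: sum(K ** (-a * c) for c in costs) - 1.0
    lo, hi = 1e-12, 1.0
    while f(hi) > 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

# Uniform 4-ary source: H(X) = 2 bits; binary code symbols with costs 1 and 2.
H = entropy([0.25] * 4, K=2)
alpha_c = cost_capacity([1.0, 2.0], K=2)
print(H / alpha_c)   # infimum threshold R(eps|X), independent of eps
```

With equal unit costs one gets α_c = 1 and the threshold collapses to the entropy itself, matching the codeword-length case.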

A. General formula for the infimum of (ε, a)-achievable overflow threshold
So far, we have considered the first-order achievable threshold. In this section, we consider the second-order achievability. In the second-order case, the infimum of (ε, a)-achievable overflow thresholds for general sources is also determined by using Lemma 4.1 and Lemma 4.2.
We define the function F_a(L) given a as follows, which corresponds to the function F(R) in the first-order case:

F_a(L) = lim sup_{n→∞} Pr { (1/(√n α_c)) (log (1/P_{X^n}(X^n)) − n α_c a) > L } .

Then, we have

Theorem 6.1: For 0 ≤ ∀ε < 1, it holds that L(ε, a|X) = inf {L : F_a(L) ≤ ε}.

Proof: The proof is similar to the proof of Theorem 5.1.
(Direct Part)
Let z_n = K^{-γ n^{1/4}}; then the required bound holds for sufficiently large n, because γ/2 > γ/(n^{1/4} α_c) holds for sufficiently large n.
This means that (24) holds. Therefore, the direct part has been proved.

(Converse Part)
Assuming that L_1 satisfying L_1 < inf {L : F_a(L) ≤ ε} (25) is an (ε, a)-achievable second-order overflow threshold, we shall show a contradiction.
From Lemma 4.2, for any sequence {z_n}_{n=1}^∞ (z_i > 0, i = 1, 2, ···) and any variable-length encoder, it holds that

ε_n(ϕ_n, na + L_1√n) > Pr {P_{X^n}(X^n) ≤ z_n K^{-α_c(na + L_1√n)}} − z_n, for n = 1, 2, ···,

where γ > 0 is a small constant that satisfies (26). Here, since we assume (25), it is obvious that there exists γ > 0 satisfying the above inequality.
Then, we have

ε_n(ϕ_n, na + L_1√n) > Pr { (1/(√n α_c)) (log (1/P_{X^n}(X^n)) − n α_c a) ≥ L_1 + γ/α_c } − z_n.

This implies that

lim sup_{n→∞} ε_n(ϕ_n, na + L_1√n) > ε,

where the last inequality is derived from (26) and the definition of F_a(L).
On the other hand, since we assume that L_1 is an (ε, a)-achievable overflow threshold, it holds that lim sup_{n→∞} ε_n(ϕ_n, na + L_1√n) ≤ ε. This is a contradiction. Therefore, the proof of the converse part has been completed.

B. Computation for i.i.d. Sources
Theorem 6.1 is a quite general result, because there is no restriction on the probabilistic structure of the source. However, computing the function L(ε, a|X) is hard in general. Next, we consider a simple case such as an i.i.d. source with a countably infinite alphabet, and we address the above quantity explicitly.
For an i.i.d. source, from Remark 5.2, we are interested in L(ε, (1/α_c)H(X) | X). To specify this quantity for an i.i.d. source, we need to introduce the variance of the self-information as follows:

σ² = Σ_{x∈X} P_X(x) (log (1/P_X(x)) − H(X))²,

where H(X) is the entropy of the i.i.d. source defined by H(X) = Σ_{x∈X} P_X(x) log (1/P_X(x)).
Here, we assume that the above variance exists. Then, from Theorem 6.1 we obtain the following theorem.
Theorem 6.2: For any i.i.d. source, it holds that

L(ε, (1/α_c)H(X) | X) = (√σ² / α_c) Φ^{-1}(1 − ε),

where Φ^{-1} denotes the inverse function of Φ, and Φ(T) is the Gaussian cumulative distribution function with mean 0 and variance 1, that is,

Φ(T) = (1/√(2π)) ∫_{-∞}^{T} e^{-t²/2} dt.

Proof: From the definition of F_a(L) with a = (1/α_c)H(X), F_a(L) is written in terms of the distribution of the normalized self-information. On the other hand, since we consider the i.i.d. source, from the asymptotic normality (due to the central limit theorem) it holds that

This means that
Thus, L(ε, (1/α_c)H(X) | X) is given by the infimum of L satisfying F_a(L) ≤ ε. Since Φ is a continuous and monotonically increasing function of L, we have the theorem. Therefore, the proof has been completed.

Remark 6.1: As shown in the proof, the derivation of Theorem 6.2 is based on the asymptotic normality of the self-information. This means that a similar argument is valid for any source for which the asymptotic normality of the self-information holds, such as Markov sources (see Hayashi [19]).
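Under our reading of Theorem 6.2, the second-order threshold is L = (√σ²/α_c)·Φ^{-1}(1−ε); the sketch below (function names ours) evaluates it using the standard normal quantile from Python's `statistics.NormalDist`.

```python
from math import log
from statistics import NormalDist

def self_info_moments(probs, K=2):
    """Entropy H(X) and variance sigma^2 of the self-information -log_K P(X)."""
    h = -sum(p * log(p, K) for p in probs if p > 0)
    var = sum(p * (-log(p, K) - h) ** 2 for p in probs if p > 0)
    return h, var

def second_order_threshold(probs, alpha_c, eps, K=2):
    """Our reading of Theorem 6.2 (an assumption, reconstructed from the proof):
    L(eps, H(X)/alpha_c | X) = (sqrt(sigma^2)/alpha_c) * Phi^{-1}(1 - eps)."""
    _, var = self_info_moments(probs, K)
    return (var ** 0.5 / alpha_c) * NormalDist().inv_cdf(1.0 - eps)

# eps = 0.5 gives Phi^{-1}(0.5) = 0, so the second-order term vanishes.
print(second_order_threshold([0.5, 0.25, 0.25], alpha_c=1.0, eps=0.5))
```

Note that for ε < 1/2 the threshold is positive (the codeword cost must be allowed to exceed nH(X)/α_c by a Θ(√n) margin), while for ε > 1/2 it is negative.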

C. Computation for Mixed Sources
In this subsection we consider mixed sources. The class of mixed sources is very important, because all stationary sources can be regarded as mixed sources obtained by mixing stationary ergodic sources with respect to appropriate probability measures. Notice here that, in general, a mixed source does not have the asymptotic normality of the self-information, so we cannot simply apply Theorem 6.2.
The second-order achievable rates for mixed sources were first considered by Nomura and Han [20] in the fixed-length source coding problem. In this paper, we use a similar approach, and the result in this subsection is analogous to the result in [20].
We consider a mixed source consisting of two stationary memoryless sources X_i = {X_i^n}_{n=1}^∞ (i = 1, 2). The mixed source X = {X^n}_{n=1}^∞ is then defined by

P_{X^n}(x) = w(1) P_{X_1^n}(x) + w(2) P_{X_2^n}(x), for ∀x ∈ X^n,

where w(i) are constants satisfying w(1) + w(2) = 1 and w(i) > 0 (i = 1, 2). Since the two i.i.d. sources X_i (i = 1, 2) are completely specified by giving just the first components X_i (i = 1, 2), we may simply write X_i = {X_i} (i = 1, 2) and define the variances

σ_i² = Σ_{x∈X} P_{X_i}(x) (log (1/P_{X_i}(x)) − H(X_i))² (i = 1, 2),

where we assume that these variances exist, and define the entropies H(X_i) = Σ_{x∈X} P_{X_i}(x) log (1/P_{X_i}(x)) (i = 1, 2).
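To see why mixed sources need separate treatment, the following toy simulation of ours (all parameters and names are assumptions, not the paper's) shows that the normalized self-information (1/n) log(1/P_{X^n}(X^n)) of a two-component mixture concentrates near H(X_1) or H(X_2) depending on which component emitted the block, rather than around a single point.

```python
import random
from math import log
random.seed(1)

# Hypothetical two-component mixed source: X1 = Bernoulli(0.5)^n,
# X2 = Bernoulli(0.9)^n, mixing weights w(1) = w(2) = 0.5, so that
# P_{X^n} = 0.5 * P1 + 0.5 * P2 on {0,1}^n.
n, p1, p2, w1, w2 = 2000, 0.5, 0.9, 0.5, 0.5

def log2_iid(x, p):
    """log2 probability of a binary block x under an i.i.d. Bernoulli(p) source."""
    k = sum(x)
    return k * log(p, 2) + (len(x) - k) * log(1 - p, 2)

def log2_add(a, b):
    """log2(2^a + 2^b), computed stably in the log domain."""
    m = max(a, b)
    return m + log(2 ** (a - m) + 2 ** (b - m), 2)

def normalized_self_info(x):
    """(1/n) * -log2 P_{X^n}(x) for the mixed source above."""
    lm = log2_add(log(w1, 2) + log2_iid(x, p1), log(w2, 2) + log2_iid(x, p2))
    return -lm / len(x)

x_from_1 = [int(random.random() < p1) for _ in range(n)]
x_from_2 = [int(random.random() < p2) for _ in range(n)]
h2 = -(p2 * log(p2, 2) + (1 - p2) * log(1 - p2, 2))   # ~0.469 bits

print(normalized_self_info(x_from_1))   # near H(X1) = 1 bit
print(normalized_self_info(x_from_2))   # near H(X2) ~ 0.469 bits
```

The two printed values sit near two distinct limits, which is exactly the failure of single-Gaussian asymptotic normality that the mixture-of-CDFs analysis below accounts for.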
Before showing second-order analysis we shall consider the first-order case. Without loss of generality, we assume that H(X 1 ) ≥ H(X 2 ) holds.
In the sequel, we consider the case where 0 ≤ ε < 1 and w(1) ≠ ε hold, because if w(1) = ε holds, the second-order achievable overflow threshold is trivial (cf. [20, Remark 5.2]). Then, given ε, we classify the problem into the following three cases, where we again assume H(X_1) ≥ H(X_2) without loss of generality:

Case I: H(X_1) = H(X_2) holds.
Case II: H(X_1) > H(X_2) and w(1) > ε hold.
Case III: H(X_1) > H(X_2) and w(1) < ε hold.
In Case I, we shall compute L(ε, (1/α_c)H(X_1) | X) (which is equal to L(ε, (1/α_c)H(X_2) | X)). In Case II and Case III, we shall compute L(ε, (1/α_c)H(X_1) | X) and L(ε, (1/α_c)H(X_2) | X), respectively. Then, from Theorem 6.1 we obtain the following theorem.

Theorem 6.4: For any mixed source, the infimum of (ε, a)-achievable overflow thresholds is specified by a constant T_1 in Case I, T_2 in Case II, and T_3 in Case III, respectively.

Proof: This theorem can be shown in substantially the same way as [20, Theorem 5.1]. We only show the proof of Case I in the Appendix.
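Since the displayed equations specifying T_1, T_2, T_3 did not survive in the text above, the following is our own reconstruction of the Case I computation, in the spirit of [20]: the limit distribution of the normalized self-information is a mixture of two Gaussian CDFs, so the threshold T_1 solves w(1)Φ(T_1/√σ_1²) + w(2)Φ(T_1/√σ_2²) = 1 − ε, and the sketch below merely solves this (assumed) equation by bisection.

```python
from statistics import NormalDist

Phi = NormalDist().cdf   # standard Gaussian CDF

def mixed_threshold_case1(w1, s1, w2, s2, eps, alpha_c=1.0, tol=1e-10):
    """Our reconstruction of Theorem 6.4, Case I (H(X1) = H(X2)): find T with
    w1*Phi(T/s1) + w2*Phi(T/s2) = 1 - eps, then scale by 1/alpha_c.
    Here s1, s2 stand for sqrt(sigma_1^2), sqrt(sigma_2^2)."""
    g = lambda t: w1 * Phi(t / s1) + w2 * Phi(t / s2) - (1.0 - eps)
    lo, hi = -50.0, 50.0           # g is increasing and brackets a sign change
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / alpha_c

# Equal component variances reduce to the single-source (i.i.d.) formula,
# e.g. eps = 0.5 gives threshold 0.
print(mixed_threshold_case1(0.5, 1.0, 0.5, 1.0, eps=0.5))
```

When σ_1² = σ_2², the equation collapses to Φ(T) = 1 − ε and the i.i.d. result of Theorem 6.2 is recovered, which is a useful sanity check on the reconstruction.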

Remark 6.2:
In [20], the countably infinite mixture of i.i.d. sources and the general mixture of i.i.d. sources are treated. We can also obtain the infimum of (ε, a)-achievable overflow thresholds in these cases by using a similar argument.

VII. Concluding Remarks
We have so far dealt with the overflow probability of variable-length coding with codeword cost.
The overflow probability is important not only from the theoretical viewpoint but also from the engineering point of view. As shown in the proofs of the present paper, the information-spectrum approach is essential in the analysis of the overflow probability of variable-length coding.
In particular, Lemma 4.1 and Lemma 4.2 are key lemmas. The infimum of the first-order achievable threshold and the infimum of the second-order achievable threshold have been derived from these lemmas. Theorem 3.1 is also useful, because it enables us to apply results derived in the fixed-length coding problem to the variable-length coding problem.
Finally, we shall note a generalization of the cost function. Although we have only considered the memoryless cost function, all the results in this paper are valid for a wider class of cost functions as follows. Let us define the cost function c : U* → (0, +∞). The cost c(u^l) of a sequence u^l = u_1 u_2 ··· u_l ∈ U^l is defined by

c(u^l) = Σ_{i=1}^{l} c(u_i | u_1^{i-1}),

where c(u_i | u_1^{i-1}) denotes the conditional cost of u_i given u_1^{i-1}. Furthermore, we assume that the conditional cost capacity α_c(u_1^{i-1}) is independent of u_1^{i-1}; more precisely, α_c(u_1^{i-1}) = α holds for all u_1^{i-1} ∈ U^{i-1}. Such a class of cost functions was first considered in [26]. Han and Uchida [14] have also treated this type of cost function. Since the conditional cost capacity α_c(u_1^{i-1}) is independent of u_1^{i-1}, all the results in this paper can be proved for this type of cost function.
Appendix

Proof of the Strong Converse Theorem: On the other hand, from Lemma 4.2 with η_n = nR, it holds that

ε_n(ϕ_n, nR) > Pr { (1/(nα_c)) log (1/P_{X^n}(X^n)) ≥ R − (1/(nα_c)) log z_n } − z_n,

for any sequence {z_n}_{n=1}^∞ (z_i > 0, i = 1, 2, ···). Let z_n = K^{-nγ}; then we have

ε_n(ϕ_n, nR) > Pr { (1/(nα_c)) log (1/P_{X^n}(X^n)) ≥ R + γ/α_c } − K^{-nγ}.

Noting that γ > 0 is a constant, from the definition of H̲(X) we have lim_{n→∞} ε_n(ϕ_n, nR) = 1.
Therefore, the sufficiency has been proved.
Thus, R ≤ (1/α_c) H̲(X) holds from the definition of H̲(X). On the other hand, from Corollary 5.1, it holds that R(0|X) = (1/α_c) H̄(X). Thus, we have (1/α_c) H̄(X) − γ ≤ (1/α_c) H̲(X). Noticing that γ > 0 is arbitrary, we have

(1/α_c) H̄(X) ≤ (1/α_c) H̲(X).

Hence, since H̲(X) ≤ H̄(X) always holds, we have H̄(X) = H̲(X). Therefore, the necessity has been proved.
Proof of Theorem 6.4: We only show Case I. The proofs of Case II and Case III are similar to that of Case I and to the corresponding proofs in [20]. From the definition of F_a(L), we have the expression in (B.1), where the last equality is derived from the definition of the mixed source. Thus, from Lemma 6.1 we obtain the bound, where γ_n is specified in Lemma 6.1.
Then, noting that H(X_1) = H(X_2) holds, from the asymptotic normality we have

lim_{n→∞} Pr { (− log P_{X_i^n}(X_i^n) − nH(X_1)) / √n ≤ T } = Φ(T / √(σ_i²)), for i = 1, 2.

Noting that γ_n → 0 as n → ∞ and the continuity of the Gaussian distribution function, we obtain the evaluation of the first term. Similarly, the last term in (B.1) is evaluated in the same way. Hence, we have proved Case I of the theorem.