A Refinement of Expurgation

We show that, for a wide range of channels and code ensembles with pairwise-independent codewords, expurgating an arbitrarily small fraction of codewords from a randomly selected code yields, with probability tending to 1 with the code length, a code attaining the expurgated exponent.

We assume that the codewords are generated in a pairwise-independent manner; that is, for any two indices m, k ∈ {1, . . . , M_n}, m ≠ k, it holds that

P( X_m = x̄, X_k = x̄' ) = Q_n(x̄) Q_n(x̄')   for all x̄, x̄' ∈ X^n,

where Q_n is a probability distribution defined over X^n.
Let P_e,m(C(M_n, n)) and P_e(C(M_n, n)) be the random variables denoting the error probability of the m-th codeword of the random code C(M_n, n) and the average error probability of the code, respectively. We denote the n-length error exponents of these random variables by E_m(C(M_n, n)) = −(1/n) log P_e,m(C(M_n, n)) and E(C(M_n, n)) = −(1/n) log P_e(C(M_n, n)), respectively. For some ensembles and channels, the ensemble average of the code error probability E[P_e(C(M_n, n))] is known to decay exponentially in n [1]. A lower bound on the resulting error exponent is the random-coding exponent E_r^n(R, Q_n) [2, Eq. (5.6.16)]. For the discrete memoryless channel (DMC), this bound is known to coincide with the sphere-packing upper bound on the reliability function [3], [4] in the high-rate region.
In [5, Sec. 5.7], Gallager showed that, for some channels and ensembles, there exists a code with a strictly higher error exponent than E_r^n(R, Q_n) at low rates. To show this, Gallager considered a pairwise-independent ensemble with M'_n = 2M_n − 1 codewords. Using Markov's inequality, he showed that

P( P_e,m(C(M'_n, n)) ≥ (2 E[P_e,m(C(M'_n, n))^s])^{1/s} ) ≤ 1/2   (1)

for any s > 0. He then introduced the indicator function

φ_m(C(M'_n, n)) = { 1 if P_e,m(C(M'_n, n)) < (2 E[P_e,m(C(M'_n, n))^s])^{1/s}; 0 otherwise }   (2)

and showed that, using (1) and (2), the following inequality holds:

E[ Σ_{m=1}^{M'_n} φ_m(C(M'_n, n)) ] ≥ M'_n / 2.   (3)

From (3) it follows that, since the average number of codewords whose probability of error is smaller than (2 E[P_e,m(C(M'_n, n))^s])^{1/s} in a randomly generated code with M'_n = 2M_n − 1 codewords is at least M_n − 1/2, and this number is an integer, there must exist a code having at least M_n codewords, out of the M'_n, fulfilling this property. Thus, by removing (expurgating) the worst half of the codewords from the code with M'_n codewords, we obtain a new code with M_n codewords, each of which satisfies the condition in the first line of the right-hand side of (2). Finally, restricting s to 0 < s ≤ 1, Gallager derived a lower bound on the exponent of (2 E[P_e,m(C(M'_n, n))^s])^{1/s}, given by

E_ex^n(R, Q_n) = max_{ρ ≥ 1} ( E_x^n(ρ, Q_n) − ρR ),   (4)

where E_x^n(ρ, Q_n) is Gallager's expurgated function and ρ = 1/s ≥ 1 is the parameter that yields the highest exponent. The preceding argument is valid for the maximal probability of error, since every codeword in the expurgated code attains the same exponent. In addition, observe that since (3) uses the standard ensemble-average argument (i.e., it takes the average over the ensemble), it only shows the existence of a code with the desired property. The exponent in (4) is the expurgated exponent. We refer to the code with M'_n codewords before expurgation as a mother code. We say that a mother code is good if, once expurgated, we obtain a code with asymptotically the same rate whose codewords each have an exponent at least as large as the expurgated exponent.
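Gallager's expurgation step is a purely probabilistic counting argument, so it can be illustrated numerically. The sketch below draws per-codeword error probabilities from a toy exponential distribution (an assumption made only for illustration; the true distribution is induced by the channel and ensemble) and checks that at least M_n of the M'_n = 2M_n − 1 codewords fall below the Markov threshold (2 E[P_e,m^s])^{1/s}:

```python
import random

# Toy illustration of Gallager's expurgation argument. The per-codeword
# error probabilities P_e,m of a randomly drawn code are modeled as i.i.d.
# samples from an arbitrary toy distribution; the counting argument rests
# only on Markov's inequality, not on the particular distribution.

random.seed(0)

M = 1000                 # target code size M_n
M_prime = 2 * M - 1      # mother-code size M'_n = 2 M_n - 1
s = 0.5                  # Markov exponent, 0 < s <= 1

# Toy per-codeword error probabilities (exponential with mean 0.1).
p_e = [random.expovariate(10.0) for _ in range(M_prime)]

# Estimate the ensemble average E[P_e,m^s] from a large i.i.d. sample.
sample = [random.expovariate(10.0) for _ in range(100000)]
avg_s = sum(p ** s for p in sample) / len(sample)

# Markov: P[P_e,m^s >= 2 E[P_e,m^s]] <= 1/2, i.e. on average fewer than
# half of the M'_n codewords exceed the threshold B = (2 E[P_e,m^s])^(1/s).
B = (2 * avg_s) ** (1 / s)
good = [p for p in p_e if p < B]

# Expurgating the worst codewords leaves at least M_n good ones.
print(len(good) >= M)
```

Since the bound only uses Markov's inequality, any nonnegative toy model for the error probabilities exhibits the same behavior.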
A refinement of the above follows from (1). Specifically, for ǫ > 0, it can be shown that there exists a code with M_n(1 + ǫ) codewords such that removing ǫM_n codewords yields a code that attains the expurgated exponent [6, Lemma 1]. Although [6, Lemma 1] generalizes Gallager's method, it still only shows the existence of a code that attains the expurgated exponent.

II. MAIN RESULT
This paper strengthens existing results on expurgation by showing that the probability that a randomly generated code with M'_n = (1 + ǫ)M_n codewords contains a code with at least M_n codewords, each of which achieves the expurgated exponent, tends to 1 with the code length. We define the sequence δ_n = (ρ_n / n) log γ_n, where γ_n is such that lim_{n→∞} γ_n = ∞ while lim_{n→∞} (log γ_n)/n = 0, and ρ_n is a positive sequence, defined in (6), that depends on the channel, the ensemble, and the rate. From the definition of δ_n, it can be seen that if ρ_n either converges to a constant or grows sufficiently slowly, there exists a γ_n such that δ_n → 0. Similarly to Gallager, for a given δ_n, we define the indicator function

ψ_m(C(M'_n, n)) = { 1 if E_m(C(M'_n, n)) ≥ E_ex^n(R, Q_n) − δ_n; 0 otherwise }   (7)

and the number of codewords attaining an exponent of at least E_ex^n(R, Q_n) − δ_n,

Φ(C(M'_n, n)) = Σ_{m=1}^{M'_n} ψ_m(C(M'_n, n)).   (8)

Theorem 1: Consider a pairwise-independent code ensemble with M'_n = M_n(1 + ǫ) codewords and any ǫ > 0. If the sequence {δ_n}_{n=1}^∞, which depends on the channel and the ensemble, satisfies lim_{n→∞} δ_n = 0, then, for any 0 < ǫ_1 < ǫ, it holds that

lim_{n→∞} P( Φ(C(M'_n, n)) ≥ (1 + ǫ_1)M_n ) = 1.   (9)

Proof: See Section III.
In words, with high probability we find a mother code with M'_n = (1 + ǫ)M_n codewords, M_n of which attain the expurgated exponent. That is, good mother codes are found easily and only contain an arbitrarily small fraction ǫ/(1 + ǫ) of codewords that need to be expurgated. Theorem 1 extends Gallager's method and applies, among others, to independent and identically distributed (i.i.d.) and constant-composition codes over DMCs, as well as to channels with memory such as the finite-state channel in [2, Sec. 4.6], for which the expurgated exponent is derived in [7].
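As a quick numeric sanity check on the condition δ_n → 0, the snippet below evaluates δ_n = (ρ_n/n) log γ_n for the illustrative choices ρ_n = 2 and γ_n = n (neither taken from the text), which satisfy γ_n → ∞ and (log γ_n)/n → 0:

```python
import math

# Numeric sketch of the vanishing slack sequence delta_n = (rho_n/n) log(gamma_n).
# The choices gamma_n = n and a constant rho_n = 2 are illustrative assumptions;
# the actual rho_n depends on the channel, the ensemble, and the rate.

def delta(n, rho=2.0, gamma=None):
    g = gamma if gamma is not None else n   # default choice gamma_n = n
    return (rho / n) * math.log(g)

values = [delta(n) for n in (10, 100, 1000, 10000)]
print(values)  # strictly decreasing toward 0
```

In the zero-rate regime discussed later, ρ_n itself grows with n, and δ_n → 0 then requires γ_n to grow slowly enough that ρ_n = o(n / log γ_n).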
As a final remark, recent works [7]–[10] show that, for many ensembles, most low-rate codes have an error exponent E(C(M_n, n)) that is strictly larger than the exponent of the ensemble-average error probability, i.e., the random-coding exponent. Similarly, Theorem 1 implies that, for most codes, almost every codeword has an associated error exponent E_m(C(M_n, n)) that is strictly larger than the ensemble average of the exponent of the error probability of the codebook, E[E(C(M_n, n))]. In both cases, the smaller error exponent of the average probability of error is due to a relatively small number of elements (codes in the first case, codewords in the second) that perform poorly. Furthermore, as shown in [9], [10] for i.i.d. and constant-composition codes over DMCs, the error exponents of the codes in the ensemble concentrate around the typical random-coding (TRC) exponent [8], [11]. Similarly to those works, it can be shown that the error exponent E_m(C(M_n, n)), for any m, concentrates around its mean, the expurgated exponent. The proof makes use of Lemma 1 in Section III and follows almost identical steps to those in [10, Theorem 1], [7, Theorem 1], and [7, Theorem 2] once P_e(C) is replaced by P_e,m(C); it is omitted here.

III. PROOF OF THEOREM 1
We start with the following lemma, whose proof is almost identical to that of [7, Lemma 1].
Lemma 1: For a channel W_n and a pairwise-independent code ensemble with M'_n codewords and codeword distribution Q_n, for any m ∈ {1, . . . , M'_n} it holds that

P( E_m(C(M'_n, n)) ≤ E_ex^n(R, Q_n) − δ_n ) ≤ 1/γ_n,   (11)

where γ_n and δ_n are positive real-valued sequences.
The proof of Lemma 1 follows from Markov's inequality, applying the same steps as in [7, Theorem 1] once P_e(C_n) is replaced with P_e,m(C_n). The sequences γ_n and δ_n are the same as those introduced in Section II. Observe that, using inequality (11) and following similar steps as in [7], it can be shown that

lim inf_{n→∞} E[ −(1/n) log P_e,m(C(M_n, n)) ] ≥ lim_{n→∞} E_ex^n(R, Q_n).

Furthermore, using similar arguments as in [10], it can be shown that this bound is tight at least for i.i.d. and constant-composition codes over DMCs. That is, for such ensembles and channels, lim_{n→∞} E[ −(1/n) log P_e,m(C(M_n, n)) ] = lim_{n→∞} E_ex^n(R, Q_n), i.e., the expurgated exponent is the typical codeword exponent.
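Lemma 1 is, at its core, a Markov-inequality statement. The Monte Carlo sketch below checks the underlying bound P[X ≥ γ_n E[X]] ≤ 1/γ_n, with X standing in for the per-codeword error probability P_e,m and a toy exponential distribution assumed purely for illustration:

```python
import random

# Monte Carlo check of the Markov-type bound behind Lemma 1:
# P[X >= gamma * E[X]] <= 1/gamma for any nonnegative X.
# The exponential model for X is an illustrative assumption only.

random.seed(1)
gamma = 50.0
samples = [random.expovariate(1.0) for _ in range(200000)]
mean = sum(samples) / len(samples)

# Empirical fraction of samples exceeding gamma times the mean.
frac = sum(1 for x in samples if x >= gamma * mean) / len(samples)
print(frac <= 1 / gamma)
```

In the lemma, the event {X ≥ γ_n E[X]} translates, after taking exponents, into the event {E_m ≤ E_ex^n(R, Q_n) − δ_n}, with δ_n absorbing the (1/n) log γ_n slack.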
If the positive sequence ρ_n, defined in (6), converges or grows sufficiently slowly, then there exists a sequence γ_n with lim_{n→∞} γ_n = ∞ and lim_{n→∞} (log γ_n)/n = 0 for which δ_n = (ρ_n/n) log γ_n → 0. For rate zero, that is, when lim_{n→∞} (1/n) log M_n = 0, the n-length error exponent in (4) depends on the particular subexponential growth of M_n, while ρ_n tends to infinity with a growth that depends on the channel and the ensemble. In this case, as discussed in the paragraph succeeding [7, Eq. (89)], the assumption that (ρ_n/n) log γ_n → 0 holds if the normalized variance of the Bhattacharyya coefficient Z_n(x, x') grows slower than n/log γ_n. In any case, choosing such a γ_n and applying Lemma 1, we have that

P( E_m(C(M'_n, n)) ≤ E_ex^n(R, Q_n) − δ_n ) ≤ 1/γ_n.   (12)

The random variable Φ(C(M'_n, n)), averaged across the ensemble, satisfies

E[Φ(C(M'_n, n))] = Σ_{m=1}^{M'_n} P( ψ_m(C(M'_n, n)) = 1 )   (13)
  ≥ M'_n (1 − 1/γ_n),   (14)

where (14) follows from the definition of the indicator function (7) and (12).
We define Ψ(C(M'_n, n)) = M'_n − Φ(C(M'_n, n)), which is the number of codewords with exponent smaller than E_ex^n(R, Q_n) − δ_n. Then, for sufficiently large n, we have that

E[Ψ(C(M'_n, n))] ≤ M'_n / γ_n   (16)

and

P( Ψ(C(M'_n, n)) ≥ M'_n / √γ_n ) ≤ 1/√γ_n,   (17)

where (17) follows from Markov's inequality and (16). This shows that the probability of finding a code with many codewords with exponent strictly smaller than E_ex^n(R, Q_n) − δ_n vanishes with n. To prove our main result, we write the tail probability in (9) as

P( Φ(C(M'_n, n)) ≥ (1 + ǫ_1)M_n ) = P( Ψ(C(M'_n, n)) ≤ (ǫ − ǫ_1)M_n ),

where we used the definitions of Ψ(C(M'_n, n)) and M'_n. Since γ_n tends to infinity, there must exist an n_0 ∈ N such that ǫ − ǫ_1 > (1 + ǫ)/√γ_n for n > n_0, and therefore, for n > n_0,

P( Ψ(C(M'_n, n)) ≤ (ǫ − ǫ_1)M_n ) ≥ P( Ψ(C(M'_n, n)) ≤ M'_n / √γ_n ) ≥ 1 − 1/√γ_n,   (21)

where (21) follows from (17). Finally, taking the limit as n → ∞ yields the desired result.
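The counting step of the proof can also be illustrated numerically. Below, each of the M'_n codewords is flagged as bad (exponent below E_ex^n(R, Q_n) − δ_n) with probability 1/γ_n, drawn independently; this is a simplification, since the proof only requires Markov's inequality on E[Ψ] under pairwise independence. The empirical probability that at least (1 + ǫ_1)M_n codewords survive is then close to 1, as Theorem 1 predicts. All parameter values are illustrative.

```python
import random

# Sketch of the counting step in the proof of Theorem 1. Each codeword is
# "bad" with probability at most 1/gamma_n (Lemma 1); bad indicators are
# drawn independently here as a simplifying assumption.

random.seed(2)
M = 1000                            # target code size M_n
eps, eps1 = 0.2, 0.1                # 0 < eps1 < eps
M_prime = round((1 + eps) * M)      # mother-code size M'_n
gamma = 100.0                       # per-codeword bad probability 1/gamma_n

trials, successes = 500, 0
for _ in range(trials):
    bad = sum(1 for _ in range(M_prime) if random.random() < 1 / gamma)
    # Phi = M'_n - Psi codewords attain the expurgated exponent.
    if M_prime - bad >= (1 + eps1) * M:
        successes += 1

rate = successes / trials
print(rate)
```

With these parameters the expected number of bad codewords is M'_n/γ_n = 12, far below the (ǫ − ǫ_1)M_n = 100 that the argument can tolerate, so nearly every trial succeeds.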