On information rates over a binary input channel

We study communication systems over band-limited Additive White Gaussian Noise (AWGN) channels in which the transmitter output is constrained to be symmetric binary (bipolar). In this work we improve the original Ozarow-Wyner-Ziv (OWZ) lower bound on capacity by introducing new achievability schemes with two advantages over the studied OWZ scheme, which is based on peak-power-constrained pulse-amplitude modulation. Our schemes achieve a moderately improved information rate and do so with far fewer sign transitions of the binary signal. The gap between the known upper bound, based on spectral constraints of bipolar signals, and our achievable lower bound is reduced to 0.93 bits per Nyquist interval at high SNR.


We study communication systems over band-limited Additive White Gaussian Noise (AWGN) channels in which the transmitter's output is constrained to be bipolar, as presented in Figure 1. Such systems arise when the power efficiency must be high or when the transmitter needs to be of very low complexity. Those systems are usually implemented by some form of Pulse Width Modulation (PWM), Pulse Position Modulation (PPM), or similar schemes, operating over Gaussian noise channels [1], [2], [3]. Communication systems with binary transmitted signals are of recent practical interest in millimeter-wave wide-band applications, e.g. [4], [5].
In this work, we examine theoretical limits on communication with binary transmission, not limited to PWM. We are interested in the reliable information rate supported by this system, focusing mainly on the region of asymptotically high SNR. This theoretical problem was addressed by Ozarow, Wyner and Ziv (OWZ) [6] using the Pulse Amplitude Modulation (PAM) method. OWZ [6] showed that performance, measured by mutual information, achievable with a signal peak-limited to $\pm\sqrt{P}$ can also be achieved with a binary-valued $\pm\sqrt{P}$ signal with a very high Sign Transition Rate (STR). They applied this finding to design a PAM scheme with symbols uniformly distributed in $[-\sqrt{P}, +\sqrt{P}]$, which provides an achievable lower bound on the capacity of the system. As implied by [6], peak-limited continuous-time signals such as filtered PAM can in principle also be band-limited [7] and hence represented by sampling at an appropriate rate, while the equivalent (in the sense of [6]) bipolar processes cannot be strictly bandlimited [8]. A lower bound exceeding that of [6] at low SNR was presented in [9], based on improved bounds for intersymbol-interference Gaussian channels. Additional results on the capacity of systems with binary inputs, some with additional constraints on average transition rate, minimum inter-transition time and out-of-band power, are presented in [10] and [11]. Systems with limited minimal transition times were investigated in [12], including systems with mild filtering, that is, not strictly bandlimited as in [6].
The work of S. Shamai has been supported by the European Union's Horizon 2020 Research and Innovation Programme, Grant Agreement No. 694630, and partly by the WIN Consortium via the Israel Minister of Economy and Science. The manuscript was submitted to the IEEE Open Journal of the Communications Society (OJ-COMS).
Michael Peleg is with the Viterbi Faculty of Electrical and Computer Engineering, Technion-Israel Institute of Technology, and also with RAFAEL, e-mail: peleg.michael@gmail.com.
Tomer Michaeli and Shlomo Shamai (Shitz) are with the Viterbi Faculty of Electrical and Computer Engineering, Technion-Israel Institute of Technology, e-mails: tomer.m@ee.technion.ac.il, sshlomo@ee.technion.ac.il.
The binary channel input carries information in its transition times. Sampling the binary input at a Nyquist rate corresponding to the channel bandwidth would degrade the performance severely; thus the system in Figure 1 falls in general into the category of Faster Than Nyquist (FTN) signaling, which is of wide current theoretical and practical interest. In recent years, FTN signaling approaches, forms and extensions of classical pulse-amplitude modulation strategies have emerged. See overviews of these domains in [13], [14], [15] and references therein, and recent examples of advanced theory and techniques in [16], [17], [18], [19]. FTN can provide significant advantages in terms of capacity with prescribed modulation techniques and signaling strategies, though the resultant channel may suffer significant inter-symbol interference, which demands higher-complexity detection procedures. Yet FTN imposes no peak-power restrictions on the resultant continuous-time process, whereas such a restriction is a central part of our scheme. This is motivated by practical constraints, as was also the case in [6], reflecting the constraints of magnetic storage media.
In this work we present new schemes with two advantages over [6]. They achieve a moderately improved information rate and do so with far fewer sign transitions of the binary signal. The new schemes require an STR of only up to twice the Nyquist rate of the channel, while [6] uses an STR many times higher than the Nyquist rate; if implemented fully, the STR in [6] is infinite. Low STR is easier to implement in systems which are already wide-band and in which each sign transition must pass a power amplifier, such as [4], [5]. We extended the technique in [6] to the new schemes, in which the transmitted signal is a non-linear function of the information sequence.
The studied communication system is presented in Figure 1. It comprises an encoder producing a binary-valued $\pm\sqrt{P}$ signal $x(t)$ and a receiver. The channel has a frequency response H(f), in our case a unity frequency response at frequencies from 0 to B and zero otherwise. The channel output is
$$y(t) = z(t) + n(t),$$
where z(t) is the filtered desired signal and n(t) is the Gaussian noise. We denote by B the bandwidth of the low-pass brick-wall filter in Hz; T = 1/(2B) is the Nyquist sampling period associated with B; $\rho = P/(N_0 B)$ is the signal-to-noise ratio (SNR); log denotes the natural logarithm; and bold lower-case letters denote vectors and sequences.

II. KNOWN PERFORMANCE BOUNDS
Shamai and Bar-David [20] derived an upper bound on the system capacity, based on the fact that the Power Spectral Density (PSD) of a binary-valued signal is limited by certain constraints presented in [21] and [8]. They analyzed limits on spectral densities of binary signals and then upper bounded the capacity of the system by the Mutual Information (MUI) obtained when the channel input has the capacity-achieving Gaussian distribution with the same PSD as the binary-valued signal. For high SNR they proved that, relative to the capacity-achieving frequency-flat Gaussian input, there is a power loss of at least a factor of γ = 0.9337; see the definition of γ below. The same paper considers the Random Telegraph Signal (RTS) as an interesting example rather than a bound; the power factor there is around γ = 0.63, which is an upper bound on the performance of RTS.
The capacity $C_G$ in bits/second of the channel with PSD $S(f)$ used in [20] is the well-known expression
$$C_G = \int_0^B \log_2\left(1 + \frac{S(f)}{N_0}\right) df, \tag{1}$$
where $C_G$ is achieved with a Gaussian input. In the limit of asymptotically high $S(f)/N_0$ the capacity $C_G$ becomes
$$C_G^h = \int_0^B \log_2\frac{S(f)}{N_0}\, df. \tag{2}$$
For a frequency-flat Gaussian signal of bandwidth B this yields
$$C_G^h = B\log_2\frac{P}{N_0 B} = B\log_2\rho.$$
Multiplying $S(f)$ in (2) by a factor γ increases $C_G^h$ by $B\log_2\gamma$ information bits per second, which are $\Delta = \frac{1}{2}\log_2\gamma$ bits per Nyquist sampling interval. Consequently, the equivalent SNR gain is defined, for a scheme with bandwidth B, as a function of the difference Δ in information per Nyquist interval between the scheme and the AWGN channel with the same bandwidth and transmit power, as
$$\gamma = 2^{2\Delta}.$$
OWZ [6] derived the following achievable lower bound using the modulation method described in the introduction:
$$C \ge B\log_2\left(1 + \frac{2e}{\pi^3}\,\rho\right). \tag{3}$$
This corresponds to $\gamma_{OWZ} = 2e/\pi^3 = 0.1753$. The bipolar signal that achieves the performance of the PAM modulation technique in [6] involves a high transition rate of the binary signal. An improved lower bound in the low SNR regime is reported in [9].
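The conversion between a power factor γ and the information difference Δ per Nyquist interval can be checked numerically. The following minimal sketch uses only the relation Δ = ½ log2 γ and values quoted in the text; the function names are illustrative and are not taken from [6] or [20].

```python
import math


def bits_from_power_factor(gamma):
    """Information difference in bits per Nyquist interval: Delta = 0.5*log2(gamma)."""
    return 0.5 * math.log2(gamma)


def power_factor_from_bits(delta_bits):
    """Inverse relation: gamma = 2**(2*Delta)."""
    return 2.0 ** (2.0 * delta_bits)


# Power factor of the OWZ lower bound, gamma_OWZ = 2e/pi^3.
gamma_owz = 2.0 * math.e / math.pi ** 3

# High-SNR gap, in bits per Nyquist interval, between the upper-bound power
# factor 0.9337 of [20] and the lower-bound power factor 0.2586 of Proposition 1.
gap_bits = bits_from_power_factor(0.9337) - bits_from_power_factor(0.2586)

print(f"gamma_OWZ = {gamma_owz:.4f}")
print(f"gap = {gap_bits:.2f} bits per Nyquist interval")
```

The two printed values, 0.1753 and 0.93, agree with the figures quoted for the OWZ bound and for the gap stated in Proposition 1.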

III. NEW ACHIEVABLE SCHEMES
The main results of this work are the improved lower bounds on the capacity of the bipolar-input bandlimited AWGN channel, see Proposition 1. The proposition is proved by introducing and analyzing new communication schemes.
We discuss four schemes, denoted by A, B, B1 and C. In all of them the time axis is partitioned into successive intervals of duration T, equal to the Nyquist interval corresponding to B. In scheme A, the binary signal in each interval n, with time t spanning $(n - 0.5)T \le t < (n + 0.5)T$, is
$$x_n(t) = \begin{cases} +\sqrt{P}, & t < (n + a_n)T \\ -\sqrt{P}, & t \ge (n + a_n)T \end{cases} \tag{4}$$
where $a_n$ are the information-carrying variables, uniformly, independently and identically distributed (u.i.i.d.) over [−0.5, +0.5]. Thus, information is conveyed by the time of sign reversal of the signal, see Figure ?? for an illustration. We denote the sequence of all $a_n$ by a, the binary transmitted signal in interval n by $x_n(t)$, and the signal over the whole transmission by x(t) or x.
Scheme B is derived from scheme A by inverting the signal in successive intervals of length T to eliminate half of the sign transitions of the binary signal. See Figure 3.
Scheme C is derived from scheme A by inverting the signal in successive intervals at random, where the signs $s_n$, valued ±1, are used as additional information inputs. The signs $s_n$ are equiprobable and independent. The signaling in scheme C comprises $a_n$ and $s_n$, thus the signaling rate is twice the Nyquist rate. Scheme B1 is introduced below. The STR of schemes A, B, B1 and C is 4B, 2B, 2B and 3B respectively by construction, see Figure 3, while the Nyquist rate is 2B. Denote by s the sequence of the sign inversions $s_n$ in schemes A, B and C, so that $s_n = -1$ for the inverted symbols and $s_n = 1$ otherwise.
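The sign transition rates quoted for schemes A, B and C can be illustrated by a small simulation. The sketch below is only an illustration under stated assumptions: the signal is sampled on a fine grid, the amplitude is normalized to ±1, and the polarity inside each interval is assumed to be +1 before the transition and −1 after it. All function names are hypothetical.

```python
import numpy as np


def binary_signal(scheme, n_sym, samples_per_interval, rng):
    """Unit-amplitude samples of scheme 'A', 'B' or 'C' on a fine time grid."""
    a = rng.uniform(-0.5, 0.5, n_sym)
    # Sample positions inside one interval, in units of T, centred on zero.
    pos = (np.arange(samples_per_interval) + 0.5) / samples_per_interval - 0.5
    # Assumed polarity: +1 before the transition at a_n, -1 after it.
    base = np.where(pos[None, :] < a[:, None], 1.0, -1.0)
    if scheme == "A":
        s = np.ones(n_sym)                                   # no inversion
    elif scheme == "B":
        s = np.where(np.arange(n_sym) % 2 == 0, 1.0, -1.0)   # alternate inversion
    elif scheme == "C":
        s = rng.choice([-1.0, 1.0], size=n_sym)              # random inversion
    else:
        raise ValueError("scheme must be 'A', 'B' or 'C'")
    return (s[:, None] * base).ravel()


def transitions_per_interval(x, n_sym):
    """Average number of sign transitions per Nyquist interval T."""
    return np.count_nonzero(np.diff(x)) / n_sym


def measure_rates(n_sym=10000, samples_per_interval=200, seed=0):
    rng = np.random.default_rng(seed)
    return {
        name: transitions_per_interval(
            binary_signal(name, n_sym, samples_per_interval, rng), n_sym
        )
        for name in ("A", "B", "C")
    }


if __name__ == "__main__":
    for name, rate in measure_rates().items():
        print(f"scheme {name}: {rate:.3f} transitions per interval T")
```

Since 1/T = 2B, averages close to 2, 1 and 1.5 transitions per interval correspond to the STR values 4B, 2B and 3B; the finite grid makes the measured values slightly smaller.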
Computing the exact capacity of the three schemes, that is, the MUI between the binary input x(t) and the channel output y(t), seems intractable. We therefore resorted to deriving upper and lower bounds. To compute upper bounds on the communication rates of our schemes, we first evaluate the PSD of the signals. We assume that the signal is randomly shifted as a whole by a delay distributed uniformly over (0, T) to render it stationary. The autocorrelations of the signals in the three schemes are derived in the appendix and summarized in (5).
The PSD was obtained by numerical Fourier transform of the autocorrelations, see Figure 4, and verified by simulation.
The PSD is obtainable analytically from the autocorrelations. For example, for scheme C with T = 1 we have the one-sided PSD:
$$S_C(f) = 4P\left[\frac{3 + \cos(2\pi f)}{(2\pi f)^2} - \frac{4\sin(2\pi f)}{(2\pi f)^3}\right].$$
The AWGN line in Figure 4 is the PSD of the standard bandlimited capacity-achieving signal without the binary constraint. As is well known from water-pouring theory, it spreads the available power uniformly over the available bandwidth B. The PSDs of our three schemes suffer the disadvantage of wasting some of the transmitted power outside the channel bandwidth and of not spreading the remaining power uniformly. Scheme C is evidently better than schemes A and B. Indeed, schemes A, B and C are constrained to bipolar transmitted signals and therefore cannot possess a strictly bandlimited spectrum, as we know from [8]. The spectra of schemes A and B are identical except for the discrete frequency components (tones), which do not influence the outcome of (2). There are no discrete tones in scheme C since it decorrelates the pulses by random sign inversions, limiting the support of the autocorrelation to [−T, T].
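As a numerical illustration of how a PSD translates into a high-SNR power factor through (2), the sketch below treats scheme C. It assumes, following the verbal description in the appendix, that the unit-power autocorrelation of scheme C is (1 − 2|τ|/T)(1 − |τ|/T) for |τ| ≤ T and zero otherwise. The PSD is obtained by a numerical cosine transform, and the power factor is taken as the geometric mean over the band of the ratio between this PSD and the flat PSD of equal power. The function names and grid sizes are illustrative choices.

```python
import numpy as np


def autocorr_scheme_c(tau, T=1.0):
    """Assumed unit-power autocorrelation of scheme C (zero outside |tau| <= T)."""
    u = np.abs(tau) / T
    return np.where(u <= 1.0, (1.0 - 2.0 * u) * (1.0 - u), 0.0)


def one_sided_psd(f, T=1.0, n_tau=2001):
    """One-sided PSD: 4 * integral_0^T R(tau) cos(2 pi f tau) dtau (trapezoid rule)."""
    tau = np.linspace(0.0, T, n_tau)
    integrand = autocorr_scheme_c(tau, T)[None, :] * np.cos(
        2.0 * np.pi * f[:, None] * tau[None, :]
    )
    d_tau = tau[1] - tau[0]
    integral = d_tau * (
        integrand.sum(axis=1) - 0.5 * (integrand[:, 0] + integrand[:, -1])
    )
    return 4.0 * integral


def power_factor_upper_bound(T=1.0, n_f=501):
    """exp of the band-average of log(S(f)/S_flat), i.e. the gamma implied by (2)."""
    B = 1.0 / (2.0 * T)
    f = np.linspace(0.0, B, n_f)
    s = one_sided_psd(f, T)
    s_flat = 1.0 / B  # unit power spread uniformly over 0..B
    log_ratio = np.log(s / s_flat)
    d_f = f[1] - f[0]
    mean_log = d_f * (log_ratio.sum() - 0.5 * (log_ratio[0] + log_ratio[-1])) / B
    return float(np.exp(mean_log))


if __name__ == "__main__":
    print(f"scheme C power factor (upper bound): {power_factor_upper_bound():.3f}")
```

Under these assumptions the sketch gives a power factor of roughly 0.37 for scheme C.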
Based on the PSD, we compute the upper bounds on performance at high SNR of the three schemes using (2) and compare them to the optimal input, which is a Gaussian signal with power P and a flat PSD from 0 to B. The results are presented in Table I.
We proceed to derive lower bounds on communication rates of the new schemes. As shown in Figure 1, x(t) passes through the channel filter and is then contaminated by AWGN. The receiver filters the signal by the same low-pass filter, which is clearly an information-lossless operation. We sample the filtered channel output at the Nyquist rate 1/T, producing an infinite sequence y of samples $y_n$. We denote the signal without the noise component by a sequence z of samples $z_n$, see Figure 1.
We lower-bound the capacity I(x; y) = H(y) − H(y|x) by adapting the approach presented in OWZ [6]. Since H(y|x) is the known entropy of the noise, the main term to evaluate is H(y). OWZ lower-bounded H(y) as a function of the entropies of its components H(z) and H(n) using the Entropy-Power Inequality (EPI) presented in [22]. OWZ evaluated H(z) using the fact that the channel was an Inter Symbol Interference (ISI) channel representable by a Toeplitz matrix, the determinant of which is computable using the Szegö theorem [23]. We begin by determining the differential entropy of z. In schemes A and B, each $a_n$ determines one symbol $x_n$ and those symbols are linearly filtered to produce z. The sequence a, treated as a vector of N symbols in the next equation, comprises u.i.i.d. components, and therefore its differential entropy per symbol is:
$$h_a = \frac{1}{N}\sum_{n} h(a_n) = 0. \tag{6}$$
The noiseless sampled output z is a function of a, which we denote by z = m(a). To derive h(z) using the Jacobian formula (7) similarly to [6], we need our transformation z = m(a) to be a bijection, and a and z must have identical dimensions.
Lemma 1: For every ε > 0, if the channel's bandwidth is $B = \frac{1}{2T} + \varepsilon$, where T is the signaling period, then the transformation z = m(a) in schemes A and B is a bijection.
Proof: The modulation scheme in Figure 1 is deterministic, therefore each sequence a can produce only a single sequence z. It remains to prove that no two distinct sequences a produce the same z. If this happened, there would exist a pair of transmitted signals $x_1 \ne x_2$ such that $z(x_1) = z(x_2)$, implying $d \stackrel{\text{def}}{=} z(x_1) - z(x_2) = 0$. Since the low-pass filter is linear, such a d would be the low-pass filtered signal $x_1 - x_2$. By the construction of x, for schemes A and B (but not C), the difference $x_1 - x_2$ would be a sequence of pulses, as depicted in Figure 5, in which each pulse is assigned a symbol interval T during which it has zero value except for some contiguous duration in which it is ±2.
So it is sufficient to prove that such a nonzero signal $x_1 - x_2$ cannot have zero spectrum in $0 \le f \le B + \varepsilon$. This follows directly from [24, Theorem 1], which proved that signals with zero spectrum in $0 \le f \le B + \varepsilon$, denoted in [24] as high-pass signals or signals with a zero gap, change sign at average rates higher than 1/T = 2B, which is the highest possible rate of sign changes of the function $x_1(t) - x_2(t)$ in Figure 5. Thus, such a nonzero $x_1 - x_2$ cannot exist and z = m(a) is a bijection. The asymptotically small change in B is immaterial in this work by the problem definition.
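Since z = m(a) is a bijection, the entropy of z is
$$h(\mathbf{z}) = h(\mathbf{a}) + \int p(\mathbf{a})\log\left|\frac{\partial z_i}{\partial a_j}\right| d\mathbf{a} = h(\mathbf{a}) + E_{\mathbf{a}}\left[\log\left|\frac{\partial z_i}{\partial a_j}\right|\right], \tag{7}$$
where $\left|\partial z_i/\partial a_j\right|$ denotes the determinant of the Jacobian matrix of z = m(a), p(a) is the probability density function of a, and $E_{\mathbf{a}}$ denotes expectation with respect to a. The Jacobian matrix is denoted $J = \partial z_i/\partial a_j$. Unlike OWZ [6], in our scheme A the Jacobian matrix is not Toeplitz, since here $\partial z_i/\partial a_j$ depends on each $a_j$. Therefore, we could not follow OWZ in using the Szegö theorem [23]. Instead, we evaluated the expectation in (7) numerically by generating the signals z with random sequences a, computing J for each z and averaging. For unit amplitude (P = 1), the Jacobian matrix J is evaluated by
$$J_{ij} = \frac{\partial z_i}{\partial a_j} = \pm 2\,\mathrm{sinc}\!\left(\frac{t_{ij}}{T}\right), \qquad \mathrm{sinc}(u) = \frac{\sin(\pi u)}{\pi u}, \tag{8}$$
where $t_{ij}$ is the time elapsed from the time of transition $a_j$ to the sample $z_i$. The sign is positive for transitions from 1 to −1 and negative otherwise. The computation was executed on cyclic sequences 500 and 1000 symbols long, and the results were verified to be identical in both cases. Denote by $h_d$ the per-symbol value of the expectation term in (7),
$$h_d = \frac{1}{N} E_{\mathbf{a}}\left[\log\left|\det J\right|\right],$$
where N is the sequence length. The result of numerical evaluation is $h_d = 0.5197$ nats for T = 1 and P = 1 and is invariant with T, see (6) and (8).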
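The Monte Carlo evaluation of the expected log-determinant can be sketched as follows. This is an illustration only, under assumptions that go beyond the text: the Jacobian entries are taken as 2·sinc(i − j − a_j) for T = 1 and P = 1 (the derivative of a brick-wall-filtered step with respect to its transition time, with all transitions going from +1 to −1 as in scheme A), and a finite non-cyclic block is used instead of the cyclic sequences of the paper. The value it returns is therefore only indicative and is not expected to reproduce the reported 0.5197 nats exactly.

```python
import numpy as np


def jacobian_scheme_a(a):
    """Assumed Jacobian J[i, j] = dz_i/da_j = 2*sinc(i - j - a_j) for T = 1, P = 1."""
    n = a.size
    idx = np.arange(n)
    # Time from transition j, at (j + a_j)T, to sample i, at iT, in units of T.
    t = idx[:, None] - idx[None, :] - a[None, :]
    return 2.0 * np.sinc(t)  # np.sinc(u) = sin(pi u)/(pi u)


def mean_logdet_per_symbol(n_sym=200, n_trials=20, seed=0):
    """Monte Carlo estimate of (1/N) E_a[log|det J|] in nats."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_trials):
        a = rng.uniform(-0.5, 0.5, n_sym)
        _, logabsdet = np.linalg.slogdet(jacobian_scheme_a(a))
        values.append(logabsdet / n_sym)
    return float(np.mean(values))


if __name__ == "__main__":
    print(f"estimated per-symbol log-determinant: {mean_logdet_per_symbol():.4f} nats")
```

Two sanity checks follow from the structure of this assumed Jacobian: with all $a_j = 0$ the matrix reduces to 2I, giving log 2 per symbol, and by Hadamard's inequality no realization can exceed log 2 per symbol, because each column has Euclidean norm of at most 2.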
The entropy (7) is identical in scheme A and in scheme B with its alternate sign inversions. This is because $\partial z_i/\partial a_j$ changes sign when $s_j = -1$, so for scheme B we can create a new auxiliary vector
$$\hat{\mathbf{a}} = (a_1 s_1, \ldots, a_i s_i, \ldots) \tag{9}$$
in which $\partial z_i/\partial \hat{a}_j$ is identical to $\partial z_i/\partial a_j$ in scheme A and $h(\hat{\mathbf{a}}) = h(\mathbf{a})$, yielding the same $h_z$ in schemes A and B.
The true entropy of z is larger by 0.5 log(P) due to multiplication by $\sqrt{P}$, and a has unity support, so $h_a = 0$, see (6). Thus,
$$h_z = h_d + 0.5\log P.$$
The entropy of the sampled noise at the filter output is $h_n = 0.5\log(2\pi e B N_0)$.
By the EPI [22], the entropy of the sum is lower bounded in terms of the entropies of its components:
$$e^{2h_y} \ge e^{2h_z} + e^{2h_n}$$
$$e^{2h_y} \ge e^{2(h_d + 0.5\log P)} + 2\pi e B N_0$$
$$h_y \ge 0.5\log\left(e^{2(h_d + 0.5\log P)} + 2\pi e B N_0\right)$$
$$I(x; y) = h_y - h_n \ge 0.5\log\left(e^{2h_d + \log P} + 2\pi e B N_0\right) - 0.5\log(2\pi e B N_0).$$
The AWGN capacity is
$$C = 0.5\log\left(1 + \frac{P}{N_0 B}\right) = 0.5\log(1 + \rho).$$
So the power gain at all SNRs over the AWGN channel is lower-bounded by
$$\gamma \ge \frac{e^{2h_d}}{2\pi e}.$$
OWZ [6] reported a better result, $\gamma_{OWZ} = 0.1753$, see (3) above; our software reconstructs this result as a verification. The same analysis technique used here for the brick-wall channel response is applicable to a general channel frequency response H(f). To extend the technique to a more general H(f), the sinc pulse used in (8) to compute the Jacobian matrix and shown in Figure 3 would be replaced by the new channel impulse response. Furthermore, H(f) would need to be non-zero over 0 < f < B to fulfill the conditions of Lemma 1.
Next we show an improved performance in scheme C. To increase h(z), the polarity of each pulse is inverted at random. As seen in Figure 4, this also removes the wasted discrete tones from the signal spectrum. The analysis above cannot be applied directly, since now x(t) → z(t) is not a bijection, as demonstrated by constructing pairs of signals x(t) whose difference has period T and zero mean, and thus zero PSD in the 0 to B frequency band. For scheme C the system mutual information between the modulator inputs a, s and the channel output is:
$$I(\mathbf{a}, \mathbf{s}; \mathbf{y}) = I(\mathbf{s}; \mathbf{y}) + I(\mathbf{a}; \mathbf{y} \mid \mathbf{s}).$$
The second term on the r.h.s. is equal to that of schemes A and B, since the signs s on which this term is conditioned are treated by the auxiliary vector $\hat{\mathbf{a}}$ as defined in (9) for scheme B. The first term on the r.h.s. is the improvement achieved by scheme C relative to schemes A and B. We lower-bound it as follows.
Denote the sequence of derivatives of y(t) at times nT by $\dot{\mathbf{y}} = \{\dot{y}_n\}$.
The first line holds since $\dot{\mathbf{y}}$ is a function of y. The second line is by the standard mutual information decomposition [22]. The third line holds since $s_n$ is independent of $s_1^{n-1}$. The last term was evaluated by simulation of scheme C while estimating the symbol-wise probability densities $P(\dot{y}_n \mid s_n = 1)$, $P(\dot{y}_n \mid s_n = -1)$ and $P(\dot{y}_n)$, as plotted in Figure 6. It adds 0.136 bits per symbol at asymptotically high SNR, which is equivalent to a power gain of γ = 1.207. Scheme C thus achieves γ = 0.20, moderately better than OWZ. We expect that better detectors would improve upon this lower bound.
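The arithmetic behind these power factors can be retraced with a short sketch. It assumes that the EPI argument above yields a lower bound of the form 0.5·log(1 + γρ) per Nyquist interval with γ = exp(2 h_d)/(2πe); this closed form is our reading of the derivation, and the function names are illustrative.

```python
import math


def power_factor_from_entropy(h_d):
    """Assumed EPI power factor gamma = exp(2*h_d) / (2*pi*e), with h_d in nats."""
    return math.exp(2.0 * h_d) / (2.0 * math.pi * math.e)


def power_gain_from_extra_bits(delta_bits):
    """Power gain equivalent to delta_bits extra bits per Nyquist interval."""
    return 2.0 ** (2.0 * delta_bits)


def rate_lower_bound_bits(gamma, snr):
    """Lower bound 0.5*log2(1 + gamma*snr), in bits per Nyquist interval."""
    return 0.5 * math.log2(1.0 + gamma * snr)


gamma_a = power_factor_from_entropy(0.5197)      # schemes A and B
sign_gain = power_gain_from_extra_bits(0.136)    # contribution of the signs s_n
gamma_c = gamma_a * sign_gain                    # scheme C at high SNR

print(f"gamma_A = {gamma_a:.4f}, sign gain = {sign_gain:.3f}, gamma_C = {gamma_c:.2f}")
```

With the reported h_d = 0.5197 nats this gives a power factor of about 0.166 for schemes A and B, below the OWZ value of 0.1753, and the additional 0.136 bits raise it to about 0.20 for scheme C, in agreement with the figures in the text.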
We noticed that each pair of consecutive sign transitions with an exceptionally short inter-transition time introduces a very low singular value to J, which reduces our lower bound. To address this we designed scheme B1. Scheme B1 improves upon scheme B by introducing a minimal inter-transition interval $T_g = 0.2T$ and by extending the range in which each sign-transition time can occur. In particular, as in scheme B, each transmission interval of duration T is associated with one sign transition. However, the transition time, specified in (4) as uniformly distributed over the n'th transmission interval spanning $(n - 0.5)T \le t < (n + 0.5)T$ in scheme B, is in the new B1 scheme distributed uniformly over a window $W_s$ which starts $T_g$ after the previous sign transition and ends, as in scheme B, at the end of the current interval. This is illustrated in Figure 7. Note that the sign transition associated with the n'th transmission interval may occur in the n'th interval or in one of the few intervals preceding it.

[Figure 7: the transition window $W_s$ and the minimal inter-transition interval $T_g$ relative to interval n of duration T.]

Lemma 1 holds also for scheme B1, in which the difference signal is as in Figure 5 with the same average number of pulses, except that each pulse is not confined to its own T-interval. That is, in both schemes B and B1, the total number of negative and positive pulses in the difference signal is half of the total number of sign transitions in $x_1$ and $x_2$.
The entropy of a in (6) is now calculated numerically as $h_a = \frac{1}{N}\sum_n h(a_n \mid a_{n-1})$. It is larger by 0.4095 nats than that of scheme B, contributing to the performance. Scheme B1 achieved the best performance among the four schemes, see Table II.
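The entropy increase of scheme B1 can be illustrated by simulating the window lengths. The sketch below is based on our reading of the scheme description, with T = 1: each transition is uniform over a window that starts $T_g$ after the previous transition and ends at the end of the current interval. The conditional differential entropy of a uniform variable over a window of length W is log W, so the per-symbol entropy is estimated as the average of log W. The recursion and all names are illustrative assumptions.

```python
import numpy as np


def b1_entropy_per_symbol(t_g=0.2, n_chains=20000, n_steps=200, burn_in=50, seed=0):
    """Estimate E[log W] in nats for T = 1, where W is the transition window length."""
    rng = np.random.default_rng(seed)
    # Start every chain with the shortest possible window, 1 - t_g.
    w = np.full(n_chains, 1.0 - t_g)
    averages = []
    for step in range(n_steps):
        if step >= burn_in:
            averages.append(np.log(w).mean())
        # Time left from the uniformly placed transition to the end of its interval.
        remaining = rng.uniform(0.0, 1.0, n_chains) * w
        # The next window runs from t_g after this transition to the end of the
        # next interval, one interval length later.
        w = 1.0 - t_g + remaining
    return float(np.mean(averages))


if __name__ == "__main__":
    print(f"estimated entropy gain of scheme B1: {b1_entropy_per_symbol():.4f} nats")
```

Under these assumptions the estimate can be compared with the 0.4095 nats reported in the text; with $T_g = 0$ and a window confined to the current interval, the window length would always be T and the estimate would be zero, as in scheme B.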
With scheme B1, the achievable lower bound has an advantage of a power factor of 1.47 at all SNRs and of 0.28 bits per Nyquist interval T at high SNR over the scheme reported in [6]. Comparing to Table I, the gap between the upper and the lower bounds specific to the schemes is 0.43 and 0.438 bits per Nyquist interval for schemes A and C respectively. The gap between the upper bound in [20], entry 6 in Table II, and the best achievable lower bound, entry 4 in the table, is 0.93 bits per Nyquist interval at high SNR.
The schemes have distinct attributes. Schemes A and B serve to build up the theoretical base and they provide a lower bound on capacity valid for all SNRs. Scheme C is an extension providing an improved lower bound at high SNR and an improved spectrum. Scheme B1 provides the best lower bound at all SNRs.
Proposition 1: If the capacity of the AWGN frequency-flat low-pass channel with a given average input power serves as a baseline, then imposing an additional constraint of a symmetric binary (bipolar) input does not degrade the capacity by more than a power factor of 0.2586 at all SNRs and an information loss of 0.93 bits per Nyquist interval at high SNR.
Proof: Compare scheme B1 in Table II to the upper bound in [20].

IV. CONCLUSION AND OUTLOOK
We studied communication systems over the band-limited AWGN channel in which the transmitter output is constrained to be binary bipolar. We presented new schemes which provide an improved lower bound on the capacity of this channel. The gap between the known upper bound and our new achievable lower bound is reduced to 0.93 bits per Nyquist interval at high SNR. Furthermore, the schemes operate at a much lower rate of sign transitions than the bipolar signaling that achieves the PAM based bounds in [6].
There is room for future work attempting to improve the achievable lower bound. For this purpose, signals with spectra more concentrated in the lower frequency regions than our scheme C, see Figure 4, should be investigated. Interestingly, the maximal power factor γ of the Random Telegraph Signal (RTS) is achieved with an average transition rate of about 0.67 per Nyquist interval, less than the 1.5 average transition rate of our scheme C, leading to a narrower PSD. A future analysis of the performance of RTS signaling might therefore reduce the gap between the upper and lower bounds further.
The lower bound on performance presented here might be improved in future work based on techniques that consider PWM and also RTS in terms of lower bounding the filtered minimum mean square error, and incorporating the Information Estimation relations [25]. Further interesting useful techniques developed for ISI channels [26], [27] should also be considered.
In this paper the signals are designed for good performance in the high SNR regime, while the results for schemes A, B and B1 hold for all SNRs, see (10). Future work may address the non-asymptotic low and intermediate SNR region based on new schemes adapted to the SNR and on the advanced FTN techniques listed in the introduction, for which the Shamai-Ozarow-Wyner [9] bound is of direct relevance.

APPENDIX-AUTOCORRELATIONS
Denote the autocorrelation of x(t) as
$$R(\tau) = E_{x,t}\left[x(t)\,x(t + \tau)\right],$$
where $E_{x,t}$ denotes expectation over x and over −T < t < T.
For scheme C we have
$$R(\tau) = \left(1 - \frac{2|\tau|}{T}\right)\left(1 - \frac{|\tau|}{T}\right), \qquad |\tau| \le T, \tag{14}$$
and zero otherwise. The first parenthesis is the correlation given that t and τ + t are in the same symbol interval; the second parenthesis is the probability of this occurrence. The expression for cases A and B is a little more involved. For |τ|/T > 1, x(t) and x(t + τ) are independent, thus
$$E\left[x(t)\,x(t + \tau)\right] = E\left[x(t)\right] E\left[x(t + \tau)\right]. \tag{15}$$
For scheme A, E x(t) = 1 − 2/T. It follows by a straightforward integration for scheme A: where $\tau_n = \frac{|\tau| \bmod T}{T}$ and $|\tau| \bmod T$ denotes the modulo T operation. For scheme B: where $\lfloor\tau\rfloor$ is the largest integer smaller than τ. For |τ|/T < 1, the expectation over t is the sum over the events in which t and τ + t are in the same symbol interval, which yields (14), and of a term contributed by the events where t and τ + t fall into successive symbol intervals, where (15) applies. The result is: where the sign is positive for A and negative for B. Collecting the equations above yields (5) and Figure 8.
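The scheme C autocorrelation described above, the product of the same-interval correlation (1 − 2|τ|/T) and the same-interval probability (1 − |τ|/T), can be checked against a simulated signal. The sketch below is a minimal illustration assuming unit amplitude, a fine sampling grid and a +1 to −1 transition before the random sign inversion; the names are hypothetical.

```python
import numpy as np


def scheme_c_samples(n_sym, k, rng):
    """Unit-amplitude scheme C signal with k samples per interval T."""
    a = rng.uniform(-0.5, 0.5, n_sym)
    pos = (np.arange(k) + 0.5) / k - 0.5
    base = np.where(pos[None, :] < a[:, None], 1.0, -1.0)   # +1 then -1
    s = rng.choice([-1.0, 1.0], size=n_sym)                 # random inversions
    return (s[:, None] * base).ravel()


def empirical_autocorr(x, lag):
    """Time-averaged product x(t) x(t + lag), with lag given in samples."""
    if lag == 0:
        return float(np.mean(x * x))
    return float(np.mean(x[:-lag] * x[lag:]))


def model_autocorr(u):
    """(1 - 2|u|)(1 - |u|) for |u| <= 1 and zero otherwise, with u = tau/T."""
    u = abs(u)
    return (1.0 - 2.0 * u) * (1.0 - u) if u <= 1.0 else 0.0


def compare(n_sym=20000, k=100, seed=1, lags=(0.0, 0.25, 0.5, 0.75, 1.5)):
    rng = np.random.default_rng(seed)
    x = scheme_c_samples(n_sym, k, rng)
    return [(u, empirical_autocorr(x, int(round(u * k))), model_autocorr(u)) for u in lags]


if __name__ == "__main__":
    for u, measured, model in compare():
        print(f"tau/T = {u:4.2f}: simulated {measured:+.3f}, model {model:+.3f}")
```

The time average over a long realization plays the role of the expectation over x and t, so the simulated and model values should agree up to statistical fluctuations.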