Binary-Tree Encoding for Uniform Binary Sources in Index Modulation Systems

The problem of designing bit-to-pattern mappings and power allocation schemes for orthogonal frequency-division multiplexing (OFDM) systems that employ subcarrier index modulation (IM) is considered. We assume the binary source conveys a stream of independent, uniformly distributed bits to the pattern mapper, which introduces a constraint on the pattern transmission probability distribution that can be quantified using a binary tree formalism. Under this constraint, we undertake the task of maximizing the achievable rate subject to the availability of channel knowledge at the transmitter. The optimization variables are the pattern probability distribution (i.e., the bit-to-pattern mapping) and the transmit powers allocated to active subcarriers. To solve the problem, we first consider the relaxed problem where pattern probabilities are allowed to take any values in the interval [0,1] subject to a sum probability constraint. We develop (approximately) optimal solutions to the relaxed problem by using new bounds and asymptotic results, and then use a novel heuristic algorithm to project the relaxed solution onto a point in the feasible set of the constrained problem. Numerical analysis shows that this approach is capable of achieving the maximum mutual information for the relaxed problem in low and high-SNR regimes and offers noticeable benefits in terms of achievable rate relative to a conventional OFDM-IM benchmark.


I. INTRODUCTION
As a subclass of permutation modulation [1], index modulation (IM) has recently attracted significant interest [2], [3] due to its feature of "achieving more by doing less". The central idea of IM lies in the observation that, in addition to encoding information in a signal, one can encode information in the order in which a signal is conveyed in a given domain. The idea of encoding information using permutations or combinations has been applied in several contexts. For example, by using different transmit antennas and channel uniqueness, permutation modulation has been employed in the spatial domain in the form of so-called spatial modulation [4], [5]. Similar ideas have been applied to the medium/channel domain by manipulating the radiation patterns of antennas [6], [7]. Permutation modulation has also been used in the subcarrier index domain in multicarrier systems, such as orthogonal frequency-division multiplexing (OFDM). This approach is commonly referred to as subcarrier-IM or simply IM [8], [9]. Finally, the use of permutation methods in conjunction with different modes in orbital angular momentum transmissions has been studied [10], [11].
To facilitate the use of combinatorial patterns for encoding, a codebook for the mapping between patterns and the source messages (bit sequences) must be specified. Many existing works that study permutation modulation in digital communication systems assume that the number of possible patterns is a power of two [4], [12], [13]. However, such an assumption limits the applicability of permutation modulation, e.g., conventional spatial modulation (with a single active antenna in each transmission period) is only applicable when the number of antennas at the transmitter is a power of two.
Another typical approach that has been studied is to assume that only a subset of all possible patterns contains valid patterns, and the size of the subset is a power of two [9], [14]-[16]. However, this approach is not able to utilize the full potential of permutation modulation in terms of data rate, because a certain number of possible permutations that could have been used to carry information are neglected [17]. The study detailed in [18] considers the possibility of using all permutation patterns with uniform probability, but no treatment of how to realize the uniform probability distribution in digital communication systems is given in that work.
To address these issues related to the mapping of source bit sequences to permutation patterns, a few recent contributions have focused on the adaptation of binary Huffman coding [19] for permutation/index codebook design [11], [17], [20]-[23]. Here, a bijective mapping between information bit sequences and the permutation/index patterns is constructed with the aid of a full binary tree; patterns are associated with leaves in the tree, and corresponding bit sequences are defined according to a labeling rule (used in the Huffman algorithm) pertaining to the respective paths from each leaf to the root. Importantly, in contrast to conventional application scenarios for source compression where the source symbol distribution is known a priori, the probability distribution of the patterns observed during transmission in permutation modulation systems is dependent upon the binary source [17]. In this sense, the Huffman mapping is applied in permutation modulation schemes in a reversed manner. We adopt the term binary-tree encoding rather than Huffman coding for the bit-to-pattern mapping operation for the remainder of this paper in order to highlight this subtle but important difference.
Binary-tree encoding for permutation modulation schemes enables one to choose the probability distribution of the permutation patterns to achieve certain design criteria, e.g., achievable rate maximization [11], [17], [21], [23] or symbol-error rate (SER) minimization [21], [23]. However, existing works along this direction fall short in a number of ways. For example, the support of the (random) patterns, when constrained by full binary tree structures, is discrete. As a result, optimization problems for maximizing achievable rates or minimizing SERs are of mixed-integer form, and an exhaustive search over all admissible probability distributions may be required to find the global optimum. However, the number of admissible distributions has not been characterized in the literature, and thus the complexity of exhaustive searching is not well understood. A common way to reduce optimization complexity that has been treated in the literature is to relax the full-binary-tree constraint on the pattern probability distribution [11], [21]. However, the problem of how to project the relaxed probability distribution to a feasible distribution that satisfies the full-binary-tree constraint remains open. An alternative strategy that has received attention recently has been to focus on high and low signal-to-noise ratio (SNR) regimes. For the limited case of single-active-antenna spatial modulation, analytic forms of the asymptotically optimal probability distributions for the permutation patterns were reported in [21]. A generalization that activates multiple resources per channel use, which is the scenario of interest for multicarrier communication systems such as OFDM-IM [9], is desirable.
In this work, we study the subclass of permutation modulation where K out of N resources are active during each channel use. We concentrate our investigation on OFDM-IM systems [9], because OFDM-IM is a primary user of the permutation modulation subclass that we study, and any results obtained for full binary trees would be directly applicable to other permutation modulation schemes. Our main goal is to optimize the bit-to-pattern mapping operation and transmit power allocation strategy for achievable rate maximization when channel state information is available at the transmitter. We make the following contributions.
1) We give a complete and rigorous formulation of the bit-to-pattern mapping problem using the formalism of full binary trees, which covers all admissible pattern probability distributions given a uniform binary source. To this end, we report a new method to generate a reduced set of these trees and establish bounds on the number of trees in this set, which have not been reported in the mathematics or engineering literature to the best of our knowledge.
2) We formulate a relaxation of the achievable rate optimization problem with pattern probabilities and transmit powers as the optimization variables and give a number of analytic bounds and high/low-SNR asymptotic results that can be used to (approximately) solve the problem.
3) We propose an efficient, heuristic algorithm that projects a relaxed pattern probability distribution onto the feasible set of distributions that obey the full binary tree constraints, and demonstrate that this method yields an achievable rate that is superior to a conventional OFDM-IM benchmark.
The rest of the paper is organized as follows. In Section II, the basic OFDM-IM model is described, with emphasis being placed on the binary-tree encoding operation. Section III explores the fundamental properties of binary trees; in this section, details of the new tree construction algorithm are provided along with a proof of completeness, and bounds on the number of trees of a given size are reported. In Section IV, a relaxation of the achievable rate optimization problem is explained, and several analytic bounds and asymptotic results related to this problem are given. The fully constrained optimization problem is treated in Section V, where the aforementioned heuristic projection algorithm is outlined. A numerical analysis of all results reported in the paper is included in Section VI, and conclusions are drawn in Section VII.

A. Binary-Tree Encoding
Consider a binary sequence $\{B_n\}_{n \in \mathbb{N}}$, which is conveyed from a maximum entropy source to an OFDM-IM encoder. The maximum entropy property of the source implies that the sequence elements $B_n \in \{0, 1\}$ are independent, uniformly distributed random variables. The encoder partitions the sequence $\{B_n\}$ into two subsequences $\{B_{n_k}\}$ and $\{B_{n_\ell}\}$, where $k, \ell \in \mathbb{N}$ and $n_k \neq n_\ell$. One subsequence (say, $\{B_{n_k}\}$) is mapped to a sequence of $M$-ary complex-valued constellation symbols. For example, if 16-QAM is employed, $M = 16$, and each group of $m = \log_2 M = 4$ bits in $\{B_{n_k}\}$ is mapped to a QAM symbol. The other subsequence is used to assign the $M$-ary symbols to subcarriers in preparation for transmission. In the IM system considered in this paper, we assume each OFDM symbol vector comprises $G$ groups of $N$ subcarriers, and $K \le N$ subcarriers in each group are active, while the remaining $N - K$ subcarriers are nulled. In keeping with convention, we use the term subcarrier activation pattern (SAP) to denote a pattern of $K$ active subcarriers (out of $N$).
We are interested in system designs that maximize the achievable rate of OFDM-IM; hence, we consider bit-to-SAP mapping strategies that cover the full set of available SAPs. Since there are $\binom{N}{K}$ SAPs, and this number is generally not a power of two, it is generally not possible to construct a fixed-length bit-to-SAP mapping scheme that satisfies this condition. For example, with $N = 4$ and $K = 2$, six SAPs are available. By using a fixed-length mapping scheme, it would be possible to map two bits to one of four SAPs, leaving two SAPs unused.
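The arithmetic in the example above can be checked with a few lines of Python. This is only an illustrative sketch; the function name `fixed_length_usage` is hypothetical and does not appear in the original work.

```python
from math import comb

def fixed_length_usage(n_subcarriers, n_active):
    """Return (number of SAPs, bits per fixed-length index, unused SAPs)."""
    total = comb(n_subcarriers, n_active)   # all K-out-of-N activation patterns
    bits = total.bit_length() - 1           # floor(log2(total)) without floating point
    unused = total - 2 ** bits              # patterns a fixed-length map cannot reach
    return total, bits, unused

# N = 4, K = 2: six SAPs, two bits per index, two SAPs left unused
print(fixed_length_usage(4, 2))
```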
To overcome this issue, we employ a variable-length scheme based on full binary trees. A tree is a full binary tree if every node other than the leaf nodes has exactly two children. Every full binary tree comprising $v$ internal nodes has $v + 1$ leaves. Considering the total set of $v$-node full binary trees, the maximum depth of a tree in the set ranges from $\lfloor \log_2 v \rfloor + 1$ to $v$.

[Fig. 1: Each edge is labeled with zero or one. Each SAP is associated with a unique leaf. For each leaf, the set below is the bit sequence mapped to that leaf.]
It is well known that one can map symbols from a source alphabet to uniquely and instantaneously decodable bit sequences using full binary trees. Indeed, this method is employed in the celebrated Huffman source coding algorithm. For the IM system considered herein, we apply a reverse mapping approach, which entails the use of a chosen binary tree to map source bit sequences to SAPs. Each edge of the tree is labeled with a zero or a one, and the tree is constructed such that it has $\binom{N}{K}$ leaves. Each SAP in the set of $\binom{N}{K}$ admissible patterns is associated with a leaf. The bit-to-SAP mappings are determined by tracing the unique path from the root node to each leaf, recording the bit labels for each edge in order along the way. Fig. 1 provides an illustration of this procedure for the example of $N = 4$ and $K = 2$.
Similar to Huffman source coding, the use of full binary trees to develop a bit-to-SAP mapping rule ensures each mapping is unique and instantaneously encodable. Uniqueness results from the binary tree structure. Instantaneous encodability simply means that the encoder can map bit sequences to SAPs using the minimum amount of information. To illustrate this point, we can again turn to Fig. 1. Suppose the SAP bit sequence is {0, 0, 1, 1, 0, 1, 1}. Reading left to right and referring to Fig. 1, we see that the encoder only needs to read the first three bits to map them to the SAP associated with the second leaf. The encoder would then read {1, 0}, which also yields a valid SAP (the fifth leaf), and so on. In this example, it is clear that the encoder does not need to interpret long sequences of bits in order to decide upon the correct mapping relating to the first few bits.
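The encoding walk-through above can be sketched in a few lines of Python. The codebook below is an assumption: it is one labeling that is consistent with the description of Fig. 1 (second leaf reached by 001, fifth leaf reached by 10), but the figure itself is not reproduced here, and the names `CODEBOOK` and `encode_bits_to_leaves` are hypothetical.

```python
# Assumed prefix-free codebook for N = 4, K = 2 (leaf indices 1..6, left to right).
CODEBOOK = {"000": 1, "001": 2, "010": 3, "011": 4, "10": 5, "11": 6}

def encode_bits_to_leaves(bits, codebook=CODEBOOK):
    """Map a bit stream to leaf (SAP) indices, reading one bit at a time.

    Returns the list of leaf indices and any trailing bits that do not yet
    form a complete codeword.
    """
    leaves, buffer = [], ""
    for b in bits:
        buffer += str(b)
        if buffer in codebook:        # a leaf is reached: emit it and restart at the root
            leaves.append(codebook[buffer])
            buffer = ""
    return leaves, buffer

leaves, leftover = encode_bits_to_leaves([0, 0, 1, 1, 0, 1, 1])
print(leaves, repr(leftover))
```

Because no codeword is a prefix of another, the encoder never needs to look ahead: each SAP is decided as soon as the last bit of its codeword has been read.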
It is also important to note that every possible two- and three-bit sequence is accounted for in the mapping shown in Fig. 1. This property extends to mappings based on other values of $N$ and $K$. For a maximum entropy bit source, each subsequence consisting of $q$ bits will appear with probability $1/2^q$. Thus, we immediately deduce that an SAP associated with a leaf node at level $q$ below the root will be transmitted with probability $1/2^q$. This feature of binary-tree encoding imposes a constraint on the system, which much of the literature published on this topic to date has largely ignored. In this work, we will exploit the structure imposed by this constraint to develop efficient optimization procedures for OFDM-IM systems.
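As a worked instance of this constraint, consider a six-leaf tree for $N = 4$, $K = 2$ that uses only two- and three-bit sequences. Writing $n_2$ and $n_3$ for the number of leaves at levels two and three, the conditions $n_2 + n_3 = 6$ and $n_2/4 + n_3/8 = 1$ force $n_2 = 2$ and $n_3 = 4$. The resulting distribution and its entropy (a quantity introduced here only for illustration) are:

```latex
\[
  \sum_{\text{leaves}} 2^{-q} \;=\; 2 \cdot 2^{-2} + 4 \cdot 2^{-3} \;=\; 1,
\]
\[
  H(U) \;=\; 2 \cdot \tfrac{1}{4} \cdot 2 \;+\; 4 \cdot \tfrac{1}{8} \cdot 3
  \;=\; 2.5 \ \text{bits}
  \;<\; \log_2 6 \approx 2.585 \ \text{bits}.
\]
```

The gap to $\log_2 6$ reflects the fact that a uniform distribution over six SAPs cannot be produced by any full binary tree driven by a uniform binary source.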

B. OFDM Model
Once bit-to-symbol mappings (both constellation and SAP) have been completed, each length-$GN$ OFDM symbol vector is processed with a $GN$-point inverse discrete Fourier transform (IDFT), and a cyclic prefix of adequate length (to mitigate the effects of channel dispersion) is appended to each time-domain symbol array prior to filtering, up-conversion, and transmission. At the receiver, the received signal is down-converted, filtered, and sampled. The cyclic prefix is then removed from each received baseband symbol vector before processing with a discrete Fourier transform (DFT). It is well known that this sequence of procedures converts the dispersive channel into a parallel channel, and the signal on each subcarrier is (ideally) free of interference from other subcarriers.
We now formalize the OFDM model. Define $C := \binom{N}{K}$. We can uniquely associate each SAP with an index in the set $\mathcal{U} := \{1, \ldots, C\}$. For each $i \in \mathcal{U}$, denote by $S_i \subseteq \{1, \ldots, N\}$ the set of indices of the $K$ subcarriers that are active under pattern $i$, where equality holds when $K = N$ (which corresponds to a conventional OFDM system). The index symbol $U$ is randomly distributed over $\mathcal{U}$ with probabilities $p_i := P(U = i)$, $i \in \mathcal{U}$. The channel input-output relationship for subcarrier $l \in \{1, \ldots, N\}$ conditioned on the SAP $U = i$ can be written as

$$Y_l = \begin{cases} \sqrt{g_l}\, e^{j\theta_l} X_l + Z_l, & l \in S_i \\ Z_l, & l \notin S_i \end{cases}$$

where $\sqrt{g_l}\, e^{j\theta_l}$ is the complex channel coefficient for subcarrier $l$ (with $j = \sqrt{-1}$); the input symbols $\{X_l\}$ are zero mean and independent over the subcarriers with $\rho_{li}$ being the transmit power on subcarrier $l$ for index $i$; the noise is independent over the subcarriers with $Z_l \sim \mathcal{CN}(0, \sigma^2)$. Throughout this paper, we will assume the channel gains $\{g_l\}$ are known at the transmitter. We will return to this model in the context of mutual information optimization in Sections IV and V.

III. FULL BINARY TREES
One of the goals of this work is to develop a method of computing the bit-to-SAP mapping that maximizes the achievable rate of an OFDM-IM system. This is equivalent to determining the full binary tree that defines the optimal mapping. To achieve this aim, we will need a method of considering all full binary trees of a given size as well as all SAP-to-leaf assignments. At first glance, this is a complicated problem. The number of $v$-node trees is given by the Catalan number

$$c_v = \frac{1}{v+1}\binom{2v}{v},$$

and the number of SAP-to-leaf assignments is $(v + 1)!$. However, it is possible to significantly simplify the problem by making use of symmetry. The important aspect of the mapping is not in the exact tree that is chosen, but rather in the level of the leaf node that a given SAP is assigned to. As noted in Section II, an SAP assigned to a leaf at level $q$ has probability $1/2^q$ of being transmitted. We can transpose leaf nodes at a given level in any way we wish and still achieve the same SAP probability distribution. This reasoning leads us to consider a smaller set of trees, which we call the reduced set of $v$-node full binary trees $\mathcal{T}_v$. Each tree in this set actually corresponds to an automorphism group of the complete set. Moreover, consider a given tree $t \in \mathcal{T}_v$ and denote the number of leaves at level $q$ by $n_q$. Due to the symmetry stated above, the number of ways of assigning $v + 1$ objects (i.e., SAPs, where $v + 1 = C$) to the leaf nodes such that we attain a unique probability distribution is

$$\frac{(v+1)!}{\prod_q n_q!},$$

which can be considerably smaller than the total $(v + 1)!$ permutations. We now give preliminary results on the construction and enumeration of the set $\mathcal{T}_v$, which will be useful in determining systematic optimization procedures and analyzing computational complexity.

[Algorithm 1 (fragment): ... Append τ to the left-most leaf on the lowest level of t, and add the new tree to $\mathcal{T}_k$; Append τ to the left-most available leaf on the next-to-lowest level of t where possible, and add the new tree to $\mathcal{T}_k$; end; k ← k + 1; end]
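As a worked instance of the assignment count, take the six-leaf tree with two leaves at level two and four leaves at level three (the only six-leaf profile that uses just two- and three-bit sequences, as in the $N = 4$, $K = 2$ example of Section II). Here $v = 5$, $n_2 = 2$ and $n_3 = 4$, so:

```latex
\[
  \frac{(v+1)!}{\prod_q n_q!}
  \;=\; \frac{6!}{2!\,4!}
  \;=\; \frac{720}{2 \cdot 24}
  \;=\; 15
  \;\ll\; 6! = 720 .
\]
```

Only 15 of the 720 leaf orderings therefore lead to distinct SAP probability distributions for this tree.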

A. Construction
In order to choose the best tree for encoding, we require a method of constructing all trees in T v . The approach we propose is outlined in Algorithm 1, which is valid for v ≥ 2. The initial set T 1 = {τ } consists of the single full binary tree τ with one root and two leaves (at level one). This protograph is recursively appended to trees to obtain the set T v . The algorithm is presented in a somewhat informal way here for clarity; we formalize it slightly in Appendix A in order to prove the following proposition. Proof: See Appendix A in the Supplemental Material. As an example of the output of Algorithm 1, Fig. 2 shows the sets generated for v = 1, 2, 3. Note that the tree shown for the set T 1 is the protograph τ . The number of protographs contained in a graph of T v is v.

B. Enumeration
As noted above, the number of ordered full binary trees with v internal nodes is given by the Catalan number c v . The reduced set of v-node full binary trees contains significantly fewer elements. For example, Fig. 2 shows that two trees are contained in T 3 ; yet, by considering all orderings of these two trees, we can enumerate five ordered trees (c 3 = 5).
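The size of the reduced set can be explored numerically. The sketch below does not implement Algorithm 1; instead it assumes, in line with the symmetry argument of this section, that each tree in the reduced set is identified by its number of leaves per level, and enumerates those level profiles directly. The function names are hypothetical.

```python
from math import comb

def leaf_profiles(n_leaves):
    """Yield tuples (n_0, n_1, ...) giving the number of leaves on each level
    of a full binary tree with n_leaves leaves (level 0 is the root)."""
    def rec(nodes, remaining, prefix):
        # nodes: nodes on the current level; remaining: leaves still to be placed
        for leaves_here in range(nodes + 1):
            internal = nodes - leaves_here
            left = remaining - leaves_here
            if left < 0:
                break
            if internal == 0:
                if left == 0:                      # tree is complete
                    yield tuple(prefix + [leaves_here])
                continue
            # each of the 2*internal children must lead to at least one leaf
            if 2 * internal <= left:
                yield from rec(2 * internal, left, prefix + [leaves_here])
    yield from rec(1, n_leaves, [])

def catalan(v):
    """Number of ordered full binary trees with v internal nodes."""
    return comb(2 * v, v) // (v + 1)

for v in range(1, 8):
    reduced = sum(1 for _ in leaf_profiles(v + 1))
    print(f"v={v}: reduced set size {reduced}, Catalan number {catalan(v)}")
```

For $v = 3$ this reproduces the two trees of Fig. 2 against $c_3 = 5$ ordered trees.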
Let $T_v$ denote the number of trees in the set $\mathcal{T}_v$. From Algorithm 1, we can infer the relations

$$T_k \le 2\,T_{k-1} \quad \Longrightarrow \quad T_v \le 2^{v-1}\,T_1 = 2^{v-1},$$

since each step in the for loop at most doubles the number of elements in $\mathcal{T}_k$. This bound captures the slower exponential growth in the number of trees in the reduced set compared to the set of ordered trees. Numerical results have shown that the bound overestimates the rate of increase in $v$. Published results on full binary trees have attempted to obtain generating functions for the number of trees in unordered, unlabelled sets (see, e.g., [24] and references therein). However, it appears that results on the reduced sets that we are interested in remain undiscovered. It is possible to obtain a tighter bound on $T_v$ by analyzing Algorithm 1. The bound is given as a recurrence relation in the following proposition.

Proposition 2. The number of trees in T v is upper bounded by
where δ v = 1 if v is a power of two and δ v = 0 otherwise, and the summation is empty when v < 5.
Proof: See Appendix B in the Supplemental Material. The accuracy of each of the two bounds given above is illustrated for sets of up to twenty internal nodes in Fig. 3. From the figure, we see that the loose bound slightly overestimates the growth rate of T v . The recursion is exact up to v = 9, but slowly diverges for larger v, although it clearly remains fairly tight up to v = 20. Practically, we will be interested in reasonably small v; hence, the recursion is a useful tool for analyzing the IM systems studied in this paper.

IV. MUTUAL INFORMATION OPTIMIZATION: RELAXATION
We now provide details of new results and methods related to the optimization of the mutual information in OFDM-IM systems. As noted in Section II, the SAP probabilities are constrained by the binary tree chosen for encoding. Before we treat these constraints, we will consider the relaxed problem, for which it is assumed that SAPs can be transmitted with any probability. This will give an upper bound on the achievable rate for the constrained system, and we will use the approaches developed herein to treat that case in Section V.

[Fig. 3 caption, partially recovered: The tight bound corresponds to the result given in Proposition 2, where the recursion is performed over the bounds on $T_v$ rather than the exact enumerated values. The "Unordered, unlabelled" plot corresponds to the enumeration given in [24]. The number of ordered full binary trees (the Catalan number) is plotted as a reference.]

Consider a single set of $N$ subcarriers that adhere to the model described in Section II. We collect the $N$ received symbols in the vector $Y := (Y_1, \ldots, Y_N)$. Furthermore, we collect the transmitted symbols in the vector $X := (X_1, \ldots, X_N)$, noting that $X_l$ is nonzero only when subcarrier $l$ is active, as given by the encoded SAP, so that $K$ entries are nonzero. Define the SAP probability vector $p = (p_i)$ and the power vector $\rho = (\rho_{li})$. We are interested in the probabilities in $p$ and transmit powers in $\rho$ that maximize the mutual information

$$I(X; Y) = h(Y) - h(Y \mid X).$$

Conditioned on $U = i$, we assume $X_l \sim \mathcal{CN}(0, \rho_{li})$ when $l \in S_i$. Choosing $X_l$ to be Gaussian is not proven to achieve capacity, but the assumption provides a tractable expression. In this case, the complex random vector $Y$ has probability density function (pdf)

$$f_Y(y) = \sum_{i \in \mathcal{U}} p_i \prod_{l=1}^{N} f\big(y_l;\, g_l \rho_{li}\,\mathbb{1}\{l \in S_i\} + \sigma^2\big),$$

where

$$f(y; \nu) = \frac{1}{\pi \nu}\, e^{-|y|^2/\nu}$$

is the pdf of a circularly symmetric complex Gaussian random variable with mean zero and variance $\nu$.
Writing $I(X; Y) = I(p, \rho, \sigma^2)$, the optimization problem (8) is to maximize $I(p, \rho, \sigma^2)$ over $p$ and $\rho$, subject to the transmit power constraints and to the probability constraints $p_i \in [0, 1]$ and $\sum_{i \in \mathcal{U}} p_i = 1$. Note that the relaxation alluded to earlier manifests in the simple constraint $\sum_{i \in \mathcal{U}} p_i = 1$. If we were to consider only probability vectors $p$ that adhere to the binary-tree encoding methodology, this constraint would be defined differently (see Section V). We now detail several strategies for solving, either approximately or exactly, the optimization problem stated in (8).

A. Concavity and Numerical Optimization
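The following result can be used to solve (8) numerically.
Lemma 1. For fixed $\rho$ and $\sigma^2$, the mutual information $I(p, \rho, \sigma^2)$, viewed as a function of $p$, is concave.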
Proof: See Appendix C in the Supplemental Material. For the special case where g l is constant for all l and a balanced power distribution is chosen (i.e., ρ li = ρ for all l and i), Lemma 1 leads to the following. Proposition 3. When the channel gains and transmit powers are constant across frequency, the optimal SAP probability distribution is uniform.
Proof: See Appendix D in the Supplemental Material. More generally, Lemma 1 suggests that it may be economical to solve (8) by employing a block coordinate descent (BCD) approach [25], in which one alternately maximizes the mutual information over either $p$ or $\rho$ while keeping the other vector fixed at its previously obtained optimum value. The method requires the constraints of the problem to be convex, which is clearly satisfied. Furthermore, the maximization over each of the vectors $p$ and $\rho$, keeping the other constant, must have a unique solution. Lemma 1 implies this condition is met in part, but it is not clear whether the condition may be violated for the maximization of $I(p, \rho, \sigma^2)$ over $\rho$ for a fixed $p$ in some parameterizations of $\{g_l\}$ and $\sigma^2$. Nevertheless, the smoothness of the objective function provides some assurance that a BCD approach will converge to a local extremum.
One may encounter numerical problems when using the BCD technique to solve (8) since, in general, the evaluation of I(p, ρ, σ 2 ) requires high-dimensional numerical integration or time-consuming Monte Carlo methods. In practice, we have found that the BCD method can only be employed to optimize systems with three or four subcarriers per group; larger systems require different approaches.
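To make the cost of evaluating the objective concrete, the following is a minimal Monte Carlo sketch of $I(p, \rho, \sigma^2)$ in nats. It assumes the Gaussian-input mixture model of this section, i.e., that conditioned on pattern $i$ the output $Y_l$ is $\mathcal{CN}(0, g_l\rho_{li} + \sigma^2)$ on active subcarriers and $\mathcal{CN}(0, \sigma^2)$ otherwise, and it uses $I = h(Y) - N\ln(\pi e \sigma^2)$. The function name, argument layout (zero-based subcarrier indices, `rho[i][l]`) and sample size are illustrative choices, not the authors' implementation.

```python
import numpy as np
from itertools import combinations

def mi_monte_carlo(p, rho, g, sigma2, patterns, n_samples=100_000, seed=0):
    """Monte Carlo estimate of I(X;Y) in nats under the assumed mixture model.

    p[i]: probability of pattern i; rho[i][l]: power on subcarrier l under
    pattern i; g[l]: channel gain; patterns[i]: tuple of active indices.
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float)
    g = np.asarray(g, dtype=float)
    C, N = len(patterns), len(g)
    # Output variance on each subcarrier for each pattern.
    var = np.full((C, N), float(sigma2))
    for i, active in enumerate(patterns):
        for l in active:
            var[i, l] += g[l] * rho[i][l]
    # Draw patterns, then |Y_l|^2, which is exponential with mean var[i, l].
    idx = rng.choice(C, size=n_samples, p=p)
    y2 = rng.exponential(var[idx])                      # shape (n_samples, N)
    # Log-density of every mixture component at every sample.
    log_comp = -np.log(np.pi * var).sum(axis=1)[None, :] - y2 @ (1.0 / var).T
    with np.errstate(divide="ignore"):
        a = log_comp + np.log(p)[None, :]               # -inf where p_i = 0
    m = a.max(axis=1, keepdims=True)
    log_f = m[:, 0] + np.log(np.exp(a - m).sum(axis=1))  # log-sum-exp
    h_y = -log_f.mean()                                 # estimate of h(Y)
    h_noise = N * np.log(np.pi * np.e * sigma2)         # h(Y | X) = h(Z)
    return h_y - h_noise

# Example: N = 4, K = 2, uniform pattern probabilities, unit power on every active subcarrier.
patterns = list(combinations(range(4), 2))
rho = np.ones((len(patterns), 4))
p = np.full(len(patterns), 1.0 / len(patterns))
print(mi_monte_carlo(p, rho, g=[0.5, 2.0, 1.0, 3.0], sigma2=1.0, patterns=patterns))
```

Each evaluation requires a fresh batch of samples whose component count grows as $\binom{N}{K}$, which is one way to see why nesting this estimator inside a BCD loop becomes expensive for larger groups.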

B. A Lower Bound
It is possible to obtain an approximate solution to (8) by considering a lower bound on the mutual information rather than the mutual information itself. The following proposition provides one such bound.
Proposition 4. For transmit powers $\rho$ and SAP probabilities $p$, $I(p, \rho, \sigma^2)$ satisfies the lower bound (10).
Proof: See Appendix E in the Supplemental Material. The bound given above is a result of Jensen's inequality and is, thus, not particularly tight. In fact, a slightly different application of the inequality yields a marginally tighter bound [26, Th. 2]. However, the utility in Proposition 4 is not in the accuracy of the bound, but rather in the ease with which this bound can be optimized over the SAP probabilities. These optimal probabilities are captured in the following proposition.
Proposition 5. Suppose $A$ is nonsingular, and let $B = A^{-1}$, with $b_{ij}$ denoting the element in the $i$th row and $j$th column of $B$. The SAP probabilities that maximize the lower bound given in (10) are given by (12), where $(x)^+ = \max\{x, 0\}$.
Proof: See Appendix F in the Supplemental Material. Note that the probabilities given in (12) are dependent upon the subcarrier powers. The BCD approach can be employed in a fairly straightforward manner to compute the power values by alternately computing (12) for fixed powers, then fixing these probabilities in (10) and computing the maximizing power values. Alternatively, one can, in theory, substitute (12) into (10) and compute the optimal powers directly. However, the nonlinear form of (12) can cause problems using this approach.
A condition that must be satisfied in order to invoke Proposition 5 is that A must be nonsingular. It is possible that this condition is not met, for example when only a single subcarrier in the set of K active subcarriers is allocated power. Such cases can typically be dealt with by using other results reported in this section (e.g., the asymptotic results detailed below). In general, we have found that Proposition 5 is applicable to a wide range of system configurations.

C. Closed-Form Asymptotics
It is naturally preferable to solve (8) analytically. To make progress in this direction, we apply the following strategy: first, we find the probabilities $p^\star(\rho)$ that maximize the mutual information for any given values of the transmit powers, i.e., the optimal probabilities are functions of the powers; then, the mutual information $I(p^\star(\rho), \rho, \sigma^2)$ that corresponds to the optimal probabilities found previously is maximized over the powers in $\rho$.
1) Probability Optimization: To obtain a closed-form expression for the optimal SAP probability distribution as a function of the powers, we first resort to a high-SNR analysis, which gives rise to the following result.
Proof: See Appendix G in the Supplemental Material. In addition to a simple, closed-form expression for the optimal SAP probability distribution at high SNR, this result also provides an upper bound on the achievable rate, as stated in the following corollary.
Proof: See Appendix H in the Supplemental Material. We now turn our attention to the low-SNR case, for which we obtain the following beautifully intuitive result, which is a somewhat discrete version of the well-known waterfilling principle at low SNR.
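Proposition 7. For fixed powers $\rho$, let $i^\star = \arg\max_{i} \sum_{l \in S_i} \ln\!\left(\frac{g_l \rho_{li}}{\sigma^2} + 1\right)$, i.e., $i^\star$ corresponds to the group of $K$ strongest subcarriers. Then, the index probabilities

$$p_i = \begin{cases} 1, & i = i^\star \\ 0, & \text{otherwise} \end{cases}$$

maximize the mutual information at low SNR, which satisfies the asymptotic equivalence

$$I^\star(\rho, \sigma^2) \sim \sum_{l \in S_{i^\star}} \ln\!\left(\frac{g_l \rho_{l i^\star}}{\sigma^2} + 1\right), \quad \text{as } \sigma^2 \to \infty.$$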
Proof: See Appendix I in the Supplemental Material.
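The low-SNR rule of Proposition 7 amounts to a search over patterns for the largest sum of per-subcarrier log terms. The sketch below illustrates this for given powers; the function name and the zero-based indexing convention (`rho[i][l]` is the power on subcarrier `l` under pattern `i`) are assumptions made for the example.

```python
from itertools import combinations
from math import log

def low_snr_distribution(g, rho, sigma2, patterns):
    """Return (i_star, p, value): the best pattern index, the one-hot
    probability vector, and sum_{l in S_i*} ln(g_l rho_li* / sigma2 + 1)."""
    scores = [
        sum(log(g[l] * rho[i][l] / sigma2 + 1.0) for l in active)
        for i, active in enumerate(patterns)
    ]
    i_star = max(range(len(patterns)), key=scores.__getitem__)
    p = [1.0 if i == i_star else 0.0 for i in range(len(patterns))]
    return i_star, p, scores[i_star]

# Example: N = 4, K = 2, unit power on every subcarrier, noise variance 10.
g = [0.5, 2.0, 1.0, 3.0]
patterns = list(combinations(range(4), 2))
rho = [[1.0] * 4 for _ in patterns]
i_star, p, value = low_snr_distribution(g, rho, sigma2=10.0, patterns=patterns)
print(patterns[i_star], p, value)
```

With equal powers, the selected pattern is simply the pair of subcarriers with the largest gains, which matches the "K strongest subcarriers" interpretation in the proposition.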
2) Power Optimization: Propositions 6 and 7 and Corollary 1 yield closed-form expressions for the mutual information, which depend upon the powers ρ. As a result, these expressions can be used to develop optimal power allocation rules in the high and low-SNR regimes. It turns out that the optimal rules follow our conventional understanding of power allocation in OFDM systems, as formalized in the following proposition.
Proposition 8. For high SNR (as σ 2 → 0), allocating powers for the subcarriers of each SAP according to the waterfilling strategy is optimal under power constraints for each pattern. For low SNR (as σ 2 → ∞), allocating powers according to the waterfilling strategy is optimal.
Proof: See Appendix J in the Supplemental Material. The waterfilling result for the low-SNR case is somewhat unsurprising given that Proposition 7 indicates the mutual information expression is the same as that for OFDM with only K active subcarriers. On the other hand, the optimality of waterfilling at high SNR is not immediately obvious from Corollary 1. These results lead us to a simple mutual information optimization strategy for p and ρ at high and low SNR: one should perform waterfilling power allocation for each subcarrier pattern and then compute the corresponding probabilities according to Proposition 6 or 7 and select the result that maximizes the objective.
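For reference, the standard waterfilling rule used in this strategy can be sketched as follows. The example assumes a sum-power budget `total_power` over the active subcarriers of a single pattern; the exact form of the power constraint in (8) is not restated here, so this budget and the function name `waterfill` are assumptions.

```python
def waterfill(gains, total_power, sigma2):
    """Classical waterfilling: rho_l = max(mu - sigma2 / g_l, 0), with the
    water level mu chosen so that the powers sum to total_power."""
    inv = sorted(sigma2 / g for g in gains)        # noise-to-gain ratios, ascending
    mu = 0.0
    for n_active in range(len(inv), 0, -1):
        # Water level if exactly the n_active best subcarriers receive power.
        mu = (total_power + sum(inv[:n_active])) / n_active
        if mu > inv[n_active - 1]:                 # weakest of them is still above water
            break
    return [max(mu - sigma2 / g, 0.0) for g in gains]

# Powers for the K = 2 active subcarriers of one pattern.
print(waterfill([2.0, 1.0], total_power=1.0, sigma2=1.0))
```

Applying this function to the gains $\{g_l : l \in S_i\}$ of each pattern $i$ gives the per-pattern power vectors that are then fed into the probability rules of Proposition 6 or 7.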

V. ACHIEVABLE RATE OPTIMIZATION: CONSTRAINED
We now consider a more practical rate optimization problem that is effectively the same as (8) but with a nonlinear constraint on the probabilities {p i }. As discussed in Sections II and III, SAP probabilities depend on two things: (1) the full binary tree that corresponds to the bit-to-SAP mapping operation, and (2) the ordering of the SAP-to-leaf assignment.
Let $\mathcal{P}_v$ denote the set of feasible probability vectors of length $C$ that can be constructed by considering all nonredundant SAP-to-leaf assignments for all binary trees in $\mathcal{T}_v$ including null assignments. For example, for $C = 4$, $\mathcal{P}_2$ is constructed by considering all mappings of three (out of four) SAPs to two leaves on the second level and one leaf on the first level of the single tree in $\mathcal{T}_2$ (cf. Fig. 2). There are $3!/2! = 3$ mappings of three SAPs to the leaves, and $\binom{4}{3} = 4$ ways of choosing the active SAPs. The inclusion of null assignments in this way ensures we consider the case of not using some SAPs that may correspond to poor channel conditions. Under this definition of $\mathcal{P}_v$, the number of elements (probability vectors) in $\mathcal{P}_v$ is

$$|\mathcal{P}_v| = \binom{C}{v+1} \sum_{t \in \mathcal{T}_v} \frac{(v+1)!}{\prod_q n_{tq}!},$$

where $n_{tq}$ denotes the number of leaves at level $q$ in tree $t$. We further define the union $\mathcal{P} = \bigcup_{v=0}^{C-1} \mathcal{P}_v$, where $\mathcal{P}_0$ consists of the $C$ vectors with one element equal to one and the rest equal to zero.
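The size of the feasible set can be tabulated for small $C$ with the sketch below. It assumes that the number of elements in each set is the number of ways of choosing $v + 1$ active SAPs multiplied by the number of distinct leaf assignments summed over the reduced set, and that each tree in the reduced set is identified by its leaf counts per level. The helper names are hypothetical.

```python
from math import comb, factorial, prod

def leaf_profiles(n_leaves):
    """Yield tuples of leaf counts per level for full binary trees with n_leaves leaves."""
    def rec(nodes, remaining, prefix):
        for leaves_here in range(nodes + 1):
            internal = nodes - leaves_here
            left = remaining - leaves_here
            if left < 0:
                break
            if internal == 0:
                if left == 0:
                    yield tuple(prefix + [leaves_here])
                continue
            if 2 * internal <= left:               # enough leaves left to close the tree
                yield from rec(2 * internal, left, prefix + [leaves_here])
    yield from rec(1, n_leaves, [])

def count_feasible(C, v):
    """Number of probability vectors built from trees with v internal nodes."""
    assignments = sum(
        factorial(v + 1) // prod(factorial(n) for n in profile)
        for profile in leaf_profiles(v + 1)
    )
    return comb(C, v + 1) * assignments

C = 4
sizes = [count_feasible(C, v) for v in range(C)]
print(sizes, sum(sizes))
```

For $C = 4$ and $v = 2$ the count is $4 \times 3 = 12$, in agreement with the example in the text.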
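The constrained optimization problem (16) can now be formulated as the maximization of $I(p, \rho, \sigma^2)$ over $p$ and $\rho$, subject to $p \in \mathcal{P}$ and the transmit power constraints. We propose two methods of solving this problem here: an enumerative approach, and a projection from the relaxation.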

A. Enumerative Approach
This approach is based on the enumeration of all possible probability distributions of the SAPs, i.e., all p ∈ P. The allocated powers ρ are optimized for each probability distribution p. The pair (p, ρ) that yields the highest mutual information is the solution to the problem stated in (16).
For a given probability distribution p, the power allocation problem may not be solved analytically. In this case, one can invoke the asymptotic results stated in Proposition 8 to obtain the power values. First, the waterfilling power allocation solution would be calculated for each SAP. Then, the distribution p ∈ P that maximizes the mutual information would be chosen.

B. Projection from the Relaxation
A much more computationally efficient method of treating (16) can be developed by first considering the relaxation studied in the previous section. First, we relax the constraint $p \in \mathcal{P}$ to find a solution to (8). Any of the approaches used in Section IV can be applied. We let $p^\star$ denote the probability distribution computed in this step. We then project $p^\star$ onto a feasible vector $\hat{p} \in \mathcal{P}$ and take this to be the partial solution to (16). The power allocation vector $\rho$ is then computed to maximize the mutual information.
The projection of the relaxed solution $p^\star$ onto a point in the set $\mathcal{P}$ can be accomplished efficiently by using the Huffman coding algorithm [19]. To this end, we interpret the elements of $p^\star$ as source symbol probabilities, then generate a full binary tree according to the Huffman algorithm. As discussed in Section II, SAPs associated with a leaf node in the tree at level $q$ will be transmitted with probability $1/2^q$. Hence, the probabilities in $p^\star$ are replaced with the corresponding probabilities derived from the tree structure to yield a candidate for $\hat{p}$.
It is important to note that this basic approach will only yield trees (and associated probability distributions) with C leaves, i.e., the algorithm maps p to P C only. To ensure we consider mappings to all points in P, we require a slightly modified approach. The full details of the complete projection algorithm are given in Algorithm 2, and an example depicting how the algorithm works is shown in Fig. 4. The function sort(·) in Algorithm 2 arranges the set of C arguments in decreasing order; unsort(·) performs the inverse mapping (again, acting on C elements). The function Huffman(·) takes a set of "source probabilities" and returns the corresponding set of depths, or path lengths from the root to a given leaf. Finally, the function dist(p 1 , p 2 ) computes the distance between the discrete probability distributions p 1 and p 2 . In the next section, we consider three distance measures: Euclidean distance, for which Algorithm 2 is heuristic. It is not guaranteed to produce the solution to (16). However, results have shown it performs very well in practical scenarios (see Section VI).
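The projection idea described above can be sketched in Python as follows. This is not a transcription of Algorithm 2, which is not reproduced here. It assumes Euclidean distance, and it assumes that the "slightly modified approach" amounts to running the Huffman construction on the m most probable SAPs for every m from 1 to C and keeping the closest candidate. The names `huffman_depths` and `project` are hypothetical.

```python
import heapq
from itertools import count


def huffman_depths(weights):
    """Return the leaf depth of each weight in a Huffman tree."""
    if len(weights) == 1:
        return [0]
    tie = count()  # unique tie-breaker so the index lists are never compared
    heap = [(w, next(tie), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    depths = [0] * len(weights)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        for i in a + b:  # every leaf below the merged node moves one level down
            depths[i] += 1
        heapq.heappush(heap, (w1 + w2, next(tie), a + b))
    return depths


def project(p):
    """Map a relaxed distribution p to a dyadic one (assumed reading of Algorithm 2)."""
    order = sorted(range(len(p)), key=lambda i: -p[i])  # plays the role of sort(.)
    best, best_dist = None, float("inf")
    for m in range(1, len(p) + 1):
        kept = order[:m]  # the m most probable SAPs; the rest get probability 0
        depths = huffman_depths([p[i] for i in kept])
        cand = [0.0] * len(p)
        for i, d in zip(kept, depths):  # writing by index plays the role of unsort(.)
            cand[i] = 2.0 ** -d
        dist = sum((x - y) ** 2 for x, y in zip(p, cand))  # squared Euclidean distance
        if dist < best_dist:
            best, best_dist = cand, dist
    return best


print(project([0.45, 0.3, 0.15, 0.1, 0.0, 0.0]))
```

Because every candidate comes from a full binary tree, its entries are powers of 1/2 that sum to one, so the output is always realizable with a uniform binary source under the model of Section II.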

VI. NUMERICAL RESULTS
In this section, we present a numerical analysis of the methods described above. We begin with a discussion of the mutual information. We then give a brief analysis of the error rate of the systems described herein. In what follows, we define $\mathrm{SNR} := P/(N\sigma^2)$, which can be interpreted as the average transmit SNR per subcarrier. All mutual information results are given in units of nats, and all curves were obtained via Monte Carlo sampling when closed-form expressions were not available. For all systems, we set $N = 4$ and $K = 2$. It should be noted, however, that we also performed extensive simulations for $(N, K) = (6, 4)$ and $(N, K) = (8, 6)$, and observed very similar trends to the case where $N = 4$ and $K = 2$. We have included some of these results in the Supplemental Material (see Appendix K).

A. Mutual Information
We begin with a simple case. Consider a system operating in AWGN with equal channel gains (i.e., $g_l = 1$ for all $l$). Proposition 3 states that the optimal SAP probability distribution in this system is uniform. Adopting this result, Fig. 5 shows the mutual information for an OFDM-IM system that uses all six SAPs (each with probability 1/6) compared to one that limits the number of utilized SAPs to four where uniform power allocation is applied. The small improvement offered by the former approach simply arises as a result of the additional SAPs that are used. However, we note that a uniform distribution over six SAPs is infeasible given a uniform binary source, since $1/6$ is not of the form $1/2^q$.
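The feasibility remark can be made concrete with a small check. The Python sketch below assumes, following Section II, that a distribution is realizable with a uniform binary source exactly when its nonzero entries are of the form 1/2^q and sum to one (the Kraft equality for a full binary tree). The function name `is_tree_feasible` is hypothetical.

```python
from fractions import Fraction


def is_tree_feasible(probs):
    """True if probs can be produced by the leaves of a full binary tree.

    Assumption: feasibility means every nonzero probability equals 1/2**q
    for an integer q >= 0 and the probabilities sum to one.
    """
    nonzero = [Fraction(p) for p in probs if p != 0]

    def is_dyadic(x):
        # Numerator 1 and a power-of-two denominator.
        return x.numerator == 1 and (x.denominator & (x.denominator - 1)) == 0

    return sum(nonzero) == 1 and all(is_dyadic(x) for x in nonzero)


uniform_six = [Fraction(1, 6)] * 6
uniform_four = [Fraction(1, 4)] * 4 + [0, 0]
print(is_tree_feasible(uniform_six), is_tree_feasible(uniform_four))
```

Exact rational arithmetic is used so that the check does not depend on floating-point rounding.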
To better understand the advantages that can be brought by utilizing all SAPs along with the binary-tree encoding strategy, we now analyze the case where the channel gains are defined by $g_l = \eta^{l-1}$, $\forall\, l \in \{1, \ldots, N\}$ for some $\eta \in (0, 1)$. We assume full channel knowledge is available at the transmitter, so that SAP probabilities and power allocation can be optimized. Fig. 6a shows the mutual information for $\eta = 0.2$, and Fig. 6b gives results for $\eta = 0.7$. In both figures, the different curves represent different SAP probability assignment strategies and bounds. The first three curves exhibit the mutual information computed by using the respective analytic results. The fourth curve ("Projected p (Euclidean)") illustrates the mutual information attained by employing Algorithm 2. In this case, the relaxed solution $p$ is computed by using the analytic form given in Proposition 6. Note that this curve represents an achievable rate for OFDM-IM systems that utilize all SAPs. For all curves other than the "Benchmark", waterfilling power allocation is employed, since this approach is optimal at high and low SNR in the relaxed setting (cf. Proposition 8). The benchmark curve relates to a standard OFDM-IM system where the four SAPs, chosen according to the lexicographic principle discussed in [16], are transmitted with equal probability and uniform power allocation. Note that the curves that employ waterfilling power allocation and SAP probability optimization include the case where only four SAPs may be chosen, which would correspond to the benchmark curve but where waterfilling is employed. Such a selection, if deemed to be optimal, would naturally arise through the SAP probability optimization procedure. As a result, the true benchmark only utilizes uniform power allocation here.
In Fig. 6a, we see that the (relaxed) lower bound of Proposition 6 and the low-SNR result of Proposition 7 are similar, and that convergence to the upper bound of Corollary 1 occurs at high SNR. Moreover, the fully constrained result (where $p \in \mathcal{P}$) denoted by the "×" markers is very close to the analytic curves corresponding to the relaxed optimization. The benchmark curve is noticeably lower than all results that offer SAP probability optimization and power allocation. This was also seen in simulations for $(N, K) = (6, 4)$ and $(N, K) = (8, 6)$ systems (not shown).
Turning our attention to Fig. 6b, we see that the advantages offered by optimization diminish for less variable channel conditions. The optimized scenario ("Projected p (Euclidean)") still offers an advantage that saturates the upper bound from mid-to-high SNR values, but it is marginal.
For this simple system ($N = 4$ and $K = 2$), the results shown in Fig. 6 point to a need to understand how frequency selectivity affects performance. To this end, Fig. 7 illustrates the mutual information as a function of $\eta$. The first, fourth, and fifth curves relate to those with the same labels shown in Fig. 6. The second curve shows the mutual information attained by using the probabilities given in Proposition 5. The third curve ("Projected p (Euclidean)") illustrates the mutual information attained by employing Algorithm 2, where the relaxed solution $p$ is computed using Proposition 5. Again, waterfilling is used for all systems except the benchmark, where uniform power allocation is used. The advantages offered by optimization in highly frequency-selective channels are apparent in this example. It is observed that one data point is missing for the SNR = 10 dB curves related to Proposition 5. This omission results from the fact that the matrix $A$ in Proposition 5 is singular for the corresponding parameterization. Hence, for this point, one would choose a different method of obtaining $p$ in the initialization step of Algorithm 2. To conclude this discussion, it is important to note that Algorithm 2 roughly achieves the same mutual information promised by the analytic lower bound. The upper bound is only tight at high SNR; hence, it is not particularly tight for most $\eta$ values in this figure.
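The simulation setting used in this subsection can be set up in a few lines. The following Python sketch only illustrates the definitions quoted in the text: the gains g_l = η^(l−1), the set of C SAPs for given N and K, and the noise variance implied by SNR = P/(Nσ²). All function names are hypothetical.

```python
from itertools import combinations
from math import comb


def channel_gains(n, eta):
    """Gains g_l = eta**(l - 1) for l = 1, ..., n."""
    return [eta ** (l - 1) for l in range(1, n + 1)]


def activation_patterns(n, k):
    """All SAPs, each given as a tuple of k active subcarrier indices (1-based)."""
    return list(combinations(range(1, n + 1), k))


def noise_variance(total_power, n, snr_db):
    """Invert SNR = P / (N * sigma^2) for sigma^2, with the SNR given in dB."""
    return total_power / (n * 10 ** (snr_db / 10))


gains = channel_gains(4, 0.2)
saps = activation_patterns(4, 2)
print(len(saps), comb(4, 2), gains)
```

For N = 4 and K = 2 this yields the six SAPs referred to in the text, of which the benchmark uses four.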

B. Block Error Rate
Apart from achievable rate, error performance is another key performance metric of a communication system. It should be apparent that a scheme that is designed to maximize the achievable rate does not necessarily optimize the error performance. Nevertheless, it is important to consider the effects that the designs detailed in Sections IV and V have on the error rate. Note that, because the lengths of the bit sequences encoded as SAPs and modulated signals are variable, bit errors would be difficult to measure in a standardized manner. Therefore, to maintain brevity and clarity, we choose to evaluate the block-error rate (BLER) instead of the bit-error rate [27]-[29]. Here, a block is a group of $N$ subcarriers.
For simplicity and optimality, we adopt the maximum likelihood (ML) detection scheme at the receiver to detect the transmitted signal from the received signal vector $Y$, which consists of received modulated symbols and nulls on $N$ independent subcarriers. We assume that channel knowledge is available at both the transmitter and the receiver. The estimated signal vector $\hat{X}$ satisfies
$$\hat{X} = \arg\min_{\dot{X}} \left\| Y - G\dot{X} \right\|^2$$
where $\dot{X}$ is a candidate transmit vector (obtained by following the model described in Section II) and $G = \mathrm{diag}\{\sqrt{g_1} e^{j\theta_1}, \ldots, \sqrt{g_N} e^{j\theta_N}\}$ is the diagonal channel coefficient matrix. We write the BLER conditioned on the transmitted signal vector $X$ as $P_{\mathrm{BLER}}(X|G) = \mathbb{P}(\hat{X} \neq X|G)$. Averaging over $X$ gives our measure of interest: $P_{\mathrm{BLER}}(G) = \mathbb{E}[P_{\mathrm{BLER}}(X|G)]$, which is now only dependent on the channel state captured in $G$. This measure is useful for evaluating performance in a slow-fading environment. It also allows us to observe how channel variations affect error performance.
Following the simulation setting for the mutual information analysis, we configured the channel gains to be $g_l = \eta^{l-1}$ and let $\{\theta_l\}$ be uniformly distributed over $[0, 2\pi)$, $\forall\, l \in \{1, \ldots, N\}$. We also normalized $\sigma^2 = 1$ for the noise power and let $N = 4$ and $K = 2$ as an example. We do not apply rate adaptation in this study; consequently, we employ a uniform power allocation scheme in all simulations related to error performance. This allows us to focus on the effect that binary-tree optimization (i.e., bit-to-SAP optimization) has on performance. We numerically examined the BLER for OFDM-IM with the SAP probability distribution optimized under two conditions. The first condition only requires the number of leaves in the binary tree that defines the bit-to-SAP mapping to be equal to or smaller than the number of SAPs. The second condition restricts the encoder to only consider full binary trees with $C$ leaves, which reduces the achievable rate at low SNR. Also, the classic OFDM-IM scheme studied in [9] was adopted as a benchmark. We adopt the lexicographic codebook design for the classic OFDM-IM system to select four out of six SAPs for comparison purposes [16]. The numerical results are presented in Fig. 8, which were obtained by collecting $10^3$ block-error events for each SNR point (subject to random additive white Gaussian noise).
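The ML rule described above reduces to an exhaustive search when the candidate set is finite. The Python sketch below illustrates this under assumptions that are not taken from the paper: BPSK symbols on the active subcarriers, unit power, all SAPs allowed, and fixed (rather than random) phases so that the example is deterministic. The names `candidate_vectors` and `ml_detect` are hypothetical.

```python
import cmath
import math
from itertools import combinations, product


def candidate_vectors(n, k, symbols=(1, -1)):
    """All transmit vectors with k active subcarriers (assumed BPSK) and n - k nulls."""
    out = []
    for sap in combinations(range(n), k):
        for syms in product(symbols, repeat=k):
            x = [0j] * n
            for idx, s in zip(sap, syms):
                x[idx] = complex(s)
            out.append(tuple(x))
    return out


def ml_detect(y, h, cands):
    """Return the candidate minimising ||y - G x||^2, where G = diag(h)."""
    def metric(x):
        return sum(abs(yl - hl * xl) ** 2 for yl, hl, xl in zip(y, h, x))
    return min(cands, key=metric)


# Channel with g_l = eta**(l - 1); the phases here are arbitrary fixed values.
N, K, eta = 4, 2, 0.2
h = [math.sqrt(eta ** l) * cmath.exp(1j * 0.3 * l) for l in range(N)]
cands = candidate_vectors(N, K)
x = cands[5]
y = [hl * xl for hl, xl in zip(h, x)]  # noise-free reception
print(ml_detect(y, h, cands) == x)
```

A BLER estimate would follow by adding complex Gaussian noise to `y`, repeating the detection, and counting the blocks for which the detected vector differs from the transmitted one.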
In Fig. 8, it is apparent that the rate-optimized OFDM-IM systems designed according to the first condition outperform those designed under the second condition. This behavior correlates with the fact that the first condition is less restrictive than the second. Perhaps more interestingly, we note that the rate-optimized system does not always outperform the classical OFDM-IM scheme. When signals are subject to deep fading, the rate-optimized system designed according to the first condition may only utilize a single SAP consisting of the best K subcarriers, and the system is reduced to OFDM with K active subcarriers. This system is capable of performing better than those that encode information in the index domain as well as signal space, since block errors arising from incorrect SAP decoding do not occur.

VII. CONCLUSIONS
In this paper, we provided a thorough treatment of the rate-optimization problem for OFDM-IM systems with channel knowledge at the transmitter. We cast the problem as one of mapping bit sequences to activation patterns, which enabled us to utilize a binary tree formalism for algorithm development and analysis. To this end, we presented new results on full binary trees, both in terms of algorithmic construction and enumeration. We also reported a number of new analytic bounds and asymptotic results related to the relaxed mutual information optimization problem where SAP probabilities can take any values in the interval [0, 1] subject to a sum probability constraint. We then used the results pertaining to the relaxed problem to develop a heuristic algorithm for obtaining a feasible solution to the constrained problem.
Numerical results indicate that this solution is nearly optimum (relative to the relaxed upper bound), particularly in the low and high SNR regimes, and the optimized approach is capable of offering a rate advantage over the conventional OFDM-IM benchmark of [9].
A number of open problems remain. First, it is not clear whether an analytic form for the optimal power values exists for all SNR values; only low and high SNR results were reported here. In fact, it is not readily apparent that the mutual information is concave in the power vector ρ; hence, a general analytic form may not be forthcoming. As an alternative, it would be preferable to develop an efficient numerical approach to solving the relaxed optimization problem. The use of BCD was briefly discussed here, but further work is needed to determine whether this method would be a viable solution in practice. It is also not known whether the heuristic projection algorithm (Algorithm 2) is, in fact, optimal in some sense. Finally, and more generally, only rate-optimization for uniform sources was considered; it would be fruitful to study the BLER-minimization problem as well as nonuniform sources and non-Gaussian signalling (i.e., finite signal constellations).

Supplemental Material
Proofs of lemmata and propositions reported in the paper entitled "Binary-Tree Encoding for Uniform Binary Sources in Index Modulation Systems" are provided in the appendices below. Additional numerical results are included for completeness at the end of this document.

APPENDIX A PROOF OF PROPOSITION 1
Define the mapping $w_\ell : \mathcal{T}_{v-1} \to \mathcal{T}_v$ for any integers $v > 1$ and $\ell \geq 0$, such that $w_\ell$ appends the protograph $\tau$ to the left-most available node at height $\ell$, relative to the deepest leaf, of a tree in $\mathcal{T}_{v-1}$. Hence, for a tree $t \in \mathcal{T}_{v-1}$ with maximum depth $d$, $w_\ell$ connects two edges to the left-most leaf node in level $d - \ell$ of $t$. These edges are, in turn, each connected to a leaf node at depth $d - \ell + 1$. Note that $w_0$ always maps a tree to a new tree with one additional internal node and one additional leaf, whereas $w_\ell$, for $\ell > 0$, will return the empty set if no height-$\ell$ leaves exist in the tree on which $w_\ell$ acts. Algorithm 1 applies $w_0$ and $w_1$ to each tree in $\mathcal{T}_k$ with every step of the for loop. The following lemma guarantees that $w_0$ and $w_1$ generate unique trees.
Lemma 2. When applied to elements of $\mathcal{T}_{v-1}$, the mappings $w_0$ and $w_1$ yield nonisomorphic trees.
Proof: The mapping $w_0$ is one-to-one. Hence, the image $w_0(\mathcal{T}_{v-1})$ consists of $|\mathcal{T}_{v-1}|$ trees. None of these trees are isomorphic, since no trees in $\mathcal{T}_{v-1}$ are isomorphic. Similarly, where admissible, the mapping $w_1$ is one-to-one. In the inadmissible case where $w_1$ returns the empty set, the mapping is many-to-one; but this can be ignored, since no tree is generated. Thus, the image $w_1(\mathcal{T}_{v-1})$ consists of at most $|\mathcal{T}_{v-1}|$ trees. Again, none of these trees are isomorphic, since no trees in $\mathcal{T}_{v-1}$ are isomorphic. Furthermore, we deduce that $w_0(\mathcal{T}_{v-1}) \cap w_1(\mathcal{T}_{v-1}) = \emptyset$, since every $t \in w_0(\mathcal{T}_{v-1})$ has two leaves at the deepest level and every $t \in w_1(\mathcal{T}_{v-1})$ has more than two leaves at the deepest level.
We now state the following lemma, which concludes the proof.
Lemma 3. The set $\mathcal{T}_v = w_0(\mathcal{T}_{v-1}) \cup w_1(\mathcal{T}_{v-1})$ is a complete reduced set of $v$-node full binary trees.
Proof: Let $\mathcal{T}_1 = \{\tau\}$. It is easy to verify (cf. Fig. 2) that $\mathcal{T}_2 = w_0(\mathcal{T}_1) \cup w_1(\mathcal{T}_1)$ and $\mathcal{T}_3 = w_0(\mathcal{T}_2) \cup w_1(\mathcal{T}_2)$ are complete sets. Moreover, Lemma 2 ensures $\mathcal{T}_v$ contains no isomorphic trees for $v > 1$. Hence, to prove that $\mathcal{T}_v$ is a complete reduced set of full binary trees for $v \geq 4$, we must show that
$$w_\ell(\mathcal{T}_{v-1}) \subseteq w_0(\mathcal{T}_{v-1}) \cup w_1(\mathcal{T}_{v-1})$$
for $2 \leq \ell \leq v - 1$ and $v \geq 4$. Assume the lemma is true for all $v = 2, \ldots, k - 1$. Choose $t' \in \mathcal{T}_{k-1}$. Consider the mapping $w_\ell(t')$. We treat several possibilities. If the operation maps to the empty set, the set relation is satisfied by definition (no tree is generated). On the other hand, if $w_\ell(t')$ is nonempty and the deepest level of $w_\ell(t')$ contains exactly two leaves (and hence the same can be said for $t'$ since $\ell \geq 2$), then we must show that there exists a tree $t \in \mathcal{T}_{k-1}$ such that $w_\ell(t') = w_0(t)$. Note that, in this case, $w_0$ has an inverse, and the composition $w_0^{-1} \circ w_\ell$ commutes. Thus, we write
$$t = w_0^{-1}(w_\ell(t')) = w_\ell(w_0^{-1}(t')) = w_\ell(t'')$$
where, in the second and third equalities, it is understood that $w_\ell$ operates on the level at height $\ell$ relative to the deepest leaf in $t'$. But, by the inductive hypothesis and Lemma 2, we have that $t'' = w_0^{-1}(t') \in \mathcal{T}_{k-2}$ and hence $t = w_\ell(t'') \in \mathcal{T}_{k-1}$, as required. Now suppose $w_\ell(t')$ is nonempty and the deepest level of $w_\ell(t')$ contains more than two leaves. In this case, we must show that there exists a tree $t \in \mathcal{T}_{k-1}$ such that $w_\ell(t') = w_1(t)$. We take a similar approach, recognizing that $w_1$ has an inverse, and the composition $w_1^{-1} \circ w_\ell$ commutes. It follows that
$$t = w_1^{-1}(w_\ell(t')) = w_\ell(w_1^{-1}(t')) = w_\ell(t'')$$
and, by induction, $t'' = w_1^{-1}(t') \in \mathcal{T}_{k-2}$. Finally, we have that $t = w_\ell(t'') \in \mathcal{T}_{k-1}$, as required.
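The construction used in this appendix can be illustrated with a short Python sketch. It is not Algorithm 1 itself. It assumes that a tree in the reduced set can be identified by the sorted tuple of its leaf depths, so that "left-most" plays no role, and the names `w` and `generate` are hypothetical.

```python
def w(offset, tree):
    """Apply w_l: replace one leaf at depth d - l by two leaves at depth d - l + 1.

    `tree` is a sorted tuple of leaf depths (an assumed representation).
    Returns None in the inadmissible case where no leaf exists at that depth.
    """
    d = max(tree)
    target = d - offset
    if target not in tree:
        return None
    leaves = list(tree)
    leaves.remove(target)  # the chosen leaf becomes an internal node
    leaves += [target + 1, target + 1]
    return tuple(sorted(leaves))


def generate(v_max):
    """Build T_1, ..., T_{v_max} by applying w_0 and w_1 to every tree."""
    sets = {1: {(1, 1)}}  # T_1 holds only the protograph: two leaves at depth 1
    for v in range(2, v_max + 1):
        new = set()
        for t in sets[v - 1]:
            for offset in (0, 1):
                child = w(offset, t)
                if child is not None:
                    new.add(child)
        sets[v] = new
    return sets


trees = generate(6)
print([len(trees[v]) for v in sorted(trees)])
```

Each generated tuple has v + 1 entries and satisfies the Kraft equality, which is what one expects of the leaf depths of a full binary tree with v internal nodes.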

APPENDIX B PROOF OF PROPOSITION 2
The proposition can be seen to hold (with equality) for $v = 2, 3, 4$ by explicit construction of $\mathcal{T}_v$. For $v > 4$, consider the mappings $\{w_\ell\}$ given in Appendix A. As noted in the proof of Lemma 2 in that appendix, $w_0$ is one-to-one. Thus, $|w_0(\mathcal{T}_{v-1})| = |\mathcal{T}_{v-1}|$. Moreover, from Lemma 2 and the definition of $\mathcal{T}_v$ given in Lemma 3, we know that
$$|\mathcal{T}_v| = |w_0(\mathcal{T}_{v-1})| + |w_1(\mathcal{T}_{v-1})|.$$
The set $\mathcal{T}_{v-1}$ can be partitioned into two subsets: one that contains trees that are mapped to $v$-node trees in $\mathcal{T}_v$ under $w_1$ and one that does not admit a mapping under $w_1$. We call trees in the first subset $\mathcal{T}^o_{v-1}$ open trees and trees in the second subset $\mathcal{T}^c_{v-1}$ closed trees, so that $|\mathcal{T}_v| = 2|\mathcal{T}_{v-1}| - |\mathcal{T}^c_{v-1}|$. To lower bound $|\mathcal{T}^c_{v-1}|$, we apply the following reasoning. A $(v - 1)$-node closed tree is formed by appending a closed subtree of size, say, $r$ internal nodes to a subtree of size $v - 1 - r$. Each set $\mathcal{T}_r$ with $r = 2^q - 1$ for some positive integer $q$ has exactly one dense closed tree, i.e., a tree where every level is fully connected to the previous and next levels, and the deepest level has $2^q$ leaves. Thus, for every $q \in \{2, \ldots, \lfloor \log_2(v - 1) \rfloor\}$, we can enumerate $v - 1 - r = v - 2^q$ closed trees. This is a lower bound, since other combinations of $r$-node closed subtrees and $(v - 1 - r)$-node trees exist. Finally, we note that if $v - 1$ is one less than a power of two, $\mathcal{T}_{v-1}$ contains a dense subtree, which gives rise to the $\delta_v$ parameter stated in the proposition.
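The role of open and closed trees in the counting argument can be checked numerically. Since w_0 is one-to-one and w_1 acts only on open trees, the partition described above implies that the size of each set equals twice the size of the previous set minus the number of closed trees in it. The Python sketch below verifies this for small v, again under the assumption that a tree is identified by its sorted leaf depths. The names `split` and `is_closed` are hypothetical.

```python
def split(tree, offset):
    """Replace a leaf at depth max(tree) - offset by two leaves one level deeper."""
    target = max(tree) - offset
    if target not in tree:
        return None  # no leaf at that depth, so the mapping is inadmissible
    rest = list(tree)
    rest.remove(target)
    return tuple(sorted(rest + [target + 1, target + 1]))


def is_closed(tree):
    """A closed tree has no leaf one level above its deepest level."""
    return split(tree, 1) is None


trees = {(1, 1)}  # T_1: the protograph
rows = []
for v in range(2, 9):
    children = {c for t in trees for c in (split(t, 0), split(t, 1)) if c is not None}
    closed = sum(is_closed(t) for t in trees)
    # Record v, |T_v|, and the value predicted by 2|T_{v-1}| - |closed trees|.
    rows.append((v, len(children), 2 * len(trees) - closed))
    trees = children

for row in rows:
    print(row)
```

The second and third entries of every row agree, so any lower bound on the number of closed trees translates directly into an upper bound on the size of the next set.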

APPENDIX C PROOF OF LEMMA 1
Referring to (9), the equality constraint is affine and the inequality constraints are convex. Now, consider the objective function, which is concave in $p$ if the differential entropy $h(Y)$ is. Hence, we must prove that $h(Y)$ is concave in $p$. Let us interpret $f_Y$ (cf. (6)

APPENDIX D PROOF OF PROPOSITION 3
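Starting from Lemma 1, we form the KKT conditions
$$\nabla_p L = 0, \quad \mathbf{1}^T p = 1, \quad p \succeq 0, \quad \lambda_0 \succeq 0, \quad \lambda_{0,i}\, p_i = 0, \; i = 1, \ldots, C \qquad (30)$$
where $\lambda_0 \in \mathbb{R}^C$ and $\nu \in \mathbb{R}$ are the Lagrange multipliers. The gradient of the Lagrangian is
$$\nabla_p L = -\nabla_p h(Y) - \lambda_0 - \nu \mathbf{1} \qquad (31)$$
and the gradient of $h(Y)$ with respect to $p$ can be written as
$$\nabla_p h(Y) = -\int \left(1 + \ln(a(y)^T p)\right) a(y)\, dy \qquad (32)$$
where $a(y) := (f_{Y|U}(y \mid U = i))_{i=1,\ldots,C}$. Hence, the optimal vector $p$ must satisfy
$$\int \ln(a(y)^T p)\, a(y)\, dy = \lambda_0 + (\nu - 1)\mathbf{1} \qquad (33)$$
where we have used the fact that $\int a(y)\, dy = \mathbf{1}$. Furthermore, since the problem is concave, any $p$, $\lambda_0$, and $\nu$ that satisfy (33) and the rest of the conditions in (30) are primal and dual optimal, and thus yield the maximum mutual information. Choose $p_i = 1/C$ for all $i$. In this case, the inequality constraints are inactive, which implies $\lambda_0 = 0$. As a result, (33) is satisfied (along with the rest of the KKT conditions), and thus uniform SAP probabilities are optimal.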

APPENDIX G PROOF OF PROPOSITION 6
In the following, we denote $a_{li} = g_l \rho_{li}$ for all $l$. From (29), together with (6) and (7), the mutual information can be written as a sum over $i$ of terms that each involve an integral. In the integral corresponding to the $i$th term, we make the following change of variables: $y_l \to \sqrt{a_{li} + \sigma^2}\, y_l$ and $y_l^* \to \sqrt{a_{li} + \sigma^2}\, y_l^*$, $l \in S_i$, and $y_l \to \sigma y_l$ and $y_l^* \to \sigma y_l^*$, $l \notin S_i$, and obtain (45). We observe that the limit of the $i$th term of the sum in (45) as $\sigma^2 \to 0$ is $p_i \ln \frac{p_i}{q_i}$ (with the convention $0 \cdot \ln 0 = 0$). Thus, the mutual information obeys an asymptotic equivalence in which $p$ appears through the term $\sum_{i \in U} p_i \ln \frac{p_i}{q_i}$. Note that $\sum_{i \in U} p_i \ln \frac{p_i}{q_i}$ is the Kullback-Leibler divergence between $p$ and $q$, which is always nonnegative and equals zero if and only if the two distributions are identical. Thus, at high SNR ($\sigma^2 \to 0$), the probabilities that maximize the mutual information are $p_i = q_i$, $i \in U$.
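The property of the Kullback-Leibler divergence used in the last step can be illustrated numerically. The Python sketch below uses natural logarithms (nats, as in the paper) and the convention 0 · ln 0 = 0. The distribution `q` is an arbitrary example rather than the q of Proposition 6, and the sketch assumes q_i > 0 wherever p_i > 0.

```python
from math import log


def kl_divergence(p, q):
    """Kullback-Leibler divergence sum_i p_i ln(p_i / q_i) in nats.

    Terms with p_i = 0 are skipped (convention 0 * ln 0 = 0).
    Assumes q_i > 0 whenever p_i > 0.
    """
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


q = [0.4, 0.3, 0.2, 0.1]        # arbitrary reference distribution
p_same = list(q)                # identical to q
p_other = [0.5, 0.25, 0.25, 0.0]

print(kl_divergence(p_same, q), kl_divergence(p_other, q))
```

The divergence vanishes only when the two distributions coincide, which is why choosing p_i = q_i is optimal in the high-SNR argument above.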

APPENDIX I PROOF OF PROPOSITION 7
Starting from (43), we observe that, as $\sigma^2 \to \infty$, the mutual information obeys an asymptotic equivalence that can be bounded by an inequality. Equality is achieved when $p_i = r_i$, $i \in U$.

APPENDIX J PROOF OF PROPOSITION 8
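Consider the high-SNR regime. According to Proposition 6 and Corollary 1, the optimal mutual information for given powers satisfies the asymptotic equivalence
$$I(\rho, \sigma^2) \sim \ln \sum_{i \in U} \prod_{l \in S_i} \left( \frac{g_l \rho_{li}}{\sigma^2} + 1 \right), \quad \text{as } \sigma^2 \to 0.$$
We now maximize the mutual information over the powers at high SNR by formulating the optimization problem
$$\underset{\rho}{\text{maximize}} \;\; \ln \sum_{i \in U} \exp\left( \sum_{l \in S_i} \ln\left( \frac{g_l \rho_{li}}{\sigma^2} + 1 \right) \right) \quad \text{subject to} \;\; \sum_{l \in S_i} \rho_{li} \leq P \;\text{ and }\; \rho_{li} \geq 0, \; \forall\, i \in U, \; l \in S_i.$$
Given that the objective function is increasing in every variable, the constraints are satisfied with equality. Moreover, the problem actually decouples into $\binom{N}{K}$ separate problems. We cast the $i$th problem as
$$\underset{\rho}{\text{maximize}} \;\; \sum_{l \in S_i} \ln\left( \frac{g_l \rho_{li}}{\sigma^2} + 1 \right) \quad \text{subject to} \;\; \sum_{l \in S_i} \rho_{li} \leq P \;\text{ and }\; \rho_{li} \geq 0.$$
For each $i \in U$, the optimal powers are found via the waterfilling strategy. Now consider the low-SNR regime. According to Proposition 7, we optimize $I(\rho, \sigma^2) \sim \sum_{l \in S_i} \ln\left( \frac{a_{li}}{\sigma^2} + 1 \right)$ under the constraint $\sum_{l \in S_i} \rho_{li} = P$. The optimum is given by waterfilling power allocation.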
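The per-SAP subproblem above is the classical waterfilling problem. The following Python sketch solves it for one SAP. It is a textbook construction rather than code from the paper, and the function name `waterfill` and its argument names are hypothetical: `gains` holds g_l for the active subcarriers of S_i, `total_power` is P, and `noise_var` is σ².

```python
def waterfill(gains, total_power, noise_var):
    """Maximise sum_l ln(g_l * rho_l / noise_var + 1) s.t. sum_l rho_l = total_power.

    The solution has the form rho_l = max(0, level - noise_var / g_l).
    """
    floors = [noise_var / g for g in gains]  # inverse effective channel gains
    order = sorted(range(len(gains)), key=lambda i: floors[i])
    # Try the largest active set first and drop the worst channel until
    # the water level exceeds every floor in the active set.
    for k in range(len(gains), 0, -1):
        active = order[:k]
        level = (total_power + sum(floors[i] for i in active)) / k
        if level > floors[active[-1]]:
            break
    powers = [0.0] * len(gains)
    for i in active:
        powers[i] = level - floors[i]
    return powers


print(waterfill([1.0, 0.5], 4.0, 1.0))
print(waterfill([1.0, 0.2], 2.0, 1.0))
```

In the second call the weaker subcarrier receives no power because its floor lies above the water level, which mirrors the behaviour one expects in highly frequency-selective channels.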

APPENDIX K ADDITIONAL NUMERICAL RESULTS
Here, we include additional numerical results obtained for the benefit of illustration. Figs. 9 and 10 relate to systems with $N = 6$ and $K = 4$. In Fig. 9a, a clear and fairly constant benefit can be observed in highly frequency-selective channels for the optimized system compared to the benchmark OFDM-IM system, which utilizes eight SAPs out of fifteen. The advantage is severely diminished when the channel is less selective (cf. Fig. 9b). This behavior is observed more clearly in Fig. 10. Note that, in this figure, results related to the use of the probability computation detailed in Proposition 5 are not available for low values of $\eta$, because the matrix $A$ is singular for these parameterizations. One could use a numerical approach or one of the other analytic results detailed in Section IV to obtain the relaxed SAP probability distribution in this case.
Figs. 11 and 12 relate to systems with $N = 8$ and $K = 6$. In this scenario, the benchmark scheme utilizes sixteen out of twenty-eight SAPs. As a result, the benefit offered by the optimized technique is more clearly observed for $\eta = 0.2$. For the less frequency-selective case (Fig. 11b), the gains, again, diminish. In Fig. 12, we see that the utility of Proposition 5 is restricted to reasonably high $\eta$ values (less frequency-selective channels).