Permutation Matrix Modulation

We propose a novel scheme that allows a MIMO system to modulate a set of permutation matrices in order to send more information bits, extending our initial work on the topic. We call this system Permutation Matrix Modulation (PMM). The basic idea is to employ a permutation matrix as a precoder and treat it as a modulated symbol. We continue the evolution of index modulation in MIMO by adopting all-antenna activation and obtaining a set of unique symbols from altering the positions of the antenna transmit powers. We analyze the achievable rate of PMM under a Gaussian Mixture Model (GMM) distribution and under finite cardinality input (FCI), and evaluate numerical results by comparing PMM with existing systems. We also present a way to attain the optimal achievable rate of PMM by solving a maximization problem via the interior-point method. A low-complexity detection scheme based on zero-forcing (ZF) is proposed, and maximum likelihood (ML) detection is discussed. We demonstrate the trade-off between symbol error rate (SER) and computational complexity: ZF performs worse than ML in the SER simulations but requires far less computation.


I. INTRODUCTION
The research on finding alternative symbols to send more information bits without utilizing expensive conventional resources (i.e., time and frequency) is a growing topic in wireless communication. Many works have established the foundation of this area, known as Spatial Modulation (SM) and Generalised Spatial Modulation (GSM), in [1]-[4], as well as its latest development, Quadrature Spatial Modulation (QSM), in [5]-[7]. The main idea of these systems is to exploit the potential "modulate-able" symbols arising from the spatial characteristics of multiple-input multiple-output (MIMO) transmission. What makes them interesting is that they can arguably be integrated into existing technologies (i.e., 4G and 5G) without requiring major hardware changes, unlike most other proposed systems, which often demand a complete departure from existing technologies. This area may also be attractive for adoption in the upcoming 6G technology.
A new system called Permutation Channel Modulation (PCM) was proposed in [8], offering a new paradigm for exploiting index modulation in MIMO. PCM permutes the positions of the singular values obtained from decomposing the MIMO channel, assuming that channel state information is available at both the transmitter (CSIT) and the receiver (CSIR). The positions of the singular values are rearranged using a permutation matrix, and the permutation matrix is treated as a symbol to send more information bits. This idea is a breakthrough since the maximum number of transmit bits^1 increases substantially compared to the previous systems (i.e., SM, GSM, and QSM). This can be seen in Fig. 1, where we plot the maximum number of transmit bits versus the number of transmit antennas. Fig. 1 is computed based on the maximum transmit bits of each system provided in Table I. Note that Fig. 1 does not directly represent the rate; rather, it can be interpreted as the potential rate of each system in providing "modulate-able" symbols/indices.

A. Motivation and Previous Works
There are two main benefits of employing SM over conventional MIMO, as mentioned in [2], [3]. First, SM eliminates inter-channel interference (ICI) due to single-antenna activation. Second, the number of radio frequency (RF) chains can be reduced, and thus tight antenna synchronization can be avoided. A simplification of SM called Space Shift Keying (SSK) was proposed in [12], [13]. Unlike SM, SSK obtains its symbols only from the choice of the single activated antenna. However, the rate performance of these two systems is limited. Generalizations of SM and SSK were introduced in [4] and [14], respectively, obtained by allowing more than one antenna to be activated at a time in order to improve the achievable rate. However, besides losing the two main benefits of conventional SM and SSK, these generalizations cannot significantly improve the achievable rate. In [15], the optimal power allocation for SM was presented; an optimization problem was formulated by deriving the mutual information of the transmit and receive signals under a Gaussian distribution. However, it is unclear whether this is valid due to the presence of the modulated index, which clearly does not follow a Gaussian distribution.
In [9] and [10], space-time shift keying (STSK), a matrix-modulation-based scheme, was proposed. The idea is to employ a certain number of dispersion matrices to be modulated so as to represent a block of bits. The matrices are modulated along with the constellation points, so that STSK has two sources of information. Although our work may seem similar to STSK, there are in fact major differences. The transmit signal of STSK is a multidimensional signal that requires multiple time instants to complete the transmission of a modulated dispersion matrix. This leads to a rate degradation, since the obtained rate must be divided by the number of time instants used to transmit one dispersion matrix. In contrast, PMM's transmit signal is one-dimensional and requires only a single time instant to accommodate the transmission of a permutation matrix. In other words, PMM uses the permutation matrix to rearrange the positions of the power allocation coefficients from their default positions at the transmitter, while STSK must receive the dispersion matrix as a full matrix. This main difference leads to distinctive performance measures. Another system similar to our work is differential spatial modulation (DSM) [16], which employs permutation matrices to represent a block of bits. However, DSM is a special case of STSK in which permutation matrices are employed instead of dispersion matrices. The major drawback of DSM, besides requiring multiple time instants for signal transmission (DSM also has a multidimensional transmit signal), is that it allows only one degree of freedom, since only one antenna is activated at each transmission. This differs from our work, where we can utilize the full degrees of freedom of the MIMO system. These differences lead to significant performance gaps, as we show in the next sections. Moreover, the easiest way to understand the fundamental distinction between PMM and STSK/DSM is to observe the number of transmit bits in Fig. 1 and Table I.

^1 Let us clarify that the maximum transmit bits is the maximum number of bits a transmitter can radiate per transmission. It is not the same as the term "throughput" as used in [9] and [10], nor "spectral efficiency" as used in [11]. We understand that the general definitions of data rate, throughput, and spectral efficiency are performance metrics measured by the mutual information of the transmit and receive signals. To be clear, when we employ 64-QAM, it does not mean we have a throughput of 6 bits/s or a spectral efficiency of 6 bits/Hz; it simply means that one constellation point represents 6 bits, and thus we can transmit 6 bits for every symbol we radiate. We believe clarifying this issue will avoid confusion for the readers.

Fig. 1: Comparison between our proposed system and the existing index modulation systems in the number of maximum transmit bits.

TABLE I (fragment): maximum transmit bits of the existing systems

QSM [5]:  R_QSM = log2(M) + log2(N_t^2)
STSK [9]: R_STSK = (log2(M) + log2(Q)) / T
DSM [16]: R_DSM = log2(M) + floor(log2(N_t!)) / N_t

These partial-activation systems aim to have more "modulate-able" symbols through antenna activation. Although some benefits are offered, as mentioned above, we argue that this method is ineffective because the potential rate from the antennas that are deactivated is ignored. Another problem arises when we discuss the complexity of the detection schemes of these systems. Most authors consider maximum likelihood (ML) detection to retrieve the information bits from the receive signal, as in [5], [17]. The complexity of ML always grows exponentially with the number of inputs; in this case, it grows exponentially with the number of transmit antennas.
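For concreteness, the maximum-transmit-bits formulas of Table I can be evaluated numerically. This is a minimal sketch: the SM and GSM formulas are reproduced from the cited literature (the extracted table fragment above omits those rows), the PMM formula is m_1 + m_2 from Section II, and the parameter values N_a, Q, and T are illustrative choices.

```python
import math

def max_transmit_bits(Nt, M, Na=2, Q=4, T=2):
    """Maximum transmit bits per transmission for each scheme (Table I).
    Nt: transmit antennas, M: constellation size, Na: active antennas (GSM),
    Q: number of dispersion matrices and T: time instants (STSK)."""
    return {
        "SM":   math.log2(M) + math.log2(Nt),
        "GSM":  math.log2(M) + math.floor(math.log2(math.comb(Nt, Na))),
        "QSM":  math.log2(M) + math.log2(Nt ** 2),
        "STSK": (math.log2(M) + math.log2(Q)) / T,
        "DSM":  math.log2(M) + math.floor(math.log2(math.factorial(Nt))) / Nt,
        "PMM":  Nt * math.log2(M) + math.floor(math.log2(math.factorial(Nt))),
    }

bits = max_transmit_bits(Nt=4, M=4)
# PMM: 4*log2(4) + floor(log2(4!)) = 8 + 4 = 12 bits per transmission
```

Note how the PMM entry grows with N_t in both the constellation term and the factorial term, which is what Fig. 1 visualizes.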
The PCM system, which utilizes all-antenna activation, was originally proposed in [8], where the basic system model and a capacity analysis were discussed. A detection scheme was also proposed, and the bit error rate was simulated to evaluate the performance. The capacity was derived under the assumption of a Gaussian input distribution. However, there is no discussion of whether the transmit signal can actually be modeled as Gaussian, and furthermore, the transmit power is unconstrained at the transmitter. The system also works only if both CSIT and CSIR are available, implying that PCM may not be competitive in practice.

B. Contribution
The main idea of this paper is to propose a new structure of PCM, different from [8], which we call Permutation Matrix Modulation (PMM). We provide a rigorous analysis that shows the benefits of our proposed system and how it can be more competitive than the alternatives. The fundamental idea of PMM and PCM is the same: we modulate a set of permutation matrices in addition to the conventional constellation symbols. PCM works by multiplying a permutation matrix with the singular value matrix obtained from decomposing the MIMO channel, so that the altered positions of the singular values can be detected at the receiver. PMM instead uses a deterministic power allocation matrix known by both the transmitter and the receiver. The product of the power allocation matrix with a permutation matrix is transmitted, and the altered positions of the power allocations are detected at the receiver. Both systems treat a set of permutation matrices as additional symbols. PMM is also more general than PCM, since PMM does not depend on the availability of CSIT: we do not need to decompose the MIMO channel. This is why PMM is a more suitable name. Unlike PCM, which assumes a complex Gaussian distribution for the transmit signal, we derive an achievable rate under a Gaussian Mixture Model (GMM) distribution. We argue that the GMM is the appropriate distribution for the transmit signal due to the presence of the permutation matrices, and we prove this mathematically in the next section.
The achievable rate derivation of SM and GSM systems under a GMM distribution is available in [1]. It was obtained by finding an upper bound on the differential entropy of a GMM distribution, which is then tightened using an upper bound refinement algorithm. In this paper, we follow exactly the same methodology as [1] for deriving the achievable rate. However, we additionally show how the upper bound refinement algorithm works and mathematically analyze why the algorithm is valid for our particular system; this discussion is missing in [1]. In tightening the upper bound, we show that our case is a special case of Salmond's clustering, in which we can choose any order to merge two Gaussian mixture components of the GMM distribution. The original analysis of the upper bound refinement algorithm can be found in [21], and the distance measure between two Gaussian mixture components in [22], [23]. Besides discussing the achievable rate obtained under a GMM distribution for the transmit signal, we also evaluate the rate of PMM through finite cardinality input (FCI) and compare the results with the existing systems. Using the FCI method, we can analyze at what SNR our proposed system achieves its maximum transmit bits; this approach is not found in the aforementioned literature. Furthermore, we present an optimization problem to find the power allocation that attains the optimal achievable rate of PMM, and we show that it can be solved using the well-known interior-point method.
For the performance evaluation, we compare the achievable rate of PMM with existing systems such as SM and GSM. The results indicate that PMM outperforms SM and GSM under the same parameters. We also compare the achievable rate of PMM with optimized power allocations against generic power allocations, and show that optimized power allocations improve the achievable rate. Furthermore, we propose a detection scheme for PMM based on zero-forcing (ZF), and simulate and compare the symbol error rate (SER) of ZF detection and maximum likelihood (ML) detection. We also analyze the complexity of both detection schemes to observe their benefits and drawbacks. In short, ZF performs worse in the SER simulations but requires far less complexity than ML. This analysis is important from a practical point of view.
To sum up, we highlight our major contributions in this paper as follows:
• We continue the early work of [8] by providing the achievable rate performance of the system under the GMM distribution. We also show that our system can work in the absence of CSIT, unlike [8].
• We analyze the rate of PMM using the FCI method and compare the results with the existing systems under the same number of maximum transmit bits.
• An optimization problem attaining the optimal achievable rate of PMM is discussed and evaluated. We show that despite the absence of CSIT, PMM has an achievable rate close to the optimal scenario.
• We demonstrate that the achievable rate of our proposed system outperforms that of SM and GSM, and is very close to MIMO V-BLAST. From this performance analysis, we show that full antenna activation can achieve better performance than partial antenna activation systems such as SM and GSM.
• We provide the SER comparison between PMM, GSM, MIMO, and SM using the ML detection scheme at 16 bits per transmission. To achieve the same maximum transmit bits, our proposed system requires the smallest constellation size or number of transmit antennas. We believe it is of practical interest to achieve high performance with the least hardware requirements.
• A new ZF-based detection for PMM is proposed, and a trade-off analysis against ML detection is given in terms of the SER and complexity of each scheme. In terms of SER performance, ML is better than ZF; however, the complexity analysis shows that ZF requires far less computation.

C. Paper Outline
The rest of this paper is organized as follows. Section II defines the system model of PMM. In Section III, we provide an analysis of the achievable rate of PMM under the GMM distribution. An optimization to attain the optimal achievable rate of PMM is presented in Section IV. Section V provides detection schemes for PMM. Simulation results on the achievable rate and SER are presented in Section VI. In Section VII, the complexity analysis of the detection schemes is discussed. Finally, the major conclusions and implications are drawn in Section VIII.
Notation: In the following, uppercase bold letters A denote matrices and lowercase bold letters a denote column vectors. The superscripts (·)^T and (·)^H denote transpose and conjugate transpose, respectively. We use tr(A) and |A| for the trace and the determinant of matrix A, respectively, ‖·‖ for the Frobenius norm, and E{·} for the expected value.
Reproducible research: Our simulation results can be reproduced using the Matlab code and data files available at: https://github.com/faddlis/Permutation-Matrix-Modulation.git

II. SYSTEM MODEL
We investigate a new structure in index modulation schemes called PCM, initially presented in [8]. Our proposed system is shown in Fig. 2. Consider a point-to-point MIMO system equipped with N_t transmit antennas and N_r receive antennas. The incoming information bits are divided into two blocks: the first block of length m_1 = N_t log2(M) bits is modulated into M-ary constellation symbols, and the second block of length m_2 = floor(log2(N_t!)) bits is modulated into a permutation matrix; therefore, the maximum transmit bits of PMM is m_1 + m_2. The modulated symbols can be represented in vector form as s = (s_1, ..., s_{N_t})^T, where s_i ~ CN(0, 1) ∀i. The i-th element of the vector s is the modulated symbol at the corresponding transmit antenna. The second block is used to select a permutation matrix from the set P = {P_1, ..., P_L}, where L = 2^{m_2} is the number of permutation matrices used. Note that the number of possible permutation matrices is bound to the number of transmit antennas. An example of mapping information bits to permutation matrices for N_t = 3 is given in TABLE II. Notice that there are in total six unique permutation matrices for N_t = 3; however, we can only use four of them for binary transmission. This is also why the floor function is used to define the number of bits m_2 in the second block. We form the transmit precoded signal x ∈ C^{N_t×1} to convey the information bits as

x = P A s,   (1)

where P ∈ P is the modulated permutation matrix and A = diag(√p_1, ..., √p_{N_t}) is a diagonal matrix consisting of the power allocated to each transmit antenna, satisfying 0 ≤ p_i ≤ P_T ∀i, p_1 ≠ ... ≠ p_{N_t}, and Σ_{i=1}^{N_t} p_i = P_T, where P_T is the total transmit power.
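The construction of the precoded signal (1) can be sketched as follows. This is a minimal illustration: the bit-to-permutation mapping (lexicographic order) and the power values p_i are example choices, not necessarily the paper's Table II mapping.

```python
import math
from itertools import permutations
import numpy as np

rng = np.random.default_rng(0)

Nt = 3
m2 = math.floor(math.log2(math.factorial(Nt)))  # bits carried by the permutation
L = 2 ** m2                                     # permutation matrices actually used

# Set P: the first L of the Nt! permutation matrices (example mapping)
P_set = [np.eye(Nt)[list(perm)] for perm in permutations(range(Nt))][:L]

# Power allocation A = diag(sqrt(p_1), ..., sqrt(p_Nt)): distinct p_i, sum = P_T
P_T = 3.0
p = np.array([0.5, 1.0, 1.5])
A = np.diag(np.sqrt(p))

# Modulated symbols s_i ~ CN(0, 1), as in the rate analysis
s = (rng.standard_normal(Nt) + 1j * rng.standard_normal(Nt)) / np.sqrt(2)

bits_in = "10"                 # m2 = 2 bits select one of L = 4 matrices
P = P_set[int(bits_in, 2)]
x = P @ A @ s                  # precoded transmit signal, Eq. (1)
```

The trace property of Corollary 1 holds by construction: permuting the diagonal of A A^H does not change its trace.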
Corollary 1: The precoded signal in (1) satisfies

tr( E{x x^H} ) = tr( P A E{s s^H} A^H P^H ) = P_T.   (2)

Proof: We know that E{s s^H} = I_{N_t}, where I_{N_t} is the identity matrix of size N_t; A^H = A, since A is a real diagonal matrix; and P_l^H = P_l^{-1} ∀l, since the P_l's are unitary matrices^2. Thus, the matrix P on the left- and right-hand sides permutes the squared elements of matrix A, and the sum of all squared elements of A equals P_T.
Corollary 1 will later be useful in the analysis of the following sections. Notice that the precoded signal (1) is the main difference between our proposed system and the PCM system proposed in [8]: instead of utilizing the singular value matrix obtained from the singular value decomposition (SVD) of the MIMO channel, we use the matrix A. Therefore, our system can work in the absence of CSIT, unlike PCM. The vector x is then sent through the MIMO channel H ∈ C^{N_r×N_t} and received as y ∈ C^{N_r×1} at the receiver as

y = H x + n,   (3)

where n ~ CN(0, I_{N_r}) is the noise vector. Let h_{ij} denote the flat-fading channel coefficient of the channel matrix H drawn from a Rayleigh distribution, where i and j are the row and column indices, respectively. The entries of n and H are independent and identically distributed (i.i.d.) with zero mean and unit variance.

Theorem 1: The capacity of PMM under a complex Gaussian input distribution is

C_PMM = E{ log2 | I_{N_r} + H Γ H^H | },   (4)

where Γ = A A^H. Proof: See APPENDIX A.

The capacity of PMM shown in (4) is derived under the assumption that the input vector x can be modeled as a Gaussian variate (i.e., both P and x are Gaussian). However, it is not clear if we can do so^3, especially when L is finite. Therefore, in order to obtain a meaningful performance analysis of our system, we present in the following section the achievable rate under the assumption that the input vector x is not a Gaussian variate.

^2 All permutation matrices are unitary matrices.

III. PERFORMANCE ANALYSIS OF PMM
In this section, we present the achievable rate of our proposed system. We first show that the precoded vector x and the receive vector y follow a GMM distribution. We then evaluate the mutual information of x and y and hence obtain the achievable rate of PMM.

A. Gaussian Mixture Model (GMM) Distribution
We first show that every permutation matrix in the set P results in a unique covariance matrix. In the case of finite L, the covariance matrix of the transmit vector x given that P = P_l is

C_l = E{ x x^H | P = P_l } = P_l A A^H P_l^T,   (5)

for l = 1, ..., L. Let C = {C_1, ..., C_L} be the set of covariance matrices, one for each permutation matrix in the set P. The permutation matrix set therefore has a one-to-one correspondence with the set of covariance matrices C. Note that the covariance matrices are diagonal matrices of size N_t with the same diagonal elements but in different positions; for example, for N_t = 3 and the permutation matrices given in TABLE II, each C_l carries the diagonal entries (p_1, p_2, p_3) in a different order. We can see that the covariance matrices are all unique, and thus distinguishable^4, if and only if p_1 ≠ ... ≠ p_{N_t}.

^3 If the matrix P cannot be made Gaussian distributed, x is not a Gaussian variate.

Let f_x^{(l)}(x) be the conditional probability density function (pdf) of the precoded signal x given that P = P_l. Since the modulated symbol vector s is complex Gaussian, x is also complex Gaussian given a particular permutation matrix, with zero mean and covariance C_l. We can formalize this as

f_x^{(l)}(x) = (1 / (π^{N_t} |C_l|)) exp( −x^H C_l^{−1} x ).   (6)

On the other hand, the selection of the permutation matrix P_l is based on the incoming bitstream; therefore, each permutation matrix in the set P has a certain probability. Mathematically, we can write the probability mass function (pmf) of the random matrix P as

Pr(P = P_l) = w_l,   l = 1, ..., L,   (7)

where 0 ≤ w_l ≤ 1 and Σ_{l=1}^{L} w_l = 1.

Theorem 2: If the modulated symbols in vector s are complex Gaussian with zero mean and unit variance, and each permutation matrix has the pmf shown in (7), the precoded signal vector x follows a complex GMM distribution with pdf

f_x(x) = Σ_{l=1}^{L} w_l f_x^{(l)}(x).   (8)

We can now evaluate the covariance of the random vector x with pdf f_x. The covariance of x equals its autocorrelation matrix, since the precoded vector x has zero mean. The covariance of f_x is the expectation conditioned on the permutation matrix P, written as

E{ x x^H } = Σ_{l=1}^{L} Pr(P = P_l) E{ x x^H | P = P_l },   (9)

where the conditional expectation is given by E{ x x^H | P = P_l } = C_l, and thus we have

E{ x x^H } = Σ_{l=1}^{L} w_l C_l.   (10)

From Theorem 2, we know that the precoded signal x follows the GMM distribution. It is straightforward to show that the output vector y also follows the GMM distribution. This is due to the fact that the receive vector y given a particular permutation matrix P_l has the form

y_l = H P_l A s + n.   (11)

Therefore, the conditional distribution of y is complex Gaussian with zero mean and covariance D_l,

f_y^{(l)}(y) = (1 / (π^{N_r} |D_l|)) exp( −y^H D_l^{−1} y ),   (12)

where

D_l = H C_l H^H + I_{N_r}.   (13)

Let D = {D_1, ..., D_L} be the set of covariance matrices of y. Using the same methodology as in Theorem 2, the pdf of y is

f_y(y) = Σ_{l=1}^{L} w_l f_y^{(l)}(y),   (14)

and the covariance of f_y is

E{ y y^H } = Σ_{l=1}^{L} w_l D_l.   (15)
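The one-to-one correspondence between permutation matrices and covariance matrices in (5) can be checked numerically. A minimal sketch with the notation above (the p_i values are illustrative):

```python
from itertools import permutations
import numpy as np

Nt, L = 3, 4
p = np.array([0.5, 1.0, 1.5])        # distinct power allocations
A2 = np.diag(p)                      # A A^H

P_set = [np.eye(Nt)[list(perm)] for perm in permutations(range(Nt))][:L]

# C_l = P_l (A A^H) P_l^T: same diagonal entries, different positions
C_set = [P @ A2 @ P.T for P in P_set]

# Distinct p_i make every C_l unique, hence distinguishable at the receiver
diag_patterns = {tuple(np.diag(C)) for C in C_set}
assert len(diag_patterns) == L
```

With equal p_i the set would collapse to a single covariance matrix and the permutation information would be lost, which is exactly why p_1 ≠ ... ≠ p_{N_t} is imposed.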

B. Achievable Rate of PMM
The mutual information of the precoded signal x and the receive vector y can be evaluated to derive the achievable rate of PMM. The mutual information is

I(x; y | H) = h(y | H) − h(y | x, H).   (16)

The second term in (16) equals the differential entropy of the noise vector n, since n is the only remaining random variable when x and H are given:

h(y | x, H) = h(n) = log2 | π e I_{N_r} | = N_r log2(π e).   (17)

On the other hand, the first term in (16) can be evaluated as

h(y | H) = −E{ log2 f_y(y) }.   (18)

We can then see that the mutual information in (16) depends on the GMM distribution. Furthermore, the rate of PMM can be written as

R = h(y | H) − N_r log2(π e).   (19)

Unfortunately, there is no closed-form expression for the differential entropy of a vector with a GMM pdf. This is due to the logarithm of a sum of exponentials, which cannot be simplified. However, we can approximate the logarithm of the sum in (18) using the multivariate Taylor-series expansion presented in [21].
Having the pdf of the random vector y in (14), the differential entropy h(y | H) can be approximated by truncating the multivariate Taylor-series expansion of the logarithm in (18) after a finite number of terms, where ∇_y denotes the gradient with respect to the random variable y; the approximation (20) is obtained by keeping finitely many terms and truncating the remainder. However, it is not clear whether we can bound the deviation of the approximated entropy in (20) from its actual value, as mentioned in [21]. Therefore, the numerical result of (20) may not reflect the performance of our proposed system^5.
Another approach to obtaining a meaningful performance analysis of our proposed system is to find an upper bound on the rate R. This can be done by upper-bounding the differential entropy of a GMM random vector.
Theorem 3: The achievable rate of PMM is upper-bounded by

R_PMM ≤ Σ_{l=1}^{L} w_l ( log2(1 / w_l) + log2 |D_l| ).   (21)

^5 A good approximation is usually measured by how far the approximated value can deviate from the actual value; therefore, a derivation of the approximation deviation is required to justify whether our approximation is good enough.
Proof : Since the receive vector y follows a complex GMM distribution, the achievable rate of PMM follows the upper bound of differential entropy of a complex GMM distribution as shown in [1].
The expression of the achievable rate in Theorem 3 tells us that there exist decoding scenarios in which we can obtain at most the rate R_PMM. Thus, the numerical results of (21) can be used to analyze the performance of our proposed system.
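To make Theorem 3 concrete, the bound can be evaluated numerically for one channel realization (the analysis averages over H). This is a sketch assuming the standard differential-entropy upper bound for a zero-mean Gaussian mixture, with the D_l from (13), a uniform pmf w_l = 1/L, and illustrative power values:

```python
from itertools import permutations
import numpy as np

rng = np.random.default_rng(1)
Nt = Nr = 3
L = 4
w = np.full(L, 1.0 / L)              # uniform pmf on the permutation matrices
p = np.array([0.5, 1.0, 1.5])

P_set = [np.eye(Nt)[list(perm)] for perm in permutations(range(Nt))][:L]
C_set = [P @ np.diag(p) @ P.T for P in P_set]

# One Rayleigh-fading channel realization
H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)

def rate_upper_bound(H):
    """Sum over mixture components of w_l * (log2(1/w_l) + log2|D_l|)."""
    D_set = [H @ C @ H.conj().T + np.eye(Nr) for C in C_set]  # D_l, Eq. (13)
    return sum(wl * (-np.log2(wl) + np.log2(np.linalg.det(D).real))
               for wl, D in zip(w, D_set))

R = rate_upper_bound(H)              # bits per channel use for this realization
```

Averaging rate_upper_bound over many channel draws reproduces the Monte Carlo evaluation used in the simulation section.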

C. Upper Bound Refinement
Note that the upper bound expression (21) is not tight. However, we can tighten it by employing the upper bound refinement algorithm discussed in [21]. The idea of this algorithm is to merge the mixture components in pairs until a single Gaussian distribution represents all the mixture components. This results in a family of upper bounds^6, and the lowest value in this family is a tighter upper bound. The complete proof that this algorithm produces tighter upper bounds is given in [21]. Adapted to our case, the upper bound refinement algorithm is provided in Algorithm 1.
The merging of two Gaussian mixture components in line 3 of Algorithm 1 needs to be clarified. Suppose we have a mixture of L Gaussian components whose pdf is given by (14). Two of its components can be merged (e.g., Merge(f_y^{(i)}, f_y^{(j)})) into a single Gaussian, as presented in [22], with merged weight, mean, and covariance^7 respectively given by

w_{ij} = w_i + w_j,   (22)

μ_{ij} = w_{i|ij} μ_i + w_{j|ij} μ_j,   (23)

C_{ij} = w_{i|ij} C_i + w_{j|ij} C_j + w_{i|ij} w_{j|ij} (μ_i − μ_j)(μ_i − μ_j)^H,   (24)

where w_{i|ij} = w_i / (w_i + w_j), w_{j|ij} = w_j / (w_i + w_j), and μ_i, C_i are the mean and covariance of f_y^{(i)}, respectively. In our case, we have μ_i = 0 ∀i and C_{ij} = w_{i|ij} C_i + w_{j|ij} C_j, since all our mixture components have zero mean. Therefore, merging two mixture components only produces a new weight and covariance, computed using (22) and (24), respectively. Now we must decide in what order to merge the mixture components. One option is to use Salmond's clustering, discussed in [23] and [22], where the distance between f_y^{(i)} and f_y^{(j)} is measured by

d_s^2(i, j) = tr( C^{−1} ΔW_{ij} ),  with  ΔW_{ij} = (w_i w_j / (w_i + w_j)) (μ_i − μ_j)(μ_i − μ_j)^H,   (25)

and C is the overall covariance. Given the distance (25), we compare all pairwise distances between mixture components and merge first the two components with the smallest distance.
Corollary 2: We can choose an arbitrary order in which to merge the Gaussian mixture components of f_y to obtain the tighter upper bound in (21).
Proof: Since all the Gaussian mixture components have zero mean, we have ΔW_{ij} = 0 ∀i, j, and therefore d_s^2(i, j) = 0 ∀i, j. In other words, merging any two components of f_y always results in zero distance, which means that any merging order results in the same tightened upper bound.
We know from Corollary 2 that our "Merge" is a special case of Salmond's clustering. Thus, we can merge the mixture components, for example, in the order shown in Algorithm 1.
Remark 1: Salmond's clustering is an approximation method for measuring the distance between Gaussian mixture components. There are several other methods, such as the Kullback-Leibler discrimination and Runnall's reduction method in [22]. It is important to note that none of these methods guarantees the tightest upper bound. We chose Salmond's clustering because it is computationally cheap and has a reasonable trade-off with precision.
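The zero-mean refinement can be sketched as follows: components are merged pairwise in an arbitrary order (valid by Corollary 2), the bound is recomputed after every merge, and the smallest value in the resulting family is kept. Eq. (24) reduces to a weighted sum of covariances because all means are zero; the bound function assumes the same zero-mean GMM upper bound as Theorem 3.

```python
import numpy as np

def bound(w, D):
    """Upper bound sum_l w_l * (log2(1/w_l) + log2|D_l|) for a zero-mean GMM."""
    return sum(wl * (-np.log2(wl) + np.log2(np.linalg.det(Dl).real))
               for wl, Dl in zip(w, D))

def refine(w, D):
    """Merge zero-mean mixture components pairwise in an arbitrary order
    and return the tightest bound in the resulting family."""
    w, D = list(w), list(D)
    best = bound(w, D)
    while len(w) > 1:
        wi, wj = w.pop(), w.pop()
        Di, Dj = D.pop(), D.pop()
        wm = wi + wj                          # merged weight, Eq. (22)
        Dm = (wi / wm) * Di + (wj / wm) * Dj  # Eq. (24) with zero means
        w.append(wm)
        D.append(Dm)
        best = min(best, bound(w, D))
    return best
```

Since the family includes the unmerged mixture itself, refine never returns a looser value than the original bound.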

D. Achievable Rate via Finite-Cardinality Input (FCI)
In the previous section, we derived the achievable rate of our proposed system under the GMM distribution, i.e., the achievable rate in (21) was evaluated using a continuous input distribution. Another method to observe the achievable rate of PMM is to consider a finite input distribution. In [24], this method is called the discrete-input continuous-output memoryless channel (DCMC), and in [25] it is called the cut-off rate. The idea is to employ a finite set of codewords and define the probability of each codeword being transmitted. Using this method, the achievable rate is restricted to the maximum transmit bits shown in TABLE I. In our case, this means we include the number of antennas and the constellation size as factors in evaluating the achievable rate via FCI.
In evaluating the achievable rate via FCI, we first define the conditional probability of receiving the transmitted signal (1) given that P ∈ P and s = (s_1, ..., s_{N_t})^T, where s_i ∈ Q and Q is the set of all M-ary constellation symbols. For simplicity, let us define x_k for k = 1, ..., K as the k-th transmit signal from the total of K = L M^{N_t} possible transmit signal combinations. The achievable rate of PMM can then be evaluated as in [25, Ch. 6.8] by maximizing the mutual information between the discrete input x_k and the continuous output y over the input probabilities p(x_k) (27). To solve (27), we use the bounding inequalities in (28), where the expectation in (28b) is solved by finding the average power gain of each element of the channel H, assuming i.i.d. Rayleigh fading, as shown in [25, Ch. 14.4] (29). Using the fact that (27) is maximized by a uniform distribution on the input codewords (i.e., p(x_k) = 1/K ∀k) and substituting (28) and (29) into (27), we obtain the FCI rate (30), where SNR = P_T / N_0 is the signal-to-noise power ratio.
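The FCI rate can also be estimated directly by Monte Carlo simulation of the DCMC mutual information with uniform inputs; the closed-form bounds (28)-(30) are not reproduced here. A sketch, where X collects the K candidate transmit vectors as columns (an assumed interface, not the paper's notation):

```python
import numpy as np

rng = np.random.default_rng(2)

def dcmc_rate(H, X, N0=1.0, trials=200):
    """Monte Carlo estimate of the DCMC rate for a finite codebook X
    (columns are the K candidate transmit vectors), uniform inputs."""
    K = X.shape[1]
    HX = H @ X
    total = 0.0
    for k in range(K):
        for _ in range(trials):
            n = np.sqrt(N0 / 2) * (rng.standard_normal(H.shape[0])
                                   + 1j * rng.standard_normal(H.shape[0]))
            y = HX[:, k] + n
            d2 = np.sum(np.abs(y[:, None] - HX) ** 2, axis=0)
            # log2( p(y|x_k) / ((1/K) sum_j p(y|x_j)) ), Gaussian likelihoods
            total += np.log2(K) - np.log2(np.sum(np.exp((d2[k] - d2) / N0)))
    return total / (K * trials)
```

At high SNR the estimate saturates at log2(K), i.e., at the maximum transmit bits, which is exactly the behaviour the FCI analysis exposes.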

IV. OPTIMAL ACHIEVABLE RATE WITH CSIT
Suppose we know the channel matrix H at the transmitter. We can then write H = U Σ V^H by decomposing H using the well-known singular value decomposition (SVD), where U and V are unitary matrices and Σ = diag(σ_1, ..., σ_{N_t}) is a diagonal matrix containing the singular values of H. We can reform the precoded signal as

x_csit = V P A s,   (31)

and obtain the receive signal as

y_csit = H x_csit + n = U Σ P A s + n.   (32)

Finally, we obtain a parallel channel by multiplying y_csit with the unitary matrix U^H, as shown by

ỹ = U^H y_csit = Σ P A s + ñ,   (33)

where ñ = U^H n = (ñ_1, ..., ñ_{N_r})^T. Note that the noise statistics in (33) are preserved, since the unitary matrix U^H only rotates the noise vector. We write the index π(i) to show that the position of p_i may differ from its original position at the transmitter: the multiplication by the permutation matrix P may alter the position of p_i as seen at the receiver, so the index π(i) keeps the original position of p_i tracked. Following the same analysis as in Section III, the achievable rate R_csit can be written as in (34). We can attain the optimal achievable rate R*_csit by finding the optimal power allocation p*, assuming that we know the moment-preserving merge that gives the tight upper bound. It is sensible to assume that P_l follows a uniform distribution, w_l = 1/L ∀l, since we typically do not have much control over the incoming bitstream. Let C_tight and w_tight be the covariance and weight of the merged Gaussian mixture components that give the tight upper bound, respectively. The optimization to obtain p* can then be formulated as the maximization of the tightened bound R_tight over the power allocations p_1, ..., p_{N_t}, subject to 0 ≤ p_i ≤ P_T ∀i and Σ_i p_i = P_T, where the objective involves both the merged and unmerged mixture components (35)-(37). For example, we obtain the tight upper bound of R_csit by merging the mixture components of the pdf of ỹ.

Corollary 3: The optimization problem in (37) can be efficiently solved using interior-point methods.
Proof: Noting that the constants w_l and σ_i ∀l, i are non-negative, all terms inside the determinant function are non-negative. Therefore, the objective function R_tight is strictly concave within the feasible region and twice differentiable. Furthermore, all constraints are linear, and the solution is always feasible. This directly follows the standard form of interior-point methods for inequality- and equality-constrained minimization problems in [26, Ch. 11].
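The parallel-channel construction in (31)-(33) can be verified numerically: precoding with V and post-multiplying by U^H leaves Σ P A s plus rotated noise. A noiseless sketch with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(3)
Nt = Nr = 3
H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)

U, sig, Vh = np.linalg.svd(H)            # H = U Sigma V^H
A = np.diag(np.sqrt([0.5, 1.0, 1.5]))    # example power allocation
P = np.eye(Nt)[[1, 2, 0]]                # an example permutation matrix
s = (rng.standard_normal(Nt) + 1j * rng.standard_normal(Nt)) / np.sqrt(2)

x_csit = Vh.conj().T @ P @ A @ s         # precode with V, Eq. (31)
y = H @ x_csit                           # noiseless receive signal, Eq. (32)
y_par = U.conj().T @ y                   # parallel channel Sigma P A s, Eq. (33)

assert np.allclose(y_par, np.diag(sig) @ P @ A @ s)
```

The receiver sees each singular value scaled by a permuted power coefficient, which is how the index π(i) tracks the original positions.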

V. DETECTION SCHEMES FOR PMM
By setting p_1 ≠ ... ≠ p_{N_t}, we have shown that the permutation matrices in the set P are distinguishable and can therefore be detected at the receiver. The problem in this section is to design detection schemes (to detect the random matrix P and the modulated symbols s) that work for PMM, and to analyze their performance.

A. Maximum Likelihood (ML) Detection
One option is to perform ML detection, where we examine all possible combinations and find the best one. Using ML detection for PMM, we can formulate the problem as

( P̂, ŝ ) = argmin_{P_l ∈ P, s ∈ Q^{N_t}} ‖ y − H P_l A s ‖²,   (38)

where P̂ and ŝ = (ŝ_1, ..., ŝ_{N_t})^T are the detected permutation matrix and modulated symbols at the receiver, respectively, and Q is the set of all M-ary constellation symbols. We can view the problem in (38) as finding the combination from the sets P and Q that minimizes the cost function g.
Let K = LQ^M be the number of all possible combinations of the inputs P̂ and ŝ that form a candidate transmit signal x_k. Given that x_j was sent, we can simplify the cost function in (38), where ê_k is the error of the k-th combination of P̂ and ŝ; thus, when k = j, we have ê_k = 0. Successful detection is attained when the true combination minimizes the cost, from which the probability of successful detection and, finally, the probability of incorrect detection can be written. Remark 2: In the extreme case when n = 0, we achieve 100% correct detection (i.e., P_c = 1). The expression in (40) thus tells us that we can maximize the probability of successful detection by minimizing the noise n; one way to do this is to maximize the receive SNR, as we show later in the simulation results.
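A minimal sketch of the exhaustive search in (38) is given below. All names are ours, and we assume the entries of `perms` are the L effective precoders, i.e., the permutation matrices with the power allocation already folded in.

```python
import numpy as np
from itertools import product

def ml_detect(y, H, perms, constellation, M):
    """Exhaustive ML search of (38): argmin over all (P, s) candidates of
    || y - H P s ||^2.  `perms` holds the L effective precoders (the
    permutation matrices with the power allocation folded in)."""
    best_P, best_s, best_cost = None, None, np.inf
    for P in perms:
        HP = H @ P                       # reused across the Q^M symbol vectors
        for s in product(constellation, repeat=M):
            s = np.asarray(s)
            cost = np.linalg.norm(y - HP @ s) ** 2
            if cost < best_cost:
                best_P, best_s, best_cost = P, s, cost
    return best_P, best_s
```

In the noiseless case the true combination attains zero cost, matching ê_k = 0 above; the distinct power levels p_1 ≠ . . . ≠ p_M are what make the candidates unique.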

B. Zero-Forcing (ZF) Detection
ML detection is based on an exhaustive search, which leads to high complexity 8. Using ZF, we can design a detection scheme to retrieve the information conveyed by both s and P. The basic idea of ZF is to remove the inter-stream interference. This is done by projecting the received signal y onto the subspace orthogonal to the one spanned by the column vectors of H. In principle, the projection is performed per stream; however, there is a well-known explicit formula by which we can decorrelate all streams at once by forming the pseudoinverse of the channel matrix H [27], defined as in (43), which in the special case N = M reduces to the ordinary inverse. We then perform post-processing by multiplying the ZF matrix with the received signal, as shown in (44). Note that applying ZF as post-processing results in a colored noise H_ZF n; in other words, the statistics of the noise n are changed by ZF. We can detect the permutation matrix using the expression in (45), whose cost function can be rewritten in terms of the error resulting from multiplying by the k-th permutation matrix from the set P. Let j be the correct permutation
8 Complexity here means the computational complexity.
Algorithm 2: ZF Detection Scheme for PMM
Input: MIMO channel matrix H and receive signal y
1 Generate the pseudoinverse matrix H_ZF using (43)
2 Multiply H_ZF with the receive signal y to obtain ỹ
3 Detect the permutation matrix using (45)
4 Multiply the detected permutation matrix P̂ with ỹ
5 Detect the transmit constellation symbols using (50)
Output: P̂, ŝ_1, . . ., ŝ_M

matrix index. Therefore, we have correct detection of the permutation matrix when the cost is minimized at index j. The probability of correct detection of the permutation matrix follows accordingly. After detecting the permutation matrix, we multiply ỹ by the inverse of the detected permutation matrix, as shown by ỹ_P = P̂^H ỹ, P̂ ∈ P.
Provided P̂ is correct, equation (49) restores the constellation symbols to their original positions. Finally, we can retrieve the information bits from (49) using (50), where ỹ_i is the i-th element of the vector ỹ_P. Assuming the permutation matrix is correctly detected, the probability of correct detection of the symbol at the i-th position can be written in terms of h^(i)_ZF, the i-th row of the matrix H_ZF. The probability of overall correct detection then follows in (52), and finally the probability of incorrect detection in (53). We must mention certain limitations of ZF detection. The first explicit limitation is M ≤ N, because the inter-stream interference removal succeeds only if the i-th column vector of H is not a linear combination of the other column vectors of H; in other words, if there are more streams than the dimension of the received signal (i.e., M > N), ZF detection will not succeed. The second limitation is that |s_1| = . . . = |s_M|, since the permutation detection must attribute differences in the received stream powers to the power allocation rather than to the symbol amplitudes. Because the distinguishability constraint of the permutation matrix forbids equal power allocations, our only option is to satisfy this second constraint, for example by employing phase shift keying (PSK) modulation.
The ZF detection scheme for PMM is proposed to reduce the complexity incurred by ML detection, and it constitutes one of the contributions of this paper. The scheme is summarized in Algorithm 2.
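Algorithm 2 can be sketched as follows. The permutation test in step 3 is our reading of (45): with PSK symbols (|s_i| = 1), the magnitudes of the decorrelated streams reveal the permuted power profile, which is distinguishable because p_1 ≠ . . . ≠ p_M. All names are ours.

```python
import numpy as np

def zf_detect(y, H, perms, power, constellation):
    """Sketch of Algorithm 2.  `perms` holds the L candidate permutation
    matrices and `power` the allocation p_1 != ... != p_M."""
    H_zf = np.linalg.pinv(H)                 # (H^H H)^{-1} H^H, as in (43)
    y_t = H_zf @ y                           # decorrelated streams, as in (44)
    # Step 3: with PSK symbols the stream magnitudes expose the permuted
    # power profile, so match |y_t| against each candidate profile.
    amp = np.abs(y_t)
    costs = [np.linalg.norm(amp - P @ np.sqrt(power)) for P in perms]
    P_hat = perms[int(np.argmin(costs))]
    # Step 4: undo the permutation (P^{-1} = P^T) and the power scaling.
    s_eff = (P_hat.T @ y_t) / np.sqrt(power)
    # Step 5: nearest-neighbour decision per stream, as in (50).
    s_hat = np.array([min(constellation, key=lambda q: abs(e - q)) for e in s_eff])
    return P_hat, s_hat
```

Note the step-by-step structure: an error in the permutation decision propagates into the symbol decisions, which is exactly the dependence discussed in Remark 3.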
Remark 3: The fundamental difference between ML and ZF detection is that ML is a joint detection scheme, while ZF detects the symbols step by step, so that the detection of the constellation symbols depends on the correctness of the permutation matrix detection. This can be verified from (41) and (52). Note that the probabilities of error in (42) and (53) are the probabilities that the permutation matrix and all constellation symbols are incorrectly detected; this is not the general case since, in some instances, several symbols may still be correctly detected even though others are not.

VI. SIMULATIONS
We provide simulation results for the achievable rate and the symbol error rate (SER) to support the analysis presented in the previous sections. In general, we employ a Rayleigh flat-fading channel and AWGN with zero mean and unit variance for all simulations, and we assume perfect CSIR throughout. All achievable rates are evaluated using the upper bound refinement algorithm presented in Algorithm 1 and simulated under the GMM as the input distribution. Throughout the simulations, we use TABLE III for the power allocation of PMM (unless specified otherwise); this power allocation is generic, chosen only so that each PMM antenna receives a different power. GSM employs equal power allocation, 1/N_act, where N_act is the number of activated antennas at each transmission, and SM allocates the maximum power to its single activated antenna. Other related parameters are specified for each simulation.
We compare our proposed system with two existing index modulation systems, SM and GSM; an analysis of these systems is given in [1]. From the achievable rate perspective, the main difference between the existing systems and our proposed system is the modulated indices 9.

Fig. 3: Achievable rate of PMM, SM and GSM with different antenna settings

In short, SM modulates a single antenna as an additional symbol to convey information bits: one antenna is activated at each transmission, and the choice of activated antenna is the symbol used to send more information bits. GSM, in turn, allows any number of antennas to be activated and obtains its additional symbols from the combinations of multiple activated antennas. As presented in [1], SM's achievable rate under GMM distribution can be computed as in (54), where L_SM and C^SM_k are the total number of SM's modulated indices and the covariance of SM when employing the k-th modulated index, respectively. The achievable rate for GSM is given by (55), where L_GSM and C^GSM_k are defined analogously.
Observe that (54) and (55) are similar to the achievable rate of PMM shown in (21); the only difference lies in the covariance matrices of each system. For SM, since only one antenna is activated at a time, each covariance matrix is all zeros except for one diagonal element containing the maximum power, which indicates the activated antenna. For GSM, the covariance matrices are likewise all zeros except for the diagonal elements indicating which antenna combination is activated.
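The diagonal covariance structures described above can be sketched as follows (parameter names are ours; the PMM covariances assume unit-power symbols, so that each modulated index k yields C_k = P_k diag(p) P_k^T, which simply permutes the power vector):

```python
import numpy as np
from itertools import combinations, permutations

def sm_covariances(Nt, P):
    """SM: one active antenna, so one nonzero diagonal entry at full power."""
    return [np.diag(P * np.eye(Nt)[i]) for i in range(Nt)]

def gsm_covariances(Nt, Nact, P):
    """GSM: Nact active antennas sharing the power equally (P / Nact each)."""
    covs = []
    for idx in combinations(range(Nt), Nact):
        d = np.zeros(Nt)
        d[list(idx)] = P / Nact
        covs.append(np.diag(d))
    return covs

def pmm_covariances(p):
    """PMM: all antennas active; each modulated index permutes the power vector."""
    return [np.diag(perm) for perm in sorted(set(permutations(p)))]
```

All three families share the same total transmit power (the trace of every covariance matrix), which is what makes the rate comparison in Fig. 3 fair.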

A. Achievable rate
The achievable rates of PMM, SM and GSM are presented in Fig. 3 for the 4 × 4 and 6 × 6 antenna settings. We can see that PMM outperforms both SM and GSM over the whole SNR range. Increasing the antenna setting from 4 × 4 to 6 × 6 improves the rate on average by 49.4% for PMM, 50.5% for GSM and 15.3% for SM. This reflects that PMM and GSM gain more modulated indices than SM when the antenna setting increases. In fact, SM does not obtain any additional modulated indices from increasing the antenna setting in Fig. 3, since the number of SM indices is bounded to a power of two in order to obtain usable modulated indices for binary transmission. In this case, SM merely obtains its improvement from the spatial multiplexing of the antennas. Note that there is no clear relation between increasing the antenna setting and the average percentage improvement; we show those percentages only to give an idea of how the compared systems behave.
In Fig. 4, we plot the comparison between the capacity of MIMO V-BLAST, the capacity of PMM computed using (4), and the achievable rates of PMM, GSM, and SM under GMM distribution. The capacity of MIMO V-BLAST with Gaussian input distribution is well known and available in many works, such as [28]; with equal power allocation, it can be computed as in (56). Over the whole SNR range, the achievable rate of PMM is the closest to the capacity of MIMO V-BLAST, with only a 0.01 bits/Hz gap on average. From the inset, we can also observe that the achievable rate of PMM under GMM distribution exceeds its capacity.

Fig. 5: Comparisons of PMM CSIR, PMM optimal and MIMO waterfilling

To understand these results, we must see that the overall covariance of PMM is close to equal power allocation when evaluated using the upper bound refinement algorithm 10, which is also why it is very close to the capacity of MIMO V-BLAST. We can also observe that the gap between the achievable rates of PMM and GSM narrows when fewer antennas are employed: PMM requires a high degree of freedom to increase its rate, just like MIMO V-BLAST. An interesting result appears when we employ MISO, where all systems almost overlap with tiny performance differences; when we reduce the number of receive antennas to one, all systems lose the ability to exploit their spatial multiplexing gain. It is also important to note that Fig. 4 implies the existence of at least one coding scheme for any desired rate R_0 < R_PMM. Therefore, within the regions R_SM < R_0 < R_PMM and R_GSM < R_0 < R_PMM, reliable communication cannot be attained by SM and GSM, respectively.

We show the comparison between the generic power allocation from TABLE III, the optimal power allocation, and MIMO waterfilling in Fig. 5. To attain the optimal power allocation p*_i ∀i, we assume that CSIT and CSIR are perfectly known, so that we obtain a parallel channel as shown in (33), and we apply the interior-point method to solve the optimization problem (37). The optimal achievable rate outperforms the generic power allocation over the whole SNR range: on average, the achievable rate improves by 0.09 bits/Hz for 3 × 3 and 0.13 bits/Hz for 4 × 4. This minor improvement means that we can obtain performance very similar to the optimal achievable rate despite the absence of CSIT. Furthermore, our proposed system's performance closely approaches that of MIMO waterfilling despite the absence of CSIT. We can also observe the effect of employing full antenna activation in PMM, which gains the benefit of the full degree of freedom in the space dimension.

Fig. 6: Achievable rate comparisons of PMM, MIMO, GSM and SM evaluated using the FCI method, where X(·, ·, ·) denotes system X with the indicated parameters
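For reference, the equal-power V-BLAST capacity used as the baseline in Fig. 4 is the standard log-det formula; a minimal sketch (function name ours):

```python
import numpy as np

def vblast_capacity(H, snr):
    """Equal-power MIMO capacity log2 det(I_N + (snr / Nt) H H^H) in bits/s/Hz,
    the standard formula (see, e.g., [28])."""
    N, Nt = H.shape
    G = np.eye(N) + (snr / Nt) * (H @ H.conj().T)
    return float(np.real(np.log2(np.linalg.det(G))))
```

Averaging this quantity over Rayleigh channel realizations at each SNR point reproduces the ergodic-capacity baseline curve.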
We evaluate the achievable rate of PMM using FCI and compare it with the other systems in Fig. 6. The results are computed using (30), with the same method followed for all systems; from (30), we can observe that the systems are distinguished by their symbol distances. It is worth noting that we use the same average transmit power for all systems, so the results are evaluated under fair circumstances. At both 8- and 5-bit transmission, MIMO and SM have the best and worst performance, respectively, while PMM performs between MIMO and SM in both transmission modes. It is important to note that PMM employs arbitrary permutation matrices with the same power allocation for all symbol combinations; the performance of PMM can still be improved by optimizing the symbol distances in (30).

B. Symbol Error Rate
We present the SER simulation of PMM in Fig. 7 and the SER comparisons between different systems in Fig. 8. The results are evaluated by feeding 10^6 bits to the encoder at each SNR point and performing a Monte Carlo procedure.
In Fig. 7, we use a 4 × 4 antenna setting with 4-PSK modulation. There are two generic power allocations: the first is defined by TABLE III, and the second uses p_1 = 0.45, p_2 = 0.30, p_3 = 0.15 and p_4 = 0.10. For simplicity, let us name the first power allocation (from TABLE III) PA-1 and the second PA-2. We can see in Fig. 7 that ML outperforms ZF over the whole SNR range for both power allocations. This is expected, since ML performs an exhaustive search so that no interference from the other streams remains, and the only cause of error is the noise, as shown in (40). The effect of noise can be further reduced by increasing the transmit power to obtain a higher SNR, as indicated in the figure.

Fig. 7: SER of PMM for 4 × 4 using 4-PSK modulation with different power allocations

Unlike ML, which is joint-detection based, ZF detects the permutation matrix and constellation symbols step by step; thus, the detection of the constellation symbols depends on the correctness of the permutation matrix detection. Furthermore, noise is not the only cause of error in ZF: the noise becomes colored due to the multiplication by the pseudoinverse channel matrix, as shown in (44), which means the noise is amplified by this process. These are the reasons why ZF performs worse than ML. An interesting result can be observed when we set different power allocations: PA-1 performs worse than PA-2 for both ML and ZF. It is straightforward to see that PA-1 has smaller gaps between the highest and lowest power allocations than PA-2: PA-1 has a difference of p_1 − p_4 = 0.18, while PA-2 has 0.35. This means that PMM achieves better SER, for both ML and ZF detection, when we set a more spread-out power allocation.
Fig. 8 presents the SER performance of the different systems. All systems use ML detection and are configured so that each transmits 16 bits per transmission with the same number of receive antennas. It is worth noting that GSM activates 3 antennas at each transmission. We can observe that the SER performance of each system varies across the SNR range; for example, PMM performs well at low SNR but declines at high SNR. An important point in this result is that PMM uses only 4-PSK with 5 transmit antennas, while the other systems need at least 16-QAM to keep the antenna usage small when transmitting 16 bits; moreover, SM needs 256-QAM and 256 transmit antennas. This is one of the most important benefits of PMM: with a far smaller constellation size or number of antennas, we can transmit the same number of bits as the other systems.

Fig. 8: SER comparisons between PMM, GSM, MIMO and SM using ML detection for 16-bit transmission

VII. A LOOK ON THE COMPLEXITY OF ML AND ZF DETECTION SCHEMES
In this section, we present an analysis of the complexity of the ML and ZF detection schemes. The complexity is measured in the number of floating-point operations (flops) 11. We first show how to compute the complexity and then plot figures representing the complexity of each detection scheme. Note that the complexity of every operation (e.g., matrix-matrix product) is computed using the standard calculations; no simplification is considered.
We can compute the complexity of ML detection by noting that it is based on an exhaustive search, as shown in (38). The receive vector y ∈ C^{N×1} is compared against candidate vectors ŷ_{P̂,ŝ} = H P̂ ŝ, where H ∈ C^{N×M}, P̂ ∈ P is of size M × M, and each element of ŝ ∈ C^{M×1} satisfies ŝ_i ∈ Q. The candidates ŷ_{P̂,ŝ} must cover all possible combinations of the inputs P̂ and ŝ_i ∀i over the vector length M. Since L and Q are the cardinalities of the sets P and Q, respectively, there are a total of LQ^M combinations. To create each combination, we require M flops for the product of P̂ and the diagonal power-allocation matrix, MN flops for the product of H and P̂ (full matrix-diagonal matrix product) and (2M − 1)N flops for the product of the matrix H P̂ and the vector ŝ (full matrix-vector product). Therefore, we require a total of M + MN + (2M − 1)N flops to create a single combination for ML detection. Finally, the subtraction of y and ŷ_{P̂,ŝ} and the norm evaluation add the remaining flops per combination.
11 A flop is defined as one addition, subtraction, multiplication or division of two floating-point numbers.
From (57), we know that the complexity of ML grows as O(M² L Q^M), where O(·) is the big-O notation. On the other hand, ZF consists of three steps:
• Creation of the matrix H_ZF from (43), which requires M²N + M(N − 1) flops for H^H H (matrix-matrix product), 4M³ flops for (H^H H)^{-1} (matrix inversion using standard Gaussian elimination) and M²N + M(N − 1) flops for (H^H H)^{-1} H^H (matrix-matrix product). Hence, the creation of H_ZF requires 4M³ + 2(M²N + M(N − 1)) flops in total. In the special case N = M, we can obtain H_ZF by directly inverting the channel matrix H, which requires 4M³ flops.
• Detection of the permutation matrix via (45), which requires 4M − 1 flops for the diagonal-matrix product, the diagonal matrix-vector product, and the norm operation. This process is repeated L times to cover all possible permutation matrices, so this step requires a total of L(4M − 1) flops.
• Detection of the constellation symbols, which requires 2 flops for subtracting ỹ_i and ŝ_i and taking the norm. This process is repeated M times (for all transmitted constellation symbols), so a total of 2M flops is required for this step.
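The flop tallies above can be collected into simple counting functions (a sketch; the cost of the final residual subtraction and norm in ML is cut off in the text, so the 2N term is our assumption):

```python
def ml_flops(M, N, L, Q):
    """ML flops: L * Q^M candidates, each costing M (permutation times the
    diagonal power matrix) + M*N (H times the precoder) + (2M - 1)*N
    (matrix-vector product), plus an assumed 2N for the residual and norm."""
    per_candidate = M + M * N + (2 * M - 1) * N + 2 * N
    return L * Q ** M * per_candidate

def zf_flops(M, N, L):
    """ZF flops: pseudoinverse + permutation search + symbol decisions,
    following the three-step tally above."""
    pinv = 4 * M ** 3 + 2 * (M ** 2 * N + M * (N - 1))
    perm_search = L * (4 * M - 1)
    slicing = 2 * M
    return pinv + perm_search + slicing
```

Each extra transmit antenna multiplies the ML count by Q, while the ZF count grows only polynomially; this is the exponential-versus-polynomial gap plotted in Fig. 9.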
It is important to note that the complexity difference between ML and ZF is insignificant for a small antenna setting and modulation level. However, a substantial difference appears when we employ a large antenna setting or modulation level, where the complexity of ML grows exponentially compared to ZF. This behavior can be observed in Fig. 9, where we plot the complexity ratio between ML and ZF, obtained by dividing the complexity of ML by that of ZF with the same parameters, C_ML/C_ZF.

This leads to the marginal cdf given by (67). By differentiating the marginal cdf in (67) and using calculus manipulation, we obtain the pdf of the transmit signal x as shown in (8).

where J ≤ L, which means we have the unmerged components f_y^(J+1), . . ., f_y^(L). Thus, we have C_tight = (1/J) Σ_{k=1}^J P_k Λ P_k^T and ω_tight = Jω. In this case, we can write R_tight as

R_tight = ω_tight log(1/ω_tight) + log |I_M + C_tight Σ²| + Σ_{k=J+1}^L ω_k [ log(1/ω_k) + log |I_M + P_k Λ P_k^T Σ²| ].   (36)

Note that the summation term vanishes when R_tight is obtained by merging all the mixture components into a single component. We can see that all matrix terms inside the determinants in (36) are diagonal matrices, so they can be written as sums of logarithmic functions similar to (34b). Since R_tight is a concave function over the feasible set, the optimization problem in (35) can be reformulated as follows:

minimize over p_i, i = 1, . . ., M:  −R_tight
subject to  0 ≤ p_i ≤ P, i = 1, . . ., M,
            Σ_{i=1}^M p_i = P.
Matti Latva-Aho (Senior Member, IEEE) received the M.Sc., Lic.Tech., and Dr.Tech. (Hons.) degrees in electrical engineering from the University of Oulu, Finland, in 1992, 1996, and 1998, respectively. From 1992 to 1993, he was a Research Engineer at Nokia Mobile Phones, Oulu, Finland, after which he joined the Centre for Wireless Communications (CWC), University of Oulu. He was the Director of CWC from 1998 to 2006, and the Head of the Department of Communication Engineering until August 2014. Currently, he serves as an Academy of Finland Professor and is the Director of the National 6G Flagship Program. He is also a Global Fellow with Tokyo University. His research interests include mobile broadband communication systems, and his group currently focuses on 6G systems research. He has published over 500 journal and conference papers in the field of wireless communications. In 2015, he received the Nokia Foundation Award for his achievements in mobile communications research.

TABLE II :
Bits to permutation matrix mapping for M = 3, where ω̄ = ω_i/(ω_i + ω_j), and μ_i and μ_j are the means of mixture components i and j

TABLE III :
Power allocation of PMM