The Tensor Multi-linear Channel and its Shannon Capacity

Tensors are multi-way arrays that can model systems spanning many domains. This work proposes tensors for characterizing, analyzing, and designing multi-domain communication systems. Most modern communication systems employ coding and modulation across several domains, such as space, frequency, and time; hence a unified mathematical framework that characterizes such multi-domain systems in an intuitive manner is much needed. In this paper, we present such a unified framework, characterizing a communication system with N input domains, M output domains, and an (M + N)-domain multi-linear tensor channel. The proposed framework is generic, and the physical interpretation of the domains is system specific. We illustrate a few examples from multi-antenna, multi-carrier, and multi-user systems that fit the proposed framework. Assuming a fixed tensor channel, we provide an information theoretic analysis by deriving its Shannon capacity and power allocation under a variety of power constraints. We show how the tensor framework's suitability for mathematically describing a family of power constraints can be used to design and analyze various multi-domain communication systems. The tensor based approach extends water-filling from the matrix setting to tensors, encapsulating the effects of multiple domains and thereby allowing joint multi-domain pre-coding. We show that the capacity pre-log of a tensor channel increases exponentially with the number of domains, indicating the potential of tensor based multi-domain communication systems to provide the large information transmission rates envisaged for 5G and beyond systems. We also show the application of the tensor framework in characterizing the capacity and rate regions of multi-user MIMO channels. Both Multiple Access and Interference Channels are considered, where the tensor based approach leads to a coordinated user transmission scheme. Such a scheme ensures higher achievable sum rates as compared to independent user transmissions.


I. INTRODUCTION
A tensor is a multi-domain array that can be seen as an Nth order generalization of a vector or a matrix, where a vector is a tensor of order one and a matrix is a tensor of order two [1]. Tensors were introduced in the nineteenth century with applications in Physics, where such mathematical structures can be seen as a mapping from one linear space to another whose coordinates transform multilinearly under a change of bases [2]. Later, tensors found applications in psychometrics in the sixties with the work of Tucker [3], as an extension of two-way data analysis to higher-order datasets, and in chemometrics in the eighties [4], [5]. In the last few decades, tensors, as an extension of matrices, have found extensive applications in various engineering disciplines including computer vision [6], [7], data mining [8], [9], machine learning [10], neuroscience [11], signal processing [12]-[14] and multi-linear system theory [15], [16]. Tensors provide a unified and intuitive framework to represent processes with dependencies on more than two indices. Through a tensor based approach, we can develop models which capture interactions between various parameters, enhancing the understanding of their mutual effects. A detailed summary of tensor algebra results can be found in many publications such as [1], [9], [10], [17]-[20].
In the field of communication systems, tensors have so far been used primarily from a signal processing point of view. Most modern communication systems use multi-domain modulation and coding schemes for effectively exploiting resources in transmission and reception. A common example in wireless communications is that of using the space domain, i.e. multiple input multiple output (MIMO) systems based on multiple antennas, to achieve a significant increase in channel capacity [21]. Systems combining MIMO with multi-carrier techniques such as Orthogonal Frequency Division Multiplexing (OFDM) [22], [23], Generalized Frequency Division Multiplexing (GFDM) [24], [25], Filter Bank Multi-carrier (FBMC) [26], etc., have been widely researched over the past few years. Techniques have been developed to improve link reliability through space-time, space-frequency, and space-time-frequency coding methods [27] that exploit diversity in the spatial, temporal and frequency domains. Hence the received and transmitted signals have an inherent multi-domain structure which can be represented by tensors. Such a multi-domain approach using tensors has been considered in [28], [29], where the focus has been on developing blind detection techniques at the receiver. A tensor based space-time-frequency coding structure is proposed in [30] for a MIMO OFDM-Code Division Multiple Access (CDMA) system where signals are represented using order-5 tensors. Applications of tensors are also being considered for modelling various 5G and 6G communication technologies such as Intelligent Reflecting Surfaces [31], massive MIMO [32], [33], millimeter wave [34], [35], and MIMO relay systems [36]. Recently, a tensor based framework for a general multi-domain communication system using the tensor contracted product has been developed in [37]. In addition, [37] also considers tensor based joint-domain equalization at the receiver to combat inter-domain interferences.
Hence the tensor approach of handling a multi-domain communication system can lead to the design of new and improved transmission and reception schemes. This paper considers information theoretic aspects of the discrete version of the tensor-modelled multi-domain communication systems of [37].
The domains in a communication system can represent space, time, frequency, users, propagation delay, spreading sequence, etc. This necessitates a generic unified mathematical framework with which we can model any multi-domain communication system, and hence tensors naturally come into play. Having such a framework would not only provide a mathematical basis for existing schemes, but would also act as a stepping stone for developing new and improved systems spanning multiple domains such as G.hn networks [38], 5G and beyond [39], as well as 6G [40]. The main contribution of this paper is the generic tensor based system model where an order M + N channel tensor connects an order N input tensor with an order M output tensor using the Einstein product. We find the Shannon capacity of such a deterministic tensor channel with various input constraints, revealing its exponential increase with the number of domains. A tensor framework endowed with the Einstein product has been used in [15], [16] to extend basic linear system concepts to the multi-linear realm. Our work introduces the multi-linear tensor channel and extends basic Information Theoretic concepts to such models.
The initial motivation to use tensors for our purpose stems from their unique suitability to retain the distinction between multiple domains in the system model, thereby allowing a convenient representation of a variety of power constraints across domains. Even though tensor entities occur naturally in multi-domain communication systems, it should be noted that in principle a tensor can be represented using a matrix or a vector. For instance, the slices of a third order tensor can be stacked together to form a bigger matrix. Such matrix representations are sometimes used in order to leverage well established linear algebra concepts for analysis. However, representing a naturally occurring higher order tensor using a lower order array such as a matrix or vector collapses the distinct indices which identify the domains. Thus, in order to restore the identifiability of domains, it becomes imperative to use tensors [41]. In addition, the tensor based approach leads to joint domain pre-coding and power allocation operations which perform much better than matrix approaches using per domain processing. Also, the proposed framework can be used to find the capacity of multi-user MIMO channels. The tensor model makes it possible to keep users and antennas as distinct domains in a multi-user system. Hence the problem of finding the channel capacity under per user power constraints for any number of users can be approached using the tensor framework.
The capacity of the MIMO matrix channel was analyzed in [21] by converting the MIMO channel into parallel non-interfering scalar channels through a singular value decomposition of the channel matrix. Since the late 1990s and early 2000s, a significant research effort has been invested in extending the work in [21] under various assumptions. A detailed summary can be found in [42]. The capacity behaviour of OFDM based MIMO systems has been explored in [43] by concatenating the transmit vectors over different antennas and different sub-carriers into a single vector and using a block-diagonal channel matrix. There has been no attempt so far to quantify the capacity for a channel spanning more than two domains in terms of the number of domains itself, or to find the multi-domain power allocation required to achieve capacity, which calls for a tensor based approach. One impediment to such an application of tensors in information theory is the inability to completely diagonalize a tensor and thereby convert a higher order channel into scalar channels. Complete diagonalization of an arbitrary tensor with tensor decomposition techniques such as Parallel Factorization (PARAFAC) or Tucker decomposition is not achievable in general [1]. However, recent work in [18], [44] proposes a tensor singular value decomposition (SVD) and eigenvalue decomposition (EVD) approach, as special cases of Tucker decomposition, for solving multi-linear systems of equations. A generalization of such a tensor SVD and EVD is presented in this paper and used to compute the channel capacity.
The capacity of a MIMO GFDM channel modelled as a sixth order tensor, under a sum power constraint, is considered in [45]. Further, [46] presents the notion of tensor partial response signalling as a means to generate multi-domain signals with desired spectral and cross-spectral properties. The trade-off between domains of a communication system, as revealed through the tensor approach, is also studied in [46]. In [47], the capacity of tensor channels under a family of power constraints, which includes per antenna or sum power constraints as specific cases, is considered. Going beyond [47], in this paper we consider the channel characterization as a tensor and its Shannon capacity under different power constraints in detail, with new results and applications. In particular, we consider the application of the tensor framework to MIMO multiple access and interference channels. We use our proposed solution to characterize rate regions and quantify achievable sum rates in a multi-user system under user cooperation with per user power constraints. A comparison of these results with other methods in the literature which assume independent user transmissions is also included. We also present examples of several multi-domain systems, such as multi-user MIMO OFDM, where the channel can be characterized as a higher order tensor. For any such multi-domain system, we present an algorithmic approach for approximating the optimal input covariance under a variety of input power constraints, along with a discussion of its computational complexity. For MIMO GFDM systems, we use our proposed solution to prescribe a precoding scheme under per-antenna power constraints and analyze its BER performance. We also present results on the multiplexing gain achieved by the tensor channel. Through several numerical examples, we present the capacity behaviour of tensor channels for various channel sizes, with different domain power constraints, and channels with correlated entries.
This paper is organized as follows: A brief review of tensor algebra is presented in Section II. Section III introduces the discrete system model for a multi-domain communication system using tensors. The notion of a channel as a higher order tensor is proposed with an example. Section IV presents the required information theoretic notions for tensors. Section V considers the capacity of tensor channels under a family of power constraints, an algorithm for finding the input covariance along with its complexity, and the multiplexing gain provided by tensor channels. Section VI presents numerical and simulation results showing an application of our work to MIMO GFDM systems. Section VII considers multi-user MIMO Multiple Access Channels (MAC) and Interference Channels (IC) using the tensor framework. The paper is concluded in Section VIII.

II. ELEMENTS OF TENSOR ALGEBRA
In this section we present essential tensor algebra results needed for this paper. A detailed treatment of tensor algebra can be found in [1], [18], [37] and references within.

A. NOTATIONS
Throughout this paper, deterministic vectors are represented using lower-case underlined fonts, e.g. $\underline{x}$, matrices using upper-case fonts, e.g. $X$, and tensors using upper-case calligraphic fonts, e.g. $\mathcal{X}$. Their corresponding random quantities are denoted by bold fonts, e.g. $\mathbf{x}$, $\mathbf{X}$ and $\boldsymbol{\mathcal{X}}$ for random vectors, matrices and tensors respectively. The individual entries of a tensor are denoted by indices in subscript, e.g. the $(i,j,k)$th element of a third order tensor $\mathcal{X}$ is denoted by $\mathcal{X}_{i,j,k}$. A colon in subscript indicates all elements of a mode, e.g. $\mathcal{X}_{:,j,k}$ represents all the elements of the first mode corresponding to the $j$th second and the $k$th third mode. The $n$th element in a sequence is denoted by a superscript in parentheses, e.g. $\mathcal{A}^{(n)}$ denotes the $n$th tensor in a sequence of tensors. Expectation is denoted by $E[\cdot]$, entropy by $H(\cdot)$ and mutual information by $I(\cdot\,;\cdot)$. The set of complex numbers is denoted by $\mathbb{C}$. An all-zero tensor is represented by $0_T$.

B. BASIC TENSOR OPERATIONS AND DEFINITIONS
Tensors are multi-way arrays with components indexed by $N$ indices, also known as modes. The number of modes, $N$, is called the order of the tensor. The set of all tensors of size $I_1 \times \ldots \times I_K$ over $\mathbb{C}$ forms a linear space, denoted as $\mathbb{T}_{I_1,\ldots,I_K}(\mathbb{C})$. The Einstein product $\mathcal{A} *_N \mathcal{B}$ is a tensor contracted product where contraction is over the last $N$ consecutive modes of $\mathcal{A}$ and the first $N$ consecutive modes of $\mathcal{B}$. For $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_P \times K_1 \times \ldots \times K_N}$ and $\mathcal{B} \in \mathbb{C}^{K_1 \times \ldots \times K_N \times J_1 \times \ldots \times J_M}$,
$$(\mathcal{A} *_N \mathcal{B})_{i_1,\ldots,i_P,j_1,\ldots,j_M} = \sum_{k_1,\ldots,k_N} \mathcal{A}_{i_1,\ldots,i_P,k_1,\ldots,k_N}\, \mathcal{B}_{k_1,\ldots,k_N,j_1,\ldots,j_M}.$$
The outer and inner products can be seen as special cases of the Einstein product. For tensors $\mathcal{X}, \mathcal{Y} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$ and $\mathcal{Z} \in \mathbb{C}^{J_1 \times \ldots \times J_M}$, we have:
$$\langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i_1,\ldots,i_N} \mathcal{X}^*_{i_1,\ldots,i_N}\, \mathcal{Y}_{i_1,\ldots,i_N}, \qquad (\mathcal{X} \circ \mathcal{Z})_{i_1,\ldots,i_N,j_1,\ldots,j_M} = \mathcal{X}_{i_1,\ldots,i_N}\, \mathcal{Z}_{j_1,\ldots,j_M},$$
where $\langle \mathcal{X}, \mathcal{Y} \rangle$ is a scalar and $(\mathcal{X} \circ \mathcal{Z}) \in \mathbb{C}^{I_1 \times \ldots \times I_N \times J_1 \times \ldots \times J_M}$.
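The Einstein product above can be sketched numerically as follows. This is an illustrative NumPy snippet (the helper name `einstein_product` is ours, not from the paper); it also checks that the contraction agrees with ordinary matrix multiplication of the corresponding unfoldings.

```python
import numpy as np

# Einstein product A *_N B: contract the last N modes of A with the first N modes of B.
def einstein_product(A, B, N):
    return np.tensordot(A, B, axes=N)

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3, 4, 5))   # order-4 tensor, modes split as (2,3)|(4,5)
B = rng.standard_normal((4, 5, 6))      # order-3 tensor, first two modes match A's last two

C = einstein_product(A, B, N=2)         # resulting tensor of shape (2, 3, 6)

# The same contraction via unfoldings: merge A's first two modes into rows and
# its last two into columns, multiply by the unfolding of B, then fold back.
C_mat = (A.reshape(6, 20) @ B.reshape(20, 6)).reshape(2, 3, 6)
assert np.allclose(C, C_mat)
```

The agreement of the two computations is exactly the unfolding property exploited throughout this section.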
Definition 4. mode-n product [50]: The mode-$n$ product of a tensor $\mathcal{X} \in \mathbb{C}^{I_1 \times I_2 \times \ldots \times I_N}$ with a matrix $U \in \mathbb{C}^{J \times I_n}$ is denoted by $\mathcal{X} \times_n U$ and is defined as
$$(\mathcal{X} \times_n U)_{i_1,\ldots,i_{n-1},j,i_{n+1},\ldots,i_N} = \sum_{i_n=1}^{I_n} \mathcal{X}_{i_1,\ldots,i_N}\, U_{j,i_n}.$$
Definition 5. Square tensors [49]: A tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times J_1 \times \ldots \times J_M}$ is called a square tensor if $N = M$ and $I_k = J_k$ for $k = 1, \ldots, N$.
Let $f_{I_1,\ldots,I_N|J_1,\ldots,J_M}$ denote the invertible mapping that unfolds a tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times J_1 \times \ldots \times J_M}$ into a matrix $A \in \mathbb{C}^{I_1 \cdots I_N \times J_1 \cdots J_M}$, merging its first $N$ modes into the row index and its last $M$ modes into the column index [18]. For square tensors $\mathcal{A}, \mathcal{B}$ of size $I \times J \times I \times J$, it was shown in [18] that $f_{I,J|I,J}(\mathcal{A} *_2 \mathcal{B}) = f_{I,J|I,J}(\mathcal{A}) \cdot f_{I,J|I,J}(\mathcal{B})$, where $\cdot$ refers to the usual matrix multiplication. This property can be easily generalized to square or non-square tensors of any order in the form of the following Lemma [51]:

Lemma 1. For tensors $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_P \times K_1 \times \ldots \times K_N}$ and $\mathcal{B} \in \mathbb{C}^{K_1 \times \ldots \times K_N \times J_1 \times \ldots \times J_M}$,
$$f_{I_1,\ldots,I_P|J_1,\ldots,J_M}(\mathcal{A} *_N \mathcal{B}) = f_{I_1,\ldots,I_P|K_1,\ldots,K_N}(\mathcal{A}) \cdot f_{K_1,\ldots,K_N|J_1,\ldots,J_M}(\mathcal{B}).$$

Definition 6. Pseudo-diagonal tensors: Any tensor $\mathcal{D} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times J_1 \times \ldots \times J_M}$ of order $N + M$ is called pseudo-diagonal if its transformation to a matrix, $D = f_{I_1,\ldots,I_N|J_1,\ldots,J_M}(\mathcal{D}) \in \mathbb{C}^{I_1 \cdots I_N \times J_1 \cdots J_M}$, results in a diagonal matrix, such that $D_{i,j}$ is non-zero only when $i = j$. In the case of a rectangular $D$, such a matrix is often called a main diagonal or principal diagonal matrix [52].
A square tensor $\mathcal{D} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$ is pseudo-diagonal if all its entries $\mathcal{D}_{i_1,\ldots,i_N,j_1,\ldots,j_N}$ are zero except when $i_1 = j_1, i_2 = j_2, \ldots, i_N = j_N$. In [18], [53], such a tensor is termed diagonal. However, we call it pseudo-diagonal for our purposes, so as to make a clear distinction from the diagonal tensor definition more widely found in the literature: a diagonal tensor is one whose entries $\mathcal{D}_{i_1,\ldots,i_N}$ are zero except when $i_1 = i_2 = \cdots = i_N$ [1]. An illustration of the difference between diagonal and pseudo-diagonal tensors is presented in [37]. Note that pseudo-diagonality for a non-square tensor, as in Definition 6, is defined with respect to a partition after $N$ modes. For instance, if we refer to a third order tensor as pseudo-diagonal, it is important to specify whether it is pseudo-diagonal with respect to a partition after the first mode or the second mode. Hence, to avoid overloading notation, whenever we write a tensor explicitly as order $N + M$ or order $2N$ and call it pseudo-diagonal, the pseudo-diagonality is with respect to a partition after $N$ modes.
Using the Einstein product and Lemma 1, we can extend many linear algebra concepts to a multi-linear setting [18], [49]. For instance, a square pseudo-diagonal tensor of order $2N$, denoted $\mathcal{I}_N \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$, is called an identity tensor if $(\mathcal{I}_N)_{i_1,\ldots,i_N,i_1,\ldots,i_N} = 1$ for all $i_1, \ldots, i_N$. The tensor $\mathcal{X}^{-1} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$ is the inverse of a square tensor $\mathcal{X}$ of the same size if $\mathcal{X} *_N \mathcal{X}^{-1} = \mathcal{X}^{-1} *_N \mathcal{X} = \mathcal{I}_N$.

Tensor SVD: Any tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times J_1 \times \ldots \times J_M}$ admits a decomposition
$$\mathcal{A} = \mathcal{U} *_N \mathcal{D} *_M \mathcal{V}^H,$$
where $\mathcal{U} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$ and $\mathcal{V} \in \mathbb{C}^{J_1 \times \ldots \times J_M \times J_1 \times \ldots \times J_M}$ are unitary tensors and $\mathcal{D} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times J_1 \times \ldots \times J_M}$ is a pseudo-diagonal tensor whose non-zero values are the singular values of $\mathcal{A}$. The existence of such a tensor SVD can be shown by using Lemma 1 and the SVD of the matrix $f_{I_1,\ldots,I_N|J_1,\ldots,J_M}(\mathcal{A})$ [18]. Note that the Tucker decomposition of a tensor can also be seen as a higher order SVD [50] and has found many applications, particularly in extracting low rank structures from higher dimensional data [54]. The relation between the Tucker decomposition and the tensor SVD presented here is explained in [18].
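The construction of the tensor SVD from the matrix SVD of the unfolding can be sketched as follows. This is an illustrative NumPy snippet with assumed small dimensions ($N = 2$ input-side modes, $M = 1$ output-side mode), not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(1)
I1, I2, J1 = 2, 3, 4
A = rng.standard_normal((I1, I2, J1))   # order-3 tensor, partition (I1, I2) | (J1)

# Matrix SVD of the unfolding f_{I1,I2|J1}(A):
Am = A.reshape(I1 * I2, J1)
Um, s, Vh = np.linalg.svd(Am)

# Fold the matrix factors back into tensors: U is a unitary order-4 tensor and
# D is pseudo-diagonal (its unfolding has the singular values on the diagonal).
U = Um.reshape(I1, I2, I1, I2)
Dm = np.zeros((I1 * I2, J1))
Dm[np.arange(len(s)), np.arange(len(s))] = s
D = Dm.reshape(I1, I2, J1)
V_H = Vh                                # M = 1, so V^H is simply a matrix

# Verify A = U *_2 D *_1 V^H using tensor contractions:
A_rec = np.tensordot(np.tensordot(U, D, axes=2), V_H, axes=1)
assert np.allclose(A, A_rec)
```

By Lemma 1, the contraction of the folded factors unfolds back to the matrix product $U_m D_m V^H$, which is why the reconstruction matches.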
Tensor EVD: Let $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$, $\mathcal{X} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$ and $\lambda \in \mathbb{C}$, where $\mathcal{X}$ and $\lambda$ satisfy $\mathcal{A} *_N \mathcal{X} = \lambda \mathcal{X}$; then we call $\mathcal{X}$ and $\lambda$ an eigentensor and an eigenvalue of $\mathcal{A}$ respectively. The tensor Eigenvalue Decomposition (EVD) of a Hermitian tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$ is given as
$$\mathcal{A} = \mathcal{U} *_N \mathcal{D} *_N \mathcal{U}^H,$$
where $\mathcal{U} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$ is a unitary tensor and $\mathcal{D} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$ is a square pseudo-diagonal tensor, with the non-zero values of $\mathcal{D}$ being the eigenvalues of $\mathcal{A}$ and $\mathcal{U}$ containing the eigentensors of $\mathcal{A}$. As a generalization of matrix eigenvalues, several other definitions exist in the literature for tensor eigenvalues [55], [56]. However, most of these definitions apply to super-symmetric tensors, defined as the class of tensors that are invariant under any permutation of their indices [56]; such an approach has applications in Physics and Mechanics [56]. There is no single generalization to the tensor case that preserves all the properties of matrix eigenvalues [57]. For our purposes, we have presented a particular generalization from [44] which can be used for any square tensor, irrespective of symmetry in its elements.
The eigenvalues of a Hermitian tensor are real. Subsequently, a Hermitian tensor $\mathcal{X} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$ is defined as positive semi-definite, denoted $\mathcal{X} \succeq 0$, if all its eigenvalues are non-negative, and as positive definite, denoted $\mathcal{X} \succ 0$, if all its eigenvalues are strictly positive. The determinant of $\mathcal{X}$, denoted $\det(\mathcal{X})$, is defined as the product of its eigenvalues. The trace of a tensor $\mathcal{A}$, denoted $\mathrm{tr}(\mathcal{A})$, is defined as the sum of its pseudo-diagonal entries. Also, the following properties can be established using Lemma 1:
(a) Associativity: $(\mathcal{A} *_N \mathcal{B}) *_M \mathcal{C} = \mathcal{A} *_N (\mathcal{B} *_M \mathcal{C})$. (8)
(b) Commutativity: the Einstein product is not commutative in general. However, for the specific case where the contraction is over all the $N$ modes of one of the tensors, say for tensors $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_P \times J_1 \times \ldots \times J_N}$ and $\mathcal{B} \in \mathbb{C}^{J_1 \times \ldots \times J_N}$, we have $\mathcal{A} *_N \mathcal{B} = \mathcal{B} *_N \mathcal{A}^T$. (9)
(c) For invertible tensors $\mathcal{A}, \mathcal{B} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$, we have $(\mathcal{A} *_N \mathcal{B})^{-1} = \mathcal{B}^{-1} *_N \mathcal{A}^{-1}$.
(d) For $\mathcal{A}, \mathcal{B} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$ of the same size and order $N$,
$$\det(\mathcal{I}_N + \mathcal{A} \circ \mathcal{B}^*) = 1 + \langle \mathcal{B}, \mathcal{A} \rangle. \tag{13}$$
To prove (13), we can use Lemma 1 and Sylvester's matrix determinant identity [58].
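Sylvester's determinant identity lifts to tensors through unfoldings: the outer product of two order-$N$ tensors unfolds to a rank-one matrix, so the determinant of the identity-plus-outer-product tensor reduces to the classical vector identity $\det(I + a b^H) = 1 + b^H a$. A quick numerical check (ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3, 2)) + 1j * rng.standard_normal((2, 3, 2))
B = rng.standard_normal((2, 3, 2)) + 1j * rng.standard_normal((2, 3, 2))

a, b = A.ravel(), B.ravel()   # vectorized order-3 tensors

# The outer product A o B* unfolds to the rank-one matrix a b^H, hence
# det(I_N + A o B*) = det(I + a b^H) = 1 + b^H a = 1 + <B, A>.
lhs = np.linalg.det(np.eye(a.size) + np.outer(a, b.conj()))
rhs = 1 + np.vdot(b, a)       # np.vdot conjugates its first argument
assert np.allclose(lhs, rhs)
```

Here the tensor determinant is evaluated as the determinant of the unfolding, consistent with its definition as the product of the eigenvalues.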
Tensor Gradient: Consider a real-valued scalar function $g(X)$ of a complex matrix $X$. The corresponding complex gradient matrix is defined as $\nabla_X g \triangleq \partial g / \partial X^*$, where $[\nabla_X g]_{i,j} = \partial g / \partial X^*_{i,j}$ [59]. We similarly define the complex gradient of a scalar function $f(\mathcal{X})$ of a tensor $\mathcal{X} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$ as $\nabla_{\mathcal{X}} f \triangleq \partial f / \partial \mathcal{X}^*$, where the gradient is a tensor of the same size as $\mathcal{X}$ whose individual components are the derivatives with respect to the components of $\mathcal{X}^*$, i.e.
$$(\nabla_{\mathcal{X}} f)_{i_1,\ldots,i_N} = \frac{\partial f}{\partial \mathcal{X}^*_{i_1,\ldots,i_N}}.$$

III. TENSOR BASED SYSTEM MODEL

A. THE TENSOR CHANNEL
We define the input and output in a multi-domain communication system as tensor symbols of order $N$ and $M$ respectively. Let $\boldsymbol{\mathcal{X}} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$ represent the input (transmitted) tensor symbol, with $I_1, I_2, \ldots, I_N$ the dimensions of its $N$ domains, where each component $\boldsymbol{\mathcal{X}}_{i_1,\ldots,i_N}$ is a discrete complex symbol. Similarly, we represent the output (received) tensor symbol by $\boldsymbol{\mathcal{Y}} \in \mathbb{C}^{J_1 \times \ldots \times J_M}$, with $J_1, J_2, \ldots, J_M$ the dimensions of its $M$ domains. We define a multi-linear channel between the transmit and the receive side as a tensor of order $M + N$, represented by $\mathcal{H} \in \mathbb{C}^{J_1 \times \ldots \times J_M \times I_1 \times \ldots \times I_N}$. With additive noise, the system model can be specified using the Einstein product of the channel tensor with the input tensor, where we contract along all the modes of the input tensor:
$$\boldsymbol{\mathcal{Y}} = \mathcal{H} *_N \boldsymbol{\mathcal{X}} + \boldsymbol{\mathcal{N}}, \tag{17}$$
with $\boldsymbol{\mathcal{N}}$ representing the received noise tensor, of the same size as $\boldsymbol{\mathcal{Y}}$. It is straightforward to see that the narrowband discrete time MIMO matrix channel model is a special case of this tensor model, where the input and output are order-1 tensors (vectors), the channel is an order-2 tensor (matrix), and the Einstein product reduces to regular matrix multiplication, $\underline{y} = H\underline{x} + \underline{n}$.
In the case of a matrix representation of a discrete MIMO channel, which characterizes only the space domain, each component $H_{i,j}$ of the channel matrix represents the complex gain (i.e. accounting for both phase change and amplitude gain) of the path between a transmit and a receive antenna. In (17), each component of the tensor channel is a complex gain that couples the elements of the order $N$ input and order $M$ output tensors. The proposed system model is generic, and the number of modes, along with the physical interpretation of the individual modes, is system-specific. Such modes can represent space, time, frequency, propagation delay, users, spreading sequence, etc., depending on the system. For a MIMO communication system with $N_R$ receive and $N_T$ transmit antennas, the continuous time input-output relation in a linear time varying channel is given as [62]:
$$\underline{y}(t) = \int H(t,\tau)\, \underline{x}(t-\tau)\, d\tau + \underline{n}(t), \tag{18}$$
where $\underline{x}(t)$ is the $N_T \times 1$ continuous time input vector, and $\underline{n}(t)$ and $\underline{y}(t)$ are the $N_R \times 1$ noise and received vectors respectively. The $N_R \times N_T$ matrix $H(t,\tau)$ has components $H_{n_r,n_t}(t,\tau)$ that represent the channel impulse response between transmit antenna $n_t$ and receive antenna $n_r$ at time instant $t$ and delay $\tau$. If the channel is assumed time-invariant with a maximum delay $\tau_{max}$, the discretization of (18) at a sampling frequency $f_s$ gives the input/output relation at an instant $k$ as [62]:
$$\underline{y}[k] = \sum_{d=1}^{D} H[d]\, \underline{x}[k-(d-1)] + \underline{n}[k], \tag{19}$$
where $\underline{x}[k]$ is the $N_T \times 1$ input vector, and $\underline{n}[k]$ and $\underline{y}[k]$ are the $N_R \times 1$ noise and received vectors respectively. The $N_R \times N_T$ matrix $H[d]$ has components $H_{n_r,n_t}[d]$ which represent the length $D$ channel impulse response between transmit antenna $n_t$ and receive antenna $n_r$ at delay $d$, where $D = f_s \tau_{max}$. Delay can be considered as another domain in the system model. So, for a time-invariant channel, at any time instant $k$ the relation in (19) can be expressed using the tensor model as [63]:
$$\underline{\mathbf{y}}[k] = \mathcal{H} *_2 \mathbf{X}[k] + \underline{\mathbf{n}}[k], \tag{20}$$
where $\mathcal{H} \in \mathbb{C}^{N_R \times N_T \times D}$ is the channel tensor with $\mathcal{H}_{n_r,n_t,d} = H_{n_r,n_t}[d]$, and $\mathbf{X}[k] \in \mathbb{C}^{N_T \times D}$ is the input tensor collecting the delayed inputs, $\mathbf{X}[k]_{n_t,d} = x_{n_t}[k-(d-1)]$. As $D$ increases, the tensor framework allows capturing the time dispersion in the system model by increasing the dimension of the delay domain in the channel and the input.
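The equivalence between the FIR sum of (19) and the delay-domain tensor contraction of (20) can be checked numerically. An illustrative NumPy snippet with assumed small dimensions (not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(3)
N_R, N_T, D, k = 2, 3, 4, 10
H_taps = rng.standard_normal((D, N_R, N_T))   # channel taps H[d], d = 1..D
x = rng.standard_normal((N_T, k + D))         # input history up to instant k

# (19): y[k] = sum_d H[d] x[k - (d-1)]   (0-based tap index below)
y_conv = sum(H_taps[d] @ x[:, k - d] for d in range(D))

# (20): channel as an N_R x N_T x D tensor, input as an N_T x D matrix of
# delayed samples; contracting over the N_T and D modes yields y[k].
H_tensor = np.moveaxis(H_taps, 0, -1)                       # shape (N_R, N_T, D)
X_k = np.stack([x[:, k - d] for d in range(D)], axis=1)     # shape (N_T, D)
y_tensor = np.tensordot(H_tensor, X_k, axes=2)
assert np.allclose(y_conv, y_tensor)
```

The contraction over both the antenna and delay modes is what folds the convolution into a single Einstein product.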
Now, assuming the channel is time variant, the discretized input/output relation can be given as [23]:
$$\underline{y}[k] = \sum_{d=1}^{D} H[k,d]\, \underline{x}[k-(d-1)] + \underline{n}[k], \tag{21}$$
where each element $H_{n_r,n_t}[k,d]$ represents the complex channel gain between the $n_t$th transmit and $n_r$th receive antenna for delay $d$ at time instant $k$. In [23], assuming a cyclic prefix addition to each input block, the above equation is expressed in matrix notation as $\underline{y}' = H' \underline{x}' + \underline{n}'$ over a time block of $N$ symbol durations, by appending the vectors $\underline{y}[k]$, $\underline{x}[k-(d-1)]$ and $\underline{n}[k]$ for different $k$ into vectors $\underline{y}'$, $\underline{x}'$ and $\underline{n}'$ of size $N \cdot N_R$, $N \cdot N_T$ and $N \cdot N_R$ respectively, and the channel matrices $H[k,d]$ into a larger matrix $H'$ of size $N \cdot N_R \times N \cdot N_T$. However, appending the vectors makes the two distinct domains indistinguishable in the system formulation. Hence a more obvious and intuitive way to represent such a system is using tensors, where the channel can be expressed as an $N_R \times N \times N_T \times D'$ tensor with $D' = N + D - 1$. We do not assume any cyclic prefix addition here. Since the output at index $k$, i.e. $\underline{y}[k]$, depends on the inputs $\underline{x}[k], \ldots, \underline{x}[k-(D-1)]$, corresponding to the output being indexed by $N$ time indices $0, 1, \ldots, (N-1)$, the input will be indexed by $N + D - 1$ time indices $-(D-1), \ldots, (N-1)$. So, in the system model (20), we add a domain of length $N$ time slots to the channel tensor and the output tensor to account for time variation, and increase the delay domain length to $N + D - 1$ in the channel tensor and the input tensor. Thereby, our system model becomes:
$$\mathbf{Y} = \mathcal{H} *_2 \mathbf{X} + \mathbf{N}, \tag{22}$$
where $\mathbf{Y}, \mathbf{N} \in \mathbb{C}^{N_R \times N}$ collect the individual vectors $\underline{y}[k]$ and $\underline{n}[k]$ as columns, $\mathbf{X} \in \mathbb{C}^{N_T \times D'}$ collects the input vectors, and $\mathcal{H} \in \mathbb{C}^{N_R \times N \times N_T \times D'}$ is the channel tensor. We can see how the tensor system model in (20) simply evolved into (22) to account for time variation of the channel as well.

B. TENSOR MODEL APPLIED TO MU-MIMO OFDM
The model presented in (16) can be used for a wide variety of systems. For instance, the system model used in [44] for image restoration is a specific case of the system model of (16). To stress the relevance of the proposed tensor model, particularly in multi-domain communication systems, we now present an example of a multi-user MIMO OFDM system which can be represented using the tensor framework.
OFDM is one of the most popular multi-carrier schemes and has been used extensively with MIMO in 4G standards and Wi-Fi [64]. A conventional model for a MIMO OFDM system in the frequency domain is given by [23]:
$$\underline{\check{y}}[p] = \sum_{q=0}^{N_{sc}-1} \check{H}[p,q]\, \underline{\check{x}}[q] + \underline{\check{n}}[p], \tag{23}$$
or element-wise,
$$\check{y}_{n_r}[p] = \sum_{n_t=1}^{N_T} \sum_{q=0}^{N_{sc}-1} \check{H}_{n_r,n_t}[p,q]\, \check{x}_{n_t}[q] + \check{n}_{n_r}[p], \tag{24}$$
where $1 \le n_r \le N_R$, $1 \le n_t \le N_T$ and $0 \le p, q \le N_{sc} - 1$.
Using tensors, we can represent the frequency domain input/output relation in MIMO OFDM of (23) as:
$$\check{\mathbf{Y}} = \check{\mathcal{H}} *_2 \check{\mathbf{X}} + \check{\mathbf{N}}, \tag{25}$$
where the vectors $\underline{\check{y}}[p]$ and $\underline{\check{n}}[p]$ from (23), for $p = 0, \ldots, N_{sc} - 1$, form the columns of matrices $\check{\mathbf{Y}}$ and $\check{\mathbf{N}}$ of size $N_R \times N_{sc}$, and the vectors $\underline{\check{x}}[p]$ form the columns of matrix $\check{\mathbf{X}} \in \mathbb{C}^{N_T \times N_{sc}}$. The input and output are connected by an order 4 tensor channel $\check{\mathcal{H}} \in \mathbb{C}^{N_R \times N_{sc} \times N_T \times N_{sc}}$, where each element $\check{H}_{n_r,n_t}[p,q]$ from (24) is now an element of the channel tensor, $\check{\mathcal{H}}_{n_r,p,n_t,q}$. We can expand the MIMO OFDM system model to include users as an additional domain, which leads to a sixth order tensor channel.
In the case of multi-user MIMO OFDM, the frequency domain channel matrix is often represented as an $N_R \times N_T$ matrix corresponding to a specific user and a specific sub-carrier [65], [66]. To account for inter-carrier interference (ICI) as well, the channel matrix can be represented as $\check{H}[u,p,q] \in \mathbb{C}^{N_R \times N_T}$, corresponding to the $u$th user for transmit sub-carrier $q$ and receive sub-carrier $p$. Consider a multi-user MIMO OFDM downlink system where a base station is catering to $U$ users having $N_R$ receive antennas each. The system model is a generalization of (23) and is given by
$$\underline{\check{y}}[u,p] = \sum_{q=0}^{N_{sc}-1} \check{H}[u,p,q]\, \underline{\check{x}}[q] + \underline{\check{n}}[u,p], \tag{26}$$
for $p, q = 0, \ldots, N_{sc} - 1$ and $u = 1, \ldots, U$. The entities $\underline{\check{y}}[u,p] \in \mathbb{C}^{N_R \times 1}$ and $\underline{\check{n}}[u,p] \in \mathbb{C}^{N_R \times 1}$ represent the received signal vector and noise vector on sub-carrier $p$ for the $u$th user, respectively. Also, $\underline{\check{x}}[q] \in \mathbb{C}^{N_T \times 1}$ denotes the transmit vector from the base station at sub-carrier $q$, which is given by [67]:
$$\underline{\check{x}}[q] = \sum_{u'=1}^{U} M[u',q]\, \underline{\check{d}}[u',q], \tag{27}$$
where $M[u',q] \in \mathbb{C}^{N_T \times N_T}$ denotes the pre-coding matrix used to transmit the data vector $\underline{\check{d}}[u',q] \in \mathbb{C}^{N_T \times 1}$ to user $u'$ on sub-carrier $q$. Hence, defining
$$G[u,u',p,q] = \check{H}[u,p,q]\, M[u',q] \in \mathbb{C}^{N_R \times N_T}, \tag{28}$$
the system model of (26) becomes
$$\underline{\check{y}}[u,p] = \sum_{u'=1}^{U} \sum_{q=0}^{N_{sc}-1} G[u,u',p,q]\, \underline{\check{d}}[u',q] + \underline{\check{n}}[u,p]. \tag{29}$$
In this case, the output and noise tensors can be rearranged into order three tensors $\check{\boldsymbol{\mathcal{Y}}}, \check{\boldsymbol{\mathcal{N}}} \in \mathbb{C}^{U \times N_R \times N_{sc}}$, where the components $\check{y}_{n_r}[u,p]$ and $\check{n}_{n_r}[u,p]$ are mapped to the elements $\check{\boldsymbol{\mathcal{Y}}}_{u,n_r,p}$ and $\check{\boldsymbol{\mathcal{N}}}_{u,n_r,p}$ respectively. Similarly, the input can be rearranged as an order three tensor $\check{\boldsymbol{\mathcal{D}}} \in \mathbb{C}^{U \times N_T \times N_{sc}}$, where $\check{d}_{n_t}[u',q]$ is mapped to $\check{\boldsymbol{\mathcal{D}}}_{u',n_t,q}$. Subsequently, the channel can be represented as an order 6 tensor $\check{\mathcal{G}} \in \mathbb{C}^{U \times N_R \times N_{sc} \times U \times N_T \times N_{sc}}$, where each element $G_{n_r,n_t}[u,u',p,q]$ of the matrices from (28) is mapped to an element $\check{\mathcal{G}}_{u,n_r,p,u',n_t,q}$ of the sixth order tensor channel. The system model then becomes:
$$\check{\boldsymbol{\mathcal{Y}}} = \check{\mathcal{G}} *_3 \check{\boldsymbol{\mathcal{D}}} + \check{\boldsymbol{\mathcal{N}}}. \tag{30}$$
The tensor model represented by (30) can be considered as an evolution of the common matrix MIMO model, covering the space domain only, to a tensor MIMO model that, in addition to space, encapsulates also the frequency and user domains.
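The folding of the per-user, per-subcarrier sums of (29) into the order-6 tensor contraction of (30) can be verified numerically. An illustrative NumPy check with assumed small dimensions (not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(4)
U, N_R, N_T, N_sc = 2, 2, 3, 4

H = rng.standard_normal((U, N_sc, N_sc, N_R, N_T))   # H[u, p, q]: N_R x N_T blocks
M = rng.standard_normal((U, N_sc, N_T, N_T))         # precoders M[u', q]
d = rng.standard_normal((U, N_T, N_sc))              # data tensor D[u', nt, q]

# (29): y[u, p] = sum_{u', q} H[u, p, q] M[u', q] d[u', q]
Y_sum = np.zeros((U, N_R, N_sc))
for u in range(U):
    for p in range(N_sc):
        for up in range(U):
            for q in range(N_sc):
                Y_sum[u, :, p] += H[u, p, q] @ M[up, q] @ d[up, :, q]

# (30): fold G[u, u', p, q] = H[u, p, q] M[u', q] into an order-6 tensor and
# contract over its last three modes (u', nt, q) with the data tensor.
G = np.einsum('upqrt,vqts->urpvsq', H, M)            # shape (U, N_R, N_sc, U, N_T, N_sc)
Y_tensor = np.tensordot(G, d, axes=3)
assert np.allclose(Y_sum, Y_tensor)
```

The loop version treats each user/sub-carrier pair separately, while the tensor version performs one joint contraction across the user, antenna, and sub-carrier domains.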
Similarly, other systems such as MIMO DSL [68], MIMO CDMA [69], MIMO FBMC [26] and MIMO GFDM [24] can be represented using the tensor based system model. Representing (29) using (30), or (23) using (25), gives us a unified framework which encapsulates the multitude of signalling domains into well structured tensor entities. The matrix representations of (29) and (23) often lead to per user or per sub-carrier processing of the signals, where the inter-domain interferences are treated in combination with the additive noise term. On the contrary, the tensor models of (30) and (25) can be employed to design transceivers jointly across domains such as users, sub-carriers, and antennas, using tools from tensor algebra. We present several numerical examples in sections VI-B and VII illustrating the advantages of joint domain processing for multi-domain communication systems. Note that an alternate representation of (29) can also be achieved via a large matrix channel based model, where the input and output signals are concatenated across the user, sub-carrier and antenna domains to form vectors. However, such a representation obscures the indices $u', n_t, q$ and $u, n_r, p$ into single indices with no well defined physical meaning. Such a lower order representation of higher order entities makes it difficult to incorporate domain specific constraints, such as those explained in section V-A, which are well handled using tensors. As a prelude to the capacity of tensor channels, we first present some basic information theoretic notions for tensors in the next section.

IV. INFORMATION THEORETIC NOTIONS FOR TENSORS
A tensor is said to be random if its components are random variables. The expectation of a random tensor $\boldsymbol{\mathcal{X}} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$, denoted by $\mathcal{M} = E[\boldsymbol{\mathcal{X}}] \in \mathbb{C}^{I_1 \times \ldots \times I_N}$, is a tensor with each component consisting of the expected value of the corresponding component of $\boldsymbol{\mathcal{X}}$.

Covariance and Pseudo-covariance
The covariance of a random tensor $\boldsymbol{\mathcal{X}} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$ can be represented using a tensor $\mathcal{Q} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$ defined as
$$\mathcal{Q} = E[(\boldsymbol{\mathcal{X}} - \mathcal{M}) \circ (\boldsymbol{\mathcal{X}} - \mathcal{M})^*].$$
It can be shown that a covariance tensor is Hermitian, i.e. $\mathcal{Q} = \mathcal{Q}^H$. The pseudo-covariance of $\boldsymbol{\mathcal{X}}$ is also an order $2N$ tensor, defined as $E[(\boldsymbol{\mathcal{X}} - \mathcal{M}) \circ (\boldsymbol{\mathcal{X}} - \mathcal{M})]$. A random tensor is called proper if its pseudo-covariance is an all zero tensor [14].

Tensor Gaussian Distribution
The probability density function (pdf) of a random tensor $\boldsymbol{\mathcal{X}} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$ is a scalar function of all its elements. Hence the pdf of a tensor having proper complex Gaussian entries can be specified by vectorizing the tensor and using the pdf of a proper complex Gaussian vector. However, using the properties of the Einstein product, the pdf can also be expressed without vectorizing, as explained in [14]. The pdf of a proper complex Gaussian distributed tensor $\boldsymbol{\mathcal{X}} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$ is given as:
$$p(\mathcal{X}) = \frac{1}{\pi^{I_1 \cdots I_N} \det(\mathcal{Q})} \exp\!\left(-\left\langle \mathcal{X} - \mathcal{M},\, \mathcal{Q}^{-1} *_N (\mathcal{X} - \mathcal{M}) \right\rangle\right), \tag{31}$$
where $\mathcal{M} = E[\boldsymbol{\mathcal{X}}]$ is the order-$N$ mean tensor and $\mathcal{Q} = E[(\boldsymbol{\mathcal{X}} - \mathcal{M}) \circ (\boldsymbol{\mathcal{X}} - \mathcal{M})^*]$ is the order-$2N$ covariance tensor.

A. DIFFERENTIAL ENTROPY OF A CIRCULAR COMPLEX GAUSSIAN DISTRIBUTED TENSOR
A complex random tensor $\boldsymbol{\mathcal{X}}$ is defined as circular if it is rotationally invariant, i.e. $\boldsymbol{\mathcal{X}}$ and $\boldsymbol{\mathcal{Y}} = e^{j\alpha}\boldsymbol{\mathcal{X}}$ have the same probability distribution for any given real $\alpha$. A complex Gaussian random vector is circular if and only if it is zero mean and proper [70]. This statement extends to the tensor case as well:

Lemma 2. A complex Gaussian random tensor is circular if and only if it is zero mean and proper.
The proof of this lemma directly follows from the definitions of proper and circular tensors. The distribution of a circular symmetric complex Gaussian tensor $\boldsymbol{\mathcal{X}} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$ with covariance $\mathcal{Q}$ is given by (31) with $\mathcal{M} = 0_T$. Subsequently, the differential entropy of such a tensor is given by:
$$H(\boldsymbol{\mathcal{X}}) = \log \det(\pi e\, \mathcal{Q}). \tag{32}$$
The following lemma regarding the differential entropy of a circular Gaussian random tensor is proven in [45]:

Lemma 3. Let $\boldsymbol{\mathcal{X}} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$ be a circularly symmetric complex Gaussian random tensor with covariance tensor $\mathcal{Q}$. Let $\boldsymbol{\mathcal{Y}} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$ be another zero-mean random tensor with the same covariance tensor. Then $H(\boldsymbol{\mathcal{X}}) \ge H(\boldsymbol{\mathcal{Y}})$, i.e. for a given covariance tensor, a circular symmetric Gaussian distribution is the entropy maximizer.
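The differential entropy of a circular Gaussian tensor, $H(\boldsymbol{\mathcal{X}}) = \log \det(\pi e\, \mathcal{Q})$, can be evaluated through the unfolding of the order-$2N$ covariance, since the tensor determinant is the product of the eigenvalues of the unfolding. A hedged NumPy sketch (illustrative dimensions, not the paper's code), cross-checked against the standard complex Gaussian vector formula for the vectorized tensor:

```python
import numpy as np

rng = np.random.default_rng(5)
I1, I2 = 2, 3
n = I1 * I2

# Random Hermitian positive definite covariance, folded into an order-4 tensor
# Q in C^{I1 x I2 x I1 x I2}.
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Qm = B @ B.conj().T + np.eye(n)
Q = Qm.reshape(I1, I2, I1, I2)

# det(Q) is the product of the eigenvalues of the unfolding, so the entropy
# in nats is H(X) = log((pi*e)^{I1*I2} det(Q)) = sum_i log(pi*e*lambda_i).
eigvals = np.linalg.eigvalsh(Q.reshape(n, n))
H_X = np.sum(np.log(np.pi * np.e * eigvals))

# Cross-check against the vector formula log det(pi*e*Qm) for vec(X):
sign, logdet = np.linalg.slogdet(np.pi * np.e * Qm)
assert np.isclose(sign.real, 1.0) and np.allclose(H_X, logdet)
```

The agreement reflects that vectorization preserves the covariance eigenvalues, so the tensor and vector entropy expressions coincide.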

B. MUTUAL INFORMATION
For the system model defined in (16), the output covariance tensor is
$$\mathcal{Q}_{\mathcal{Y}} = \mathcal{H} *_N \mathcal{Q}_{\mathcal{X}} *_N \mathcal{H}^H + \mathcal{Q}_{\mathcal{N}} + \text{cross terms}. \qquad (34)$$
Assuming $\mathcal{X}$ and $\mathcal{N}$ are zero mean and independent, it can be shown that the cross terms are zero. Based on the commutativity rule of (9), we get $\mathcal{H}^* *_N \mathcal{X}^* = \mathcal{X}^* *_N (\mathcal{H}^*)^T = \mathcal{X}^* *_N \mathcal{H}^H$. Using the associativity rule (8), we get:
$$\mathcal{Q}_{\mathcal{Y}} = \mathcal{H} *_N \mathcal{Q}_{\mathcal{X}} *_N \mathcal{H}^H + \mathcal{Q}_{\mathcal{N}}, \qquad (35)$$
where $\mathcal{Q}_{\mathcal{X}}$ and $\mathcal{Q}_{\mathcal{N}}$ are the input and noise covariance tensors respectively. The following two lemmas can be proven using the definition of circularly symmetric complex Gaussian tensors: Lemma 4. If $\mathcal{X} \in \mathbb{C}^{I_1\times I_2\times\ldots\times I_N}$ is a circularly symmetric complex Gaussian tensor, then so is $\mathcal{Y} = \mathcal{H} *_N \mathcal{X}$ for any deterministic tensor $\mathcal{H} \in \mathbb{C}^{K_1\times K_2\times\ldots\times K_P\times I_1\times I_2\times\ldots\times I_N}$.
Lemma 5. If $\mathcal{X}$ and $\mathcal{Y}$ are independent circularly symmetric complex Gaussian tensors of the same order and size, then $\mathcal{Z} = \mathcal{X} + \mathcal{Y}$ is also circularly symmetric complex Gaussian.
Assuming the noise to be a zero-mean Gaussian distributed tensor with covariance $\mathcal{Q}_{\mathcal{N}}$ that is independent of the input tensor $\mathcal{X}$, we can write the mutual information between the input and output tensors as:
$$I(\mathcal{X};\mathcal{Y}) = H(\mathcal{Y}) - H(\mathcal{Y}\,|\,\mathcal{X}) = H(\mathcal{Y}) - H(\mathcal{N}).$$
Based on Lemma 3 and the received covariance derived in (35), we can write:
$$I(\mathcal{X};\mathcal{Y}) \le \log_2\det\!\big(\pi e\,(\mathcal{H} *_N \mathcal{Q}_{\mathcal{X}} *_N \mathcal{H}^H + \mathcal{Q}_{\mathcal{N}})\big) - H(\mathcal{N}), \qquad (38)$$
where equality is achieved only if $\mathcal{Y}$ is Gaussian.

V. CAPACITY OF A FIXED TENSOR CHANNEL
Finding the Shannon capacity of the tensor channel requires maximizing the mutual information between the input and output tensors over input distributions, under possible constraints. In this work we assume that the channel tensor is known and that the noise tensor is zero-mean Gaussian distributed with independent components of variance $\sigma^2$; hence the noise covariance tensor is $\mathcal{Q}_{\mathcal{N}} = \sigma^2\,\mathcal{I}_M$. For simplicity we assume $\sigma^2 = 1$. Now consider the mutual information inequality of (38), where equality is achieved only if $\mathcal{Y}$ is Gaussian. In our tensor channel model (16), since $\mathcal{N}$ is zero-mean circularly symmetric complex Gaussian, Lemmas 4 and 5 imply that $\mathcal{Y}$ will also be zero-mean circularly symmetric complex Gaussian whenever $\mathcal{X}$ is. Hence, for the purpose of maximizing mutual information, we take $\mathcal{X}$ as zero-mean circularly symmetric complex Gaussian with covariance $\mathcal{Q}_{\mathcal{X}} = \mathcal{Q}$. Thus, with the noise covariance tensor equal to the identity tensor $\mathcal{I}_M$, we get
$$I(\mathcal{X};\mathcal{Y}) = \log_2\det\!\big(\mathcal{I}_M + \mathcal{H} *_N \mathcal{Q} *_N \mathcal{H}^H\big),$$
and the capacity is given by
$$C = \max_{\mathcal{Q} \succeq 0,\; f(\mathcal{Q}) \le 0} \log_2\det\!\big(\mathcal{I}_M + \mathcal{H} *_N \mathcal{Q} *_N \mathcal{H}^H\big),$$
where the inequality constraint $f(\mathcal{Q}) \le 0$ can represent a family of power constraints.
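As a numerical illustration of the log-det mutual information above, the sketch below evaluates it for a fixed channel in unfolded (matricized) form, assuming unit-variance noise and a uniform-power input covariance; all variable names and sizes are illustrative, not from the paper.

```python
import numpy as np

# Hypothetical sketch: mutual information log2 det(I + H Q H^H) for a fixed
# channel, with tensors unfolded to matrices and unit-variance noise assumed.
rng = np.random.default_rng(2)
J, I = 4, 4                       # products of output / input domain sizes
H = rng.standard_normal((J, I))   # unfolded channel realization
Q = np.eye(I) * (10.0 / I)        # uniform-power input covariance, tr(Q) = 10

# slogdet avoids overflow for large tensors; sign is 1 since I + HQH^H > 0.
sign, logdet = np.linalg.slogdet(np.eye(J) + H @ Q @ H.conj().T)
C = logdet / np.log(2)            # mutual information in bits per tensor symbol
print(round(C, 2))
```

The same expression applies unchanged to any positive semi-definite input covariance, which is what the optimization over $f(\mathcal{Q}) \le 0$ varies.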

A. FAMILY OF POWER CONSTRAINTS
In a practical system, the transmit power constraints can span multiple domains. For instance, in a transmission scheme employing the space, time and frequency domains, instead of imposing a power constraint across the whole tensor symbol, the constraint might be on each antenna; on each antenna and time slot; or on each antenna, time slot and frequency bin. Our framework provides the flexibility to mathematically represent any such power constraint. First we introduce some simplifying notation. We denote the sequence of indices $(i_1, i_2, \ldots, i_N)$ as $i$. Let $i_c$ denote the sub-sequence of indices indicating the tensor symbol elements under a power constraint, and let $i_r$ denote the remaining indices of $i$ that are not in $i_c$. For example, consider an order-5 tensor, and let the domains under individual power constraints be 2 and 3; then $i_c = (i_2, i_3)$ and $i_r = (i_1, i_4, i_5)$. With this choice of notation, we write $\sum_{i_2=1}^{I_2}\sum_{i_3=1}^{I_3}$ simply as $\sum_{i_c}$. The notation corresponding to the cases when either $i_c$ or $i_r$ is empty is explained in Table 1.
Using this simplified notation, we now describe a family of optimization problems for finding capacity which covers different types of power constraints:
$$C = \max_{\mathcal{Q}} \log_2\det\!\big(\mathcal{I}_M + \mathcal{H} *_N \mathcal{Q} *_N \mathcal{H}^H\big) \qquad (41)$$
$$\text{subject to } \sum_{i_r} \mathcal{Q}_{i,i} \le P_{i_c} \;\;\text{for all } i_c, \qquad (42)$$
$$\mathcal{Q} \succeq 0. \qquad (43)$$
To understand how the above framework represents a large variety of constraints, let us consider a few specific cases. The case $i_c = i$, hence $i_r$ empty, represents per-element power constraints with $P_{i_c} = P_{i_1,\ldots,i_N}$, and (42) becomes:
$$\mathcal{Q}_{i,i} \le P_{i_1,\ldots,i_N} \;\;\text{for all } i.$$
Now let us assume power constraints on two domains $K$ and $L$ such that $K < L \le N$. Then $i_c = (i_K, i_L)$ and (42) becomes:
$$\sum_{i_r} \mathcal{Q}_{i,i} \le P_{i_K,i_L} \;\;\text{for all } (i_K, i_L).$$
Similarly, we can represent constraints on any number of domains. Lastly, let us assume that $i_c$ is empty; hence $i_r = i$, and the power constraint translates to the sum power constraint, i.e. $P_{i_c} = P$, and we get:
$$\sum_{i} \mathcal{Q}_{i,i} \le P. \qquad (47)$$
All these power constraints are linear, and the objective function in (41) is concave. Notice that the feasible set for this optimization problem is the set of positive semi-definite tensors satisfying the given linear power constraints; hence the feasible set is convex. Thereby (41), (42) and (43) represent a family of convex optimization problems which can be solved using the Karush-Kuhn-Tucker (KKT) conditions [71]. Furthermore, since the $P_{i_c}$ are finite and non-negative, a simple choice of covariance tensor in the feasible set is a pseudo-diagonal tensor with non-negative entries that satisfy the power constraints. So the feasible set is a non-empty convex set, and hence, by Slater's condition [71], strong duality holds and an optimal solution always exists. Next we find the optimal solution using the KKT conditions.
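The index-partition notation behind (42) can be made concrete with a small sketch: assuming the per-element powers (the pseudo-diagonal of the covariance) are stored as an order-$N$ array, each constraint sum is obtained by summing over the unconstrained axes $i_r$. The function name and array shapes below are illustrative.

```python
import numpy as np

# Hypothetical sketch of the family of constraints (42): sum the per-element
# powers over every axis NOT in i_c, yielding one value per index tuple i_c.
def constrained_powers(per_element_power, constrained_axes):
    """per_element_power: order-N array of E[|X_i|^2]; constrained_axes: i_c."""
    all_axes = range(per_element_power.ndim)
    rest = tuple(a for a in all_axes if a not in constrained_axes)  # i_r
    return per_element_power.sum(axis=rest)

# Order-3 input tensor of size 2 x 2 x 2 with uniform per-element power 1.
p = np.ones((2, 2, 2))

# i_c = (i_1, i_2): per-(domain 1, domain 2) constraints -> sums over i_3.
print(constrained_powers(p, (0, 1)))   # each of the 4 entries is 2.0
# i_c empty: the sum power constraint -> a single scalar.
print(constrained_powers(p, ()))       # 8.0
```

Feasibility of a candidate covariance then amounts to comparing each returned entry against its budget $P_{i_c}$.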

B. SOLUTION USING KKT CONDITIONS
Let $\mathcal{M} \succeq 0$ be the Lagrange multiplier tensor of size $I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N$ for the positive semi-definite constraint (43), and let $\mu_{i_c} \ge 0$ for all $i_c$ be the Lagrange multipliers corresponding to the linear constraints (42). The Lagrangian functional can then be defined as:
$$L(\mathcal{Q}, \{\mu_{i_c}\}, \mathcal{M}) = \log_2\det\!\big(\mathcal{I}_M + \mathcal{H} *_N \mathcal{Q} *_N \mathcal{H}^H\big) - \sum_{i_c} \mu_{i_c}\Big(\sum_{i_r} \mathcal{Q}_{i,i} - P_{i_c}\Big) + \mathrm{tr}\big(\mathcal{M} *_N \mathcal{Q}\big). \qquad (48)$$
We arrange the values $\{\mu_{i_c}\}$ in a pseudo-diagonal tensor $\mathcal{B}$ of the same size as the input covariance, such that its non-zero pseudo-diagonal entries are $\mathcal{B}_{i,i} = \mu_{i_c}$, with each $\mu_{i_c}$ repeated for every value of $i_r$; hence
$$\sum_{i_c} \mu_{i_c} \sum_{i_r} \mathcal{Q}_{i,i} = \mathrm{tr}\big(\mathcal{B} *_N \mathcal{Q}\big). \qquad (49)$$
Based on (49), we can re-write the Lagrangian from (48) as:
$$L = \log_2\det\!\big(\mathcal{I}_M + \mathcal{H} *_N \mathcal{Q} *_N \mathcal{H}^H\big) - \mathrm{tr}\big(\mathcal{B} *_N \mathcal{Q}\big) + \sum_{i_c} \mu_{i_c} P_{i_c} + \mathrm{tr}\big(\mathcal{M} *_N \mathcal{Q}\big). \qquad (50)$$
Based on (14), (15) and the definition of the tensor gradient presented in Section II-B, the gradient of the Lagrangian with respect to $\mathcal{Q}$ (absorbing the constant factor $1/\ln 2$ into the multipliers) can be written as:
$$\nabla_{\mathcal{Q}} L = \mathcal{H}^H *_M \big(\mathcal{I}_M + \mathcal{H} *_N \mathcal{Q} *_N \mathcal{H}^H\big)^{-1} *_M \mathcal{H} - \mathcal{B} + \mathcal{M}. \qquad (51)$$
Setting (51) to $\mathcal{0}$, we get the first KKT condition as
$$\mathcal{H}^H *_M \big(\mathcal{I}_M + \mathcal{H} *_N \mathcal{Q} *_N \mathcal{H}^H\big)^{-1} *_M \mathcal{H} + \mathcal{M} = \mathcal{B}. \qquad (52)$$
The KKT equations also include a complementary slackness condition corresponding to each constraint and its associated Lagrange multiplier [71]. For the linear constraints (42), complementary slackness leads to:
$$\mu_{i_c}\Big(\sum_{i_r} \mathcal{Q}_{i,i} - P_{i_c}\Big) = 0 \quad \text{for all } i_c. \qquad (53)$$
For the constraint (43), based on the approach taken for semi-definite programming [71], complementary slackness can be written as $\mathrm{tr}(\mathcal{M} *_N \mathcal{Q}) = 0$. Since $\mathcal{M}, \mathcal{Q} \succeq 0$, similar to the matrix case presented in [72], the complementary slackness for the positive semi-definite constraint is written as:
$$\mathcal{M} *_N \mathcal{Q} = \mathcal{0}. \qquad (54)$$
The tensor KKT conditions for the problem (41)-(43) are given by (52), (53) and (54).
Notice that all the entries of $\mathcal{B}$, i.e. the $\mu_{i_c}$, will be strictly greater than 0 at the optimum, because the inequality constraints must be met with equality at the optimum. So $\mathcal{B}$ is a positive definite tensor, i.e. $\mathcal{B} \succ 0$, and hence invertible. Also, since $\mu_{i_c} > 0$, (53) can be written as:
$$\sum_{i_r} \mathcal{Q}_{i,i} = P_{i_c} \quad \text{for all } i_c. \qquad (55)$$
Let us define a tensor $\tilde{\mathcal{H}} \in \mathbb{C}^{I_1\times\ldots\times I_N\times I_1\times\ldots\times I_N}$ and its tensor EVD as:
$$\tilde{\mathcal{H}} = \mathcal{B}^{-1/2} *_N \mathcal{H}^H *_M \mathcal{H} *_N \mathcal{B}^{-1/2} = \mathcal{V} *_N \tilde{\mathcal{D}} *_N \mathcal{V}^H. \qquad (56)$$
We can use the result of Theorem 1 from [73], extended to the tensor case, to solve (52) and (54) subject to $\mathcal{Q} \succeq 0$, $\mathcal{M} \succeq 0$ and $\mathcal{B} \succ 0$, and obtain the optimal $\mathcal{Q}$. The tensor version of Theorem 1 from [73], along with its proof, is included in Appendix A. Based on the results from Appendix A, the optimal $\mathcal{Q}$ is given by
$$\mathcal{Q} = \mathcal{B}^{-1/2} *_N \mathcal{V} *_N \big(\mathcal{I}_N - \tilde{\mathcal{D}}^{-1}\big)^{+} *_N \mathcal{V}^H *_N \mathcal{B}^{-1/2}, \qquad (57)$$
where $\tilde{\mathcal{D}}$ and $\mathcal{V}$ are obtained through the tensor EVD of $\tilde{\mathcal{H}}$ in (56), and $(\mathcal{Z})^{+}$ denotes a pseudo-diagonal tensor whose pseudo-diagonal entries are all non-negative, i.e. $(\mathcal{Z}_{i_1,\ldots,i_N})^{+} = \max\{0, \mathcal{Z}_{i_1,\ldots,i_N}\}$. Hence we can calculate the capacity as:
$$C = \log_2\det\!\big(\mathcal{I}_M + \mathcal{H} *_N \mathcal{Q} *_N \mathcal{H}^H\big). \qquad (58)$$
Since the determinant is the product of eigenvalues, we get:
$$C = \sum_{i_1,\ldots,i_N} \big(\log_2 \tilde{d}_{i_1,\ldots,i_N}\big)^{+}, \qquad (59)$$
where the $\tilde{d}_{i_1,\ldots,i_N}$ are the non-zero eigenvalues of $\tilde{\mathcal{H}}$. Note that the optimum covariance tensor from (57) depends not only on the eigenvalues, but also on the eigentensors of $\tilde{\mathcal{H}}$. Hence ensuring an optimum input covariance leads not only to an optimum power allocation scheme, but also to a joint multi-domain pre-coding at the transmitter which is required for achieving capacity. We will now simplify the expression for the covariance under a high SNR assumption and develop an algorithm to approximate the optimum covariance and capacity using (55), (57) and (59). Computing (60) below requires the inverse of $\mathcal{H}^H *_M \mathcal{H}$; in case the inverse does not exist, a minimum norm least squares solution can be adopted, which finds an approximate tensor $\mathcal{T}$ such that $\|(\mathcal{H}^H *_M \mathcal{H}) *_N \mathcal{T} - \mathcal{I}_N\|^2$ is minimized [18]. Assuming high SNR, we can ignore $(\cdot)^{+}$ and write (57) as:
$$\mathcal{Q} = \mathcal{B}^{-1} - \big(\mathcal{H}^H *_M \mathcal{H}\big)^{-1}, \qquad (60)$$
and (58) as:
$$C = \log_2\det\!\big(\mathcal{H}^H *_M \mathcal{H} *_N \mathcal{B}^{-1}\big). \qquad (61)$$
Thus, under the high SNR approximation, we can find the elements of $\mathcal{B}$ by substituting $\mathcal{Q}$ from (60) into (55) to get:
$$\sum_{i_r} \big[\mathcal{B}^{-1} - (\mathcal{H}^H *_M \mathcal{H})^{-1}\big]_{i,i} = P_{i_c} \quad \text{for all } i_c. \qquad (63)$$
Let $N_{i_r}$ denote the number of values that $i_r$ can take.
Since $\mathcal{B}$ contains $\mu_{i_c}$ on its pseudo-diagonal, with each $\mu_{i_c}$ appearing exactly $N_{i_r}$ times, from (63) we can write:
$$\mu_{i_c} = \frac{N_{i_r}}{P_{i_c} + \sum_{i_r}\big[(\mathcal{H}^H *_M \mathcal{H})^{-1}\big]_{i,i}}, \qquad (65)$$
which gives us all the entries of $\mathcal{B}$. The proposed solution in (65), (61) and (60) assumes high SNR, as we have ignored the $(\cdot)^{+}$ operation. To extend the solution to any SNR, we now present a scaling approach that approximates the input covariance tensor at any SNR setting, and verify that the resulting covariance satisfies the constraints. If the covariance $\mathcal{Q}$ obtained from (60) has negative eigenvalues, we force them to zero. If the tensor EVD of $\mathcal{Q}$ is given as $\mathcal{U} *_N \mathcal{D} *_N \mathcal{U}^H$, the pseudo-diagonal elements of $\mathcal{Q}$ can be written as
$$\mathcal{Q}_{i,i} = \sum_{i'} \mathcal{D}_{i',i'}\,|\mathcal{U}_{i,i'}|^2.$$
Thus the brute-force approach of setting the negative values in $\mathcal{D}$ to zero can result in larger values on the pseudo-diagonal of $\mathcal{Q}$. This in turn can make the solution infeasible, i.e. the power constraint $P_{i_c}$ from (42) might not be met, and we may get a different allotted power, say
$$P'_{i_c} = \sum_{i_r} \mathcal{Q}_{i,i}, \qquad (66)$$
where $P'_{i_c} \ge P_{i_c}$. So we scale the resulting $\mathcal{Q}$ using another pseudo-diagonal tensor $\mathcal{S}$ such that the power constraints remain satisfied, i.e. $\mathcal{Q}_{\text{scaled}} = \mathcal{S} *_N \mathcal{Q} *_N \mathcal{S}^H$, where the pseudo-diagonal entries of the scaling tensor $\mathcal{S}$ are given as:
$$\mathcal{S}_{i,i} = \sqrt{P_{i_c}/P'_{i_c}}, \qquad (67)$$
such that the pseudo-diagonal entries of $\mathcal{Q}_{\text{scaled}}$ become
$$\big[\mathcal{Q}_{\text{scaled}}\big]_{i,i} = \frac{P_{i_c}}{P'_{i_c}}\,\mathcal{Q}_{i,i}.$$
Hence, based on (66), we have $\sum_{i_r}[\mathcal{Q}_{\text{scaled}}]_{i,i} = P_{i_c}$. Thus the choice of scaling operation ensures that $\mathcal{Q}_{\text{scaled}}$ satisfies the power constraints. A similar technique has been used for matrix-field water filling in [74], where the diagonal entries of the covariance matrix are scaled to ensure that the resulting matrix is positive semi-definite. Such a scaling approach simplifies the computation of the input covariance but also makes it sub-optimal, and hence leads to an approximation of the capacity. However, this approximation improves as the SNR grows and is exact at sufficiently high SNR. The procedure is systematically presented in Algorithm 1.
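The clipping-and-scaling step of (66)-(67) can be sketched in unfolded (matricized) form as follows; the function and variable names are illustrative, and this is an assumption-laden sketch rather than the paper's exact implementation.

```python
import numpy as np

# Hypothetical sketch of the scaling approach: clip negative eigenvalues of the
# high-SNR covariance, then rescale the diagonal to restore the power budgets.
def clip_and_rescale(Q, p_target):
    # Force negative eigenvalues of Q to zero (the "( . )+" operation).
    d, U = np.linalg.eigh(Q)
    Qp = (U * np.maximum(d, 0.0)) @ U.conj().T
    # Clipping can inflate the diagonal (the P'_{ic} >= P_{ic} effect), so
    # rescale with S = diag(sqrt(p_target / p_got)); Q_scaled = S Q S^H then
    # meets the per-element budgets exactly while staying positive semi-definite.
    p_got = np.real(np.diag(Qp))
    s = np.sqrt(p_target / np.maximum(p_got, 1e-12))
    return (s[:, None] * Qp) * s[None, :]

Q = np.array([[2.0, 1.5], [1.5, 0.5]])        # indefinite: one negative eigenvalue
Qs = clip_and_rescale(Q, np.array([1.0, 1.0]))
print(np.round(np.diag(Qs), 6))               # diagonal meets the power budget
```

The congruence transform $S Q S^H$ preserves positive semi-definiteness, which is why the rescaled tensor remains a valid covariance.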
Finding the capacity is a convex optimization problem, and hence can be solved using software tools such as CVX [75]. The resulting capacity can be compared with the capacity obtained via the approximation in Algorithm 1 to assess the validity of the proposed approach. We present such a comparison in the numerical examples (Figure 9) to illustrate the accuracy of the proposed approach.
Algorithm 1: Finding the Input Covariance Tensor
1: Input: channel tensor $\mathcal{H}$
2: Input: power budgets $P_{i_c}$ for all $i_c$
3: Compute $(\mathcal{H}^H *_M \mathcal{H})^{-1}$, calculate all $\mu_{i_c}$ using (65) and form $\mathcal{B}$
4: Find $\mathcal{Q}$ using (60)
5: Find the tensor EVD $\mathcal{Q} = \mathcal{U} *_N \mathcal{D} *_N \mathcal{U}^H$
6-11: For each pseudo-diagonal element of $\mathcal{D}$, set $\mathcal{D}_{i,i} \leftarrow \max\{0, \mathcal{D}_{i,i}\}$
12: If any eigenvalue was clipped, then
13: Update $\mathcal{Q} \leftarrow \mathcal{U} *_N \mathcal{D} *_N \mathcal{U}^H$
14: Calculate $P'_{i_c}$ using (66)
15: Find the pseudo-diagonal tensor $\mathcal{S}$ using (67)
16: Update $\mathcal{Q} \leftarrow \mathcal{S} *_N \mathcal{Q} *_N \mathcal{S}^H$

Algorithm 1 essentially approximates the optimal input covariance tensor using (60) together with a scaling process which ensures a feasible solution. The algorithm requires a fixed amount of computational resources and can be deployed off-line. Thus it is of interest to analyze the computational resources required to execute it. Modern computational strategies often employ cloud services from providers such as AWS and Azure. Such cloud services provide both the infrastructure and the platform/software as services on demand [76], [77], where the cost depends on the amount of required computational resources and the execution time. Given that Algorithm 1 requires extensive tensor operations which scale with the tensor size, using cloud services to implement it is an appealing option. Such cloud services also provide several parallel and distributed computing infrastructures for faster and more efficient computation [78]. Thus, depending on the platform and the number of multi-core processors employed, the execution time of the algorithm can differ significantly. However, independent of the available computing infrastructure, a suitable measure of computational complexity is the number of mathematical operations required by the algorithm, as used in [79], [80]. Hence in this section we analyze the computational complexity of Algorithm 1 in terms of the number of flops required by each step. A flop is defined as a single floating point operation such as an addition, multiplication, subtraction, division, comparison (>, <, ==), or a scalar square root.
The complexity of Algorithm 1 depends primarily on the tensor inversion and the tensor EVD for a tensor of size $I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N$. Such tensor operations can be performed using various algorithms employing the Einstein product, as discussed in [18], [44], [49], [81]. Consider the definition (3) of the Einstein product across $N$ modes between a tensor $\mathcal{A}$ of size $I_1 \times \ldots \times I_P \times K_1 \times \ldots \times K_N$ and a tensor $\mathcal{B}$ of size $K_1 \times \ldots \times K_N \times J_1 \times \ldots \times J_M$. Based on (3), computing a single element of $\mathcal{A} *_N \mathcal{B}$ requires $K_1 K_2 \cdots K_N$ multiplications and $K_1 K_2 \cdots K_N - 1$ additions, and there are in total $I_1 I_2 \cdots I_P \cdot J_1 J_2 \cdots J_M$ elements in the tensor $\mathcal{A} *_N \mathcal{B}$. For ease of notation, let $I = I_1 \cdots I_P$, $J = J_1 \cdots J_M$ and $K = K_1 \cdots K_N$, so that each element is computed using $K$ multiplications and $K - 1$ additions. Subsequently, finding all the elements of $\mathcal{A} *_N \mathcal{B}$ requires $IJK$ multiplications and $(K-1)IJ$ additions. Hence a total of $IJK + (K-1)IJ = (2K-1)IJ$ flops are required for computing all the elements of $\mathcal{A} *_N \mathcal{B}$. Note that for any operation whose required number of flops is polynomial in its size $n$, i.e. the operation requires $a_0 n^p + a_1 n^{p-1} + \cdots + a_p$ flops (for fixed constants $a_0, a_1, \ldots, a_p$), we represent the complexity using only the highest power of $n$, as $O(n^p)$. Hence the complexity of the Einstein product, which requires $(2K-1)IJ$ flops, is $O(IKJ)$, which is the same as $O((I_1 \cdots I_P)(K_1 \cdots K_N)(J_1 \cdots J_M))$. Thus, for the specific case where both $\mathcal{A}$ and $\mathcal{B}$ are order-$2N$ tensors of size $I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N$ each, the complexity of $\mathcal{A} *_N \mathcal{B}$ is $O((I_1 \cdots I_N)^3)$.
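The reduction of the Einstein product to a single matrix product, which underlies the $(2K-1)IJ$ flop count above, can be sketched as follows; the helper name is illustrative.

```python
import numpy as np

# Hypothetical sketch: the Einstein product A *_N B contracts the trailing N
# modes of A with the leading N modes of B. Unfolding both into matrices turns
# it into one matrix product of shapes (I x K) @ (K x J).
def einstein_product(A, B, N):
    I_dims = A.shape[:-N]          # leading modes of A (product: I)
    K_dims = A.shape[-N:]          # contracted modes (product: K)
    J_dims = B.shape[N:]           # trailing modes of B (product: J)
    K = int(np.prod(K_dims))
    Am = A.reshape(-1, K)          # I x K unfolding
    Bm = B.reshape(K, -1)          # K x J unfolding
    return (Am @ Bm).reshape(I_dims + J_dims)

A = np.arange(8.0).reshape(2, 2, 2)    # order-3, contract its last 2 modes
B = np.arange(8.0).reshape(2, 2, 2)    # order-3, contract its first 2 modes
C = einstein_product(A, B, 2)          # result has shape (2, 2)
print(C.shape)
```

Because both unfoldings follow the same C-order layout, this agrees with `np.tensordot(A, B, axes=2)`, and the matrix-product view makes the cubic cost for order-$2N$ square tensors immediate.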
Now assume the channel $\mathcal{H}$ is of size $J_1 \times \ldots \times J_M \times I_1 \times \ldots \times I_N$, and let $N_{i_c}$ denote the number of values that $i_c$ can take. In Algorithm 1, the first two steps are input initialization.
Step 3 requires computing the Einstein product over the common $M$ modes of $\mathcal{H}^H$ and $\mathcal{H}$, which based on the discussion in the previous paragraph has a complexity of $O((I_1 \cdots I_N)^2 (J_1 \cdots J_M))$. Further, within step 3 it is required to find the inverse of $\mathcal{H}^H *_M \mathcal{H}$, which is an order-$2N$ tensor. The inverse of an order-$2N$ tensor can be calculated using the higher order bi-conjugate gradient (HOBG) method described in [18], or Newton's method (NM), which solves an iterative equation involving Einstein products between tensors of size $I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N$; thus each iteration of NM has a computational cost of $O((I_1 \cdots I_N)^3)$ [81]. It is important to note that the complexity of such iterative methods depends on the number of iterations, which in turn depends on the accuracy level set to declare convergence. It was shown in [81] that NM requires fewer iterations than HOBG to reach the same accuracy. NM requires $O(\log(I_1 \cdots I_N))$ iterations to converge to a fixed error bound [14]. Furthermore, [14] shows a method to reduce the per-iteration complexity of NM, and thus perform tensor inversion in $O(\log^2(I_1 \cdots I_N))$ parallel time units. Also, non-iterative methods such as Gauss elimination based on triangular decomposition of tensors [49] can be used for tensor inversion, with a computational complexity of $O((I_1 \cdots I_N)^3)$. Hence the worst-case complexity of tensor inversion without any use of parallel processors is $O((I_1 \cdots I_N)^3)$. Finally, step 3 calculates each $\mu_{i_c}$ using (65), which requires $N_{i_r} + 1$ additions and 1 division, and this needs to be done for all the $N_{i_c}$ values that $i_c$ can take. Thus this part requires $(N_{i_r} + 2) \cdot N_{i_c}$ flops and its complexity is $O(N_{i_r} \cdot N_{i_c})$. Since $N_{i_r} \cdot N_{i_c} = I_1 \cdots I_N$, the complexity of finding the $\mu_{i_c}$ can be written as $O(I_1 \cdots I_N)$.
Step 4, which finds $\mathcal{Q}$ using (60), subtracts an order-$2N$ tensor from a pseudo-diagonal tensor. Since the number of pseudo-diagonal elements is $I_1 \cdots I_N$, step 4 performs $I_1 \cdots I_N$ subtractions and thus has a complexity of $O(I_1 \cdots I_N)$. Step 5 finds the tensor EVD of the order-$2N$ tensor $\mathcal{Q}$; the complexity of finding the EVD of a tensor of size $I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N$ is $O((I_1 \cdots I_N)^3)$ [16]. Algorithms which generalize the matrix eigen-decomposition approaches to the tensor EVD using the Einstein product properties can be found in [44], [49], [16, Algorithm C.2]. Steps 6 to 11 essentially perform the operation $\max(0, \mathcal{D}_{i_1,\ldots,i_N})$ on each of the $I_1 \cdots I_N$ pseudo-diagonal elements of the tensor $\mathcal{D}$, and hence have a complexity of $O(I_1 \cdots I_N)$.
Step 12 is a single scalar comparison, and step 13 updates $\mathcal{Q}$ using the Einstein product between tensors of order $2N$ of size $I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N$ each, and thus has a complexity of $O((I_1 \cdots I_N)^3)$. Step 14 finds $P'_{i_c}$ for all the $N_{i_c}$ values of $i_c$, performing $N_{i_r}$ additions for each of the $N_{i_c}$ values that $i_c$ can take. Hence step 14 requires $N_{i_r} \cdot N_{i_c}$ flops and thus has a complexity of $O(N_{i_r} \cdot N_{i_c})$, which is the same as $O(I_1 \cdots I_N)$.
Step 15 calculates the scaling factors for all $i_c$, and thus performs $N_{i_c}$ divisions and square roots; hence it has a complexity of $O(N_{i_c})$. Finally, step 16 updates $\mathcal{Q}$ using the Einstein product between tensors of order $2N$, and thus has a complexity of $O((I_1 \cdots I_N)^3)$. Table 2 summarizes the step-by-step computational complexity of Algorithm 1: the first column indicates the step number from Algorithm 1, the second column describes the operation, and the third column states the complexity. We can observe that all the entries in the complexity column of Table 2 have a complexity of order 3 (cubic) or less in $I_1 \cdots I_N$, except the first operation, which has a complexity of $O((I_1 \cdots I_N)^2 \cdot J_1 \cdots J_M)$. Hence, summing all the entries of the third column of Table 2, the overall complexity of Algorithm 1 is $O((I_1 \cdots I_N)^2 \cdot J_1 \cdots J_M) + O((I_1 \cdots I_N)^3)$. Further, in the case when $J_1 \cdots J_M \le I_1 \cdots I_N$, the complexity of Algorithm 1 can be written as $O((I_1 \cdots I_N)^3)$.
Note that the steps in Algorithm 1 with complexity $O((I_1 \cdots I_N)^3)$ primarily rely on the Einstein product operation. However, since this algorithm can be executed on multi-core computer systems, the time complexity of performing all the operations in the Einstein product can be significantly reduced through parallel processing. Several flops in the Einstein product can be executed simultaneously on a parallel computing platform. To see this, assume tensors $\mathcal{X}$ and $\mathcal{Y}$, each of size $I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N$, and let $\mathcal{Z} = \mathcal{X} *_N \mathcal{Y}$. Then all the elements of $\mathcal{Z}$ can be computed using multiple processors as shown in Figure 1.
The white rectangular nodes in Figure 1 correspond to the individual multiplication operations. All the white nodes need to be added to compute a single element (a gray rectangular node) of the tensor $\mathcal{Z}$. The addition of the white nodes to generate a gray node can be done using a binary tree approach, as shown in Figure 1, where all the addition operations at a given level of the tree are performed simultaneously. The figure illustrates this process for a single gray node; furthermore, all the gray nodes can be computed simultaneously using a similar binary tree approach if enough processors are available.
For a specific gray node, the number of nodes in the binary tree at each level, starting from the root (the gray node), is $2^0, 2^1, 2^2, \ldots, 2^h$, where $h$ is the depth of the tree and $2^h$ is the number of leaf nodes. We can have a similar binary tree for each of the gray nodes. Since in Figure 1 the number of leaf nodes (white nodes) is $I_1 \cdots I_N$, the depth of the tree corresponding to each gray node is $\lceil \log_2(I_1 \cdots I_N) \rceil$; we use the ceiling operator since $I_1 \cdots I_N$ may not always be a power of 2. Hence the height of the tree is $O(\log(I_1 \cdots I_N))$. Since all the gray nodes can be computed simultaneously, all the individual elements of $\mathcal{X} *_N \mathcal{Y}$ can be calculated in $O(\log(I_1 \cdots I_N))$ parallel time units. Such a parallel processing method can significantly reduce the time complexity of calculating the Einstein product, and subsequently of other tensor operations which rely on it, such as tensor inversion and the EVD. The other steps in Algorithm 1 with complexity $O(I_1 \cdots I_N)$ or less (step 15) can also be performed faster using parallel processors. For example, in step 3, all the $\mu_{i_c}$ of (65) for different $i_c$ can be calculated simultaneously on parallel processors. Similarly, all the $I_1 \cdots I_N$ operations in steps 4 and 6-11 can be performed simultaneously. In steps 14 and 15, the $P'_{i_c}$ from (66) and the scaling factors from (67) for all $i_c$ can also be computed simultaneously. Hence Algorithm 1 can be suitably adapted to run on parallel multi-core computer systems, depending on the number of processors available. Several platforms such as MATLAB provide support for parallel implementation of such algorithms. A more detailed study of the parallelization of the proposed algorithm for faster execution is left for future investigation.
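The binary-tree reduction described above can be sketched in a few lines: at every level, disjoint pairs are added, so on a parallel machine each level costs one time unit and the total depth is $\lceil \log_2 n \rceil$. The function name is illustrative.

```python
import math

# Hypothetical sketch of the binary-tree summation of the white nodes into one
# gray node: all pairwise additions at a level are independent, so the level
# count equals the number of parallel time units needed.
def tree_sum(values):
    vals, levels = list(values), 0
    while len(vals) > 1:
        # Pairwise additions at this level (all parallelizable); an odd
        # leftover element is carried to the next level unchanged.
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)] + \
               (vals[-1:] if len(vals) % 2 else [])
        levels += 1
    return vals[0], levels

total, depth = tree_sum(range(8))      # 8 leaves -> depth 3
print(total, depth)                    # 28 3
assert depth == math.ceil(math.log2(8))
```

Running one such tree per gray node, with all trees in parallel, gives the $O(\log(I_1 \cdots I_N))$ parallel-time Einstein product described above.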
In general, developing fast and efficient algorithms for several tensor functions depending on the Einstein product and its properties is an active area of research in the numerical tensor algebra community [82]- [87].

D. COMPARING DIFFERENT CONSTRAINTS
Let the set of all positive semi-definite tensors of size $I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N$ be represented by $\mathbb{Q}$. Let the feasible sets of the optimization problem (41)-(43) for two different settings $i_{c_1}$ and $i_{c_2}$ be $\mathbb{Q}_1$ and $\mathbb{Q}_2$ respectively. Assume $i_{c_1}$ is a subsequence of $i_{c_2}$; for instance, let $i_{c_2} = (i_1, i_2, i_3)$ and $i_{c_1} = (i_1, i_2)$. Then $\mathbb{Q}_1$ is the set of all positive semi-definite tensors $\mathcal{Q}$ which satisfy
$$\sum_{i_3, \ldots, i_N} \mathcal{Q}_{i,i} \le P_{i_1,i_2} \;\;\text{for all } (i_1, i_2), \qquad (70)$$
and $\mathbb{Q}_2$ is the set of all positive semi-definite tensors $\mathcal{Q}$ which satisfy
$$\sum_{i_4, \ldots, i_N} \mathcal{Q}_{i,i} \le P_{i_1,i_2,i_3} \;\;\text{for all } (i_1, i_2, i_3). \qquad (71)$$
In (70) the first two domains are under power constraints, whereas in (71) the power constraints span the third domain as well, with $\sum_{i_3} P_{i_1,i_2,i_3} = P_{i_1,i_2}$. Summing (71) over $i_3$ gives (70). Hence every $\mathcal{Q}$ that satisfies (71) also satisfies (70), showing that the set of $\mathcal{Q}$ satisfying (71) is a subset of the set of $\mathcal{Q}$ satisfying (70), i.e. $\mathbb{Q}_2 \subseteq \mathbb{Q}_1$.
Let the optimal values of the objective function of (41)-(43) over the sets $\mathbb{Q}_1$ and $\mathbb{Q}_2$ be $C_1$ and $C_2$ respectively. Since $\mathbb{Q}_2 \subseteq \mathbb{Q}_1$, every feasible point of the second problem is also feasible for the first, so the maximum over $\mathbb{Q}_2$ cannot exceed the maximum over $\mathbb{Q}_1$. Hence $C_2 \le C_1$, where equality is possible if an optimal $\mathcal{Q}$ of the first problem lies in $\mathbb{Q}_2$. This holds for any configuration of $i_{c_1}$ and $i_{c_2}$ as long as $i_{c_1}$ is a subsequence of $i_{c_2}$. Essentially, as more domains are put under constraints, the feasible set of the optimization problem shrinks, possibly lowering the achievable capacity.
For instance, consider a $2 \times 2$ input where the two domains are antennas and time slots. Let the capacity achieved under a total power constraint $P$ be $C_1$, and let the capacity achieved under per-antenna power constraints of $P_1$ for antenna 1 and $P_2$ for antenna 2, with $P_1 + P_2 = P$, be $C_2$. Since the set of feasible solutions under per-antenna power constraints is a subset of the set of feasible solutions under the total power constraint, we have $C_2 \le C_1$. Similarly, let the capacity achieved under per-element power constraints $P_{11}, P_{12}, P_{21}, P_{22}$, where $P_{ij}$ represents the power budget on the $i$th antenna and $j$th time slot, with $P_{11} + P_{12} = P_1$ and $P_{21} + P_{22} = P_2$, be $C_3$; then $C_3 \le C_2$. This has also been shown in [88] for a MIMO channel, where the capacity under per-antenna power constraints is always smaller than the capacity under a sum power constraint.

E. CAPACITY UNDER SUM POWER CONSTRAINTS
Under the sum power constraint of (47), $i_c$ is empty, and hence there is a single Lagrange multiplier $\mu$ associated with the constraint. Hence the tensor $\mathcal{B}$, which contains the Lagrange multipliers on its pseudo-diagonal, is a scaled identity tensor. Substituting $\mathcal{B} = \mu\,\mathcal{I}_N$ in (56) gives $\tilde{\mathcal{H}} = \frac{1}{\mu}\,\mathcal{H}^H *_M \mathcal{H}$, and (57) becomes
$$\mathcal{Q} = \mathcal{V} *_N \Big(\frac{1}{\mu}\,\mathcal{I}_N - \mathcal{D}^{-1}\Big)^{+} *_N \mathcal{V}^H, \qquad (72)$$
where $\mathcal{H}^H *_M \mathcal{H} = \mathcal{V} *_N \mathcal{D} *_N \mathcal{V}^H$. Substituting $\tilde{d}_{i_1,\ldots,i_N} = d_{i_1,\ldots,i_N}/\mu$ in (59) gives:
$$C = \sum_{i_1,\ldots,i_N} \Big(\log_2 \frac{d_{i_1,\ldots,i_N}}{\mu}\Big)^{+}, \qquad (73)$$
where $a^{+}$ denotes $\max\{0, a\}$, the $d_{i_1,\ldots,i_N}$ are the non-zero eigenvalues of $\mathcal{H}^H *_M \mathcal{H}$, and $1/\mu$ is chosen to satisfy (55), which under the sum power constraint reduces to
$$\sum_{i_1,\ldots,i_N} \Big(\frac{1}{\mu} - \frac{1}{d_{i_1,\ldots,i_N}}\Big)^{+} = P. \qquad (74)$$
The optimum covariance derived in (72) is the tensor counterpart of the classical water-filling solution: we solve (74) to find $\mu$ and subsequently use (73) to find the capacity.
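A minimal water-filling sketch in the spirit of (73)-(74): the multiplier $\mu$ is found here by bisection on the water level $1/\mu$, operating directly on the eigenvalues of $\mathcal{H}^H *_M \mathcal{H}$. The function name and the bisection choice are illustrative assumptions.

```python
import numpy as np

# Hypothetical water-filling sketch under the sum power constraint: powers are
# (1/mu - 1/d_i)+ with the water level 1/mu chosen so the powers sum to P.
def waterfill(d, P, iters=100):
    d = np.asarray(d, dtype=float)
    lo, hi = 0.0, P + 1.0 / d.min()      # the optimal level never exceeds hi
    for _ in range(iters):
        level = 0.5 * (lo + hi)
        if np.maximum(level - 1.0 / d, 0.0).sum() > P:
            hi = level                   # too much power poured: lower the level
        else:
            lo = level
    p = np.maximum(0.5 * (lo + hi) - 1.0 / d, 0.0)
    return p, float(np.sum(np.log2(1.0 + d * p)))   # powers and capacity

p, C = waterfill([2.0, 1.0, 0.1], P=10.0)
print(np.round(p.sum(), 6))              # total power ~ 10
```

Note how the weakest eigen-channel ($d = 0.1$) receives no power at this budget, exactly the behaviour of the $(\cdot)^{+}$ operation in (72).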

F. MULTIPLEXING GAIN
We can also characterize the capacity contribution of each constrained domain separately, and the multiplexing gain under various power constraints. For a fixed tensor channel, the tensor $\mathcal{B}^{-1}$ is pseudo-diagonal with entries $\mu_{i_c}^{-1}$. For instance, assume that out of the $N$ input domains, the elements of the first domain are under individual power constraints, so that $i_c = (i_1)$ and $i_r = (i_2, \ldots, i_N)$. Then the term
$$C_{i_c} = \sum_{i_r} \log_2 \frac{d_{i_1,\ldots,i_N}}{\mu_{i_c}}$$
can be seen as the contribution of the $i_c$th element of the constrained domain to the overall capacity. Substituting $\mu_{i_c}$ from (65) into $C_{i_c}$, we can further write:
$$C_{i_c} = \sum_{i_r} \log_2\!\bigg(d_{i_1,\ldots,i_N}\,\frac{P_{i_c} + \sum_{i_r}\big[(\mathcal{H}^H *_M \mathcal{H})^{-1}\big]_{i,i}}{N_{i_r}}\bigg). \qquad (80)$$
We can write the multiplexing gain provided by the $i_c$th constrained domain as:
$$\chi_{i_c} = \lim_{P_{i_c}\to\infty} \frac{C_{i_c}}{\log_2 P_{i_c}}. \qquad (81)$$
Since for large $P_{i_c}$ each summand of (80) grows as $\log_2 P_{i_c}$, using (81) and (80) we get
$$\chi_{i_c} = N_{i_r}. \qquad (84)$$
In general, since $N_{i_r}$ represents the product of the dimensions of the domains not under individual constraints, it increases exponentially in the number of unconstrained domains. As a result, for tensor channels the multiplexing gain increases exponentially with the number of unconstrained domains. Note that in deriving the multiplexing gain, for simplicity we have assumed that all the eigenvalues of $\mathcal{H}^H *_M \mathcal{H}$ are non-zero, which may not always be the case. In general, the capacity is a function of the given channel's specific singular values, some of which may be zero. In that case, depending on which and how many singular values are zero, the multiplexing gain will be different and may not increase exponentially with the number of domains.
Based on (84), the multiplexing gain achieved under the sum power constraint (where no domain is individually constrained) is $I_1 \cdots I_N$. However, (84) assumes that the inverse of $\mathcal{H}^H *_M \mathcal{H}$ exists, which is the case if all its eigenvalues are non-zero. If this is not the case, an approximation of the inverse of $\mathcal{H}^H *_M \mathcal{H}$ can be calculated using the higher order bi-conjugate gradient method described in [18]. Next we analyze the multiplexing gain, also known as the pre-log, associated with a tensor channel under sum power constraints. Note that the number of non-zero eigenvalues of $\mathcal{H}^H *_M \mathcal{H}$ equals the number of non-zero singular values of $\mathcal{H}$. Let $I_1 I_2 \cdots I_N = I$ and $J_1 J_2 \cdots J_M = J$; then the number of non-zero singular values $R$ satisfies $R \le \min\{I, J\}$, where equality is met if all the singular values of $\mathcal{H}$ are non-zero.
The pre-log, denoted by $\chi$, is calculated at very high SNR, for which water-filling reduces to uniform power allocation over the non-zero eigen-channels, i.e. $1/\mu \approx P/R$. Hence, using (73), the capacity becomes
$$C = \sum_{r=1}^{R} \log_2\Big(\frac{d_r\,P}{R}\Big), \qquad (85)$$
where $d_r$, $r = 1, \ldots, R$, are the non-zero eigenvalues of $\mathcal{H}^H *_M \mathcal{H}$. As $P \to \infty$, (85) simplifies to:
$$\chi = \lim_{P\to\infty} \frac{C}{\log_2 P} = R. \qquad (86)$$
Assuming all channel singular values are non-zero, we have $R = \min\{I_1 I_2 \cdots I_N,\, J_1 J_2 \cdots J_M\}$. For a conventional MIMO matrix channel, the capacity pre-log is known to be less than or equal to the minimum of the numbers of transmit and receive antennas [21], which is a special case of the tensor pre-log: for a MIMO matrix channel $N = M = 1$, and we have $\chi = \min\{I_1, J_1\}$, where $I_1$ and $J_1$ are the numbers of transmit and receive antennas respectively. For the tensor case it is interesting to see that, assuming equal dimension sizes on each domain, i.e. $I_1 = I_2 = \cdots = I_N = J_1 = J_2 = \cdots = J_M = L$, the capacity pre-log is
$$\chi = L^{\min\{N, M\}}, \qquad (87)$$
which is exponential in the number of domains.
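The pre-log claim can be checked numerically: for an unfolded channel with $I = 4$ effective inputs and $J = 8$ effective outputs, the slope of capacity against $\log_2 P$ at high SNR should approach $\min\{I, J\} = 4$. A sketch under these assumptions, with illustrative names:

```python
import numpy as np

# Hypothetical pre-log check: with uniform power over the R nonzero
# eigen-channels, C(P) ~ chi * log2(P) at high SNR, so the finite-difference
# slope of C against log2(P) approaches chi = min{I, J}.
rng = np.random.default_rng(0)
I, J = 4, 8                                # I = I1...IN, J = J1...JM (unfolded)
H = rng.standard_normal((J, I))
d = np.linalg.eigvalsh(H.T @ H)            # eigenvalues of the Gram tensor
d = d[d > 1e-9]                            # keep the R = min{I, J} nonzero ones

def cap(P):                                # uniform allocation over R channels
    return np.sum(np.log2(1.0 + d * P / d.size))

P1, P2 = 1e6, 1e8
slope = (cap(P2) - cap(P1)) / (np.log2(P2) - np.log2(P1))
print(round(slope, 2))                     # ~ 4.0 = min{I, J}
```

Repeating this with more input or output domains of equal size $L$ reproduces the exponential scaling $L^{\min\{N,M\}}$ of the pre-log.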

VI. NUMERICAL EXAMPLES AND APPLICATIONS
In this section, we present numerical examples to illustrate previous results. We also present an example of a MIMO GFDM system modelled using the tensor framework.

A. EXAMPLES OF ERGODIC CAPACITY WHEN CHANNEL REALIZATIONS ARE KNOWN
Our results assume that the channel is deterministic. For the numerical examples, rather than using a specific channel tensor, we generate the channel using the Rayleigh model as in [21], [89], [90]. The channel tensor consists of realizations of i.i.d. circular complex Gaussian entries of zero mean and unit variance, such that $E[|\mathcal{H}_{j_1,\ldots,j_M,i_1,\ldots,i_N}|^2] = 1$ [89]. These channel realizations are known at the transmitter and receiver.
Let the capacity of a deterministic tensor channel $\mathcal{H}$ be denoted as $C(\mathcal{H})$. Assume now that we have the capacities of $K$ such channels, denoted by $C(\mathcal{H}^{(k)})$ for $k = 1, \ldots, K$, where the tensors $\mathcal{H}^{(k)}$ consist of realizations of complex Gaussian random variables. The average capacity of $K$ such deterministic channels is given by
$$\bar{C}_K = \frac{1}{K}\sum_{k=1}^{K} C\big(\mathcal{H}^{(k)}\big),$$
which for large $K$ approximates $E[C(\boldsymbol{\mathcal{H}})]$, where $\boldsymbol{\mathcal{H}}$ is a tensor of Gaussian random variables. All the numerical results presented in this section show $\bar{C}_K$ for $K = 100$, which can be interpreted as the ergodic capacity of a random tensor channel when its realizations are known at the transmitter and the receiver. The SNR is defined as $P/\sigma^2$ as in [21], where $P$ is the sum power constraint or total transmit power, and the noise tensor contains i.i.d. circular complex Gaussian entries with zero mean and variance $\sigma^2 = 1$.
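The averaging procedure described above can be sketched as follows, assuming an unfolded channel and, for brevity, uniform power allocation in place of the full water-filling solution; all names, shapes and the power level are illustrative.

```python
import numpy as np

# Hypothetical sketch of the ergodic-capacity average: draw K i.i.d. Rayleigh
# channel realizations, compute the capacity of each fixed realization, and
# average the results.
rng = np.random.default_rng(1)

def channel_capacity(H, P):
    # Capacity of one fixed (unfolded) channel with uniform power allocation.
    d = np.linalg.eigvalsh(H.conj().T @ H).clip(min=0.0)
    return float(np.sum(np.log2(1.0 + d * P / len(d))))

def ergodic_capacity(shape_out, shape_in, P, K=100):
    J, I = int(np.prod(shape_out)), int(np.prod(shape_in))
    caps = []
    for _ in range(K):
        # CN(0, 1) entries: E[|H|^2] = 1, matching the Rayleigh model used here.
        H = (rng.standard_normal((J, I)) + 1j * rng.standard_normal((J, I))) / np.sqrt(2)
        caps.append(channel_capacity(H, P))
    return sum(caps) / K

C_bar = ergodic_capacity((2, 2), (2, 2), P=10.0, K=100)
print(round(C_bar, 2))
```

Replacing `channel_capacity` with the water-filling or Algorithm 1 solution gives the averages actually plotted in this section.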

1) Comparison with Uniform Power Allocation
If the channel is not known, then the ergodic capacity of a random tensor channel, $E[C(\boldsymbol{\mathcal{H}})]$, is a function of the pdf of $\boldsymbol{\mathcal{H}}$. For a MIMO matrix channel with Rayleigh fading and channel state information available only at the receiver, capacity is achieved by uniform power allocation at the transmitter [21]. In this paper we present a comparison between uniform power allocation and the tensor water-filling power allocation for a tensor channel. Figure 2 presents the capacity in bits per tensor symbol under a sum power constraint for two different sizes of tensor channel, with uniform power allocation across all the transmit tensor elements and with the optimal tensor water-filling approach. Similar to the MIMO matrix case, it can be seen that uniform power allocation is sub-optimal, but at high SNR it yields nearly the same capacity as the optimal power allocation. Also, the total capacity is higher when we increase the number of input and output domains. Next, we present examples to illustrate the impact of different domains and their dimensions on the tensor channel capacity.

2) Capacity for different sizes of Channel Tensor
In Figure 3 we present the channel capacity in bits/tensor symbol for a fourth-order channel tensor under a sum power constraint at an SNR of 10 dB. The input and output are order-2 tensors of size $X \times Y$ each, corresponding to an $X \times Y \times X \times Y$ tensor channel. It is seen that increasing $X$ and $Y$ individually increases the total capacity. To understand the effect of different types of channel, we also find the capacity when the channel is normalized such that the total average receive power is identical to the transmitted power. Such a normalization has been suggested in [91], [92] for the MIMO channel case. In tensor channels, such a normalization is achieved when the individual entries of the channel tensor $\mathcal{H} \in \mathbb{C}^{J_1\times\ldots\times J_M\times I_1\times\ldots\times I_N}$ are generated as circular complex Gaussian with zero mean and variance $1/(J_1 \cdots J_M)$. Figure 4 shows the capacity of such a normalized tensor channel of size $X \times Y \times X \times Y$, with both input and output of size $X \times Y$ each, under sum power constraints corresponding to an SNR of 10 dB. On increasing the size of the input and output tensors, the rate of increase in capacity is lower than in Figure 3, and the capacity tends to saturate for large values of $X$ and $Y$. For the case where the channel gain is unity, and hence the transmit and receive signal powers are the same, the capacity is plotted against received SNR for a $2 \times 2 \times 2 \times 2$ tensor channel in Figure 5, together with a comparison against the corresponding scalar and matrix channels with the same received signal power. The gain in capacity achieved by moving from a scalar to a tensor channel can be attributed to the multiplexing gain provided by tensor channels, which increases with the number of domains. For the rest of the numerical examples, the more widely used model from [21], [89], where the channel entries are circular complex Gaussian with zero mean and unit variance, is employed, as for Figure 3.
In Figure 6, we present the capacity for a fourth order tensor channel under a sum power constraint, where the output is fixed as a 2×2 tensor and the input is an X×Y tensor, at 10 dB SNR. The rate of increase in capacity with X and Y is slower than in Figure 3 for higher values of X and Y, since the output tensor size is fixed at 2×2. Thus the capacity pre-log, which is bounded by min{X·Y, 2·2} = 4, does not increase with increasing X and Y. We see that increasing the size of the individual domains of the input tensor does not provide significant gain if the number and size of the output domains are kept fixed.

3) Capacity under different domain power constraints
In this section we compare different possible power constraints. In order to approximate the optimum input covariance, and thereby the capacity, under per domain element and per element power constraints, we use Algorithm 1. Figure 7 illustrates the capacity under a sum power constraint and a per domain element power constraint for a 2×2×2×2 channel with the power constraint on an input tensor of size 2×2. The power budgets on one of the domains of the input tensor are P_1 = x·P and P_2 = (1−x)·P. To put this in context, let the two domains be space and time. Then this constraint reflects that the power in the first time slot for both antennas is P_1 and in the second slot for both antennas is P_2, with total available power P_1 + P_2 = P. The plot in Figure 7 presents capacity against x = P_1/P at 10 dB SNR. The flat line represents the capacity under the sum power constraint, which shows no variation with x, and the curved line shows the capacity with per domain element power constraints. As can be observed from Figure 7, the capacity under the per domain element constraint is always lower than the capacity under the sum power constraint, and the two become very close to each other when x ≈ 0.5, i.e. when uniform power is allotted to the elements of the constrained domain. Note that such behaviour is observed over an average of 100 channel realizations. For a given specific realization, the two curves may not meet at x = 0.5. For the MIMO case, a similar numerical result has been presented in [88] under per antenna power constraints. In Figure 8 we present the capacity under per element power constraints and compare it with the sum power constraint. If the total available power is P, then as before P_1 = x·P and P_2 = (1−x)·P. Further, P_11 = y·P_1, P_12 = (1−y)·P_1, P_21 = y·P_2 and P_22 = (1−y)·P_2.
Thus, with 0 ≤ x, y ≤ 1, P_11, P_12, P_21, P_22 denote the individual power constraints on all four elements of the input tensor, such that P_11 + P_12 + P_21 + P_22 = P. With different choices of x and y, we obtain different per element power constraints while the total power remains P. The capacity with per element power constraints against x and y at an SNR of 15 dB is presented in Figure 8. The flat surface represents the capacity under the sum power constraint, which shows no variation with x and y, and the curved surface shows the capacity under per element power constraints. It can be seen that for different values of x and y, the capacity achieved under per element constraints can be significantly lower than that achieved under the sum power constraint.
Note that Algorithm 1 only approximates the optimum input covariance and thus does not provide the exact capacity at low SNR. However, since the problem at hand is a convex optimization problem, several standard software tools for numerical optimization can be used to find the capacity. To analyze how well the scaling approximation in Algorithm 1 works, we present a comparison between the capacity calculated through the convex optimization software tool CVX [75] and the capacity approximated through Algorithm 1. Figure 9 presents the capacity against SNR, for x = y = 0.1, under a sum power constraint, under per domain element power constraints where a single domain of dimension 2 is constrained with power budgets P_1, P_2, and under per element power constraints where all four elements have different power budgets P_11, P_12, P_21, P_22. We present such results for the capacity calculated using two approaches: the first uses Algorithm 1, with the graphs presented as solid curves; the second uses CVX, with the graphs presented as dashed curves. As can be seen, the capacity calculated via the approximation of Algorithm 1 matches the one calculated via CVX very closely, and is almost indistinguishable at moderate to high SNR. This shows that Algorithm 1 provides a reasonably good approximation to the optimal solution at low SNR, while providing an exact solution at high SNR. Furthermore, it can be seen that the capacity under per domain element and per element constraints is always upper bounded by the capacity under the sum power constraint. The capacity increases with SNR in all three cases, but the performance gap between the three solid curves also gradually widens, with the sum power constraint performing best, followed by the per domain element constraint and lastly the per element constraint.
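To make the ordering of the constraint families concrete, the following sketch (ours, not Algorithm 1) evaluates one feasible point of the per-element-constrained problem, a diagonal covariance that spends the budgets P_11, ..., P_22 exactly, against the sum-power water-filling optimum; since that diagonal covariance has trace P, its rate can never exceed the sum-power capacity:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4                                    # 2x2 input tensor, unfolded
H = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
P, x, y = 10.0, 0.1, 0.1

# Per-element budgets P11, P12, P21, P22 built exactly as in the text.
P1, P2 = x * P, (1 - x) * P
budgets = np.array([y * P1, (1 - y) * P1, y * P2, (1 - y) * P2])

def logdet_cap(H, Q):
    s = np.linalg.eigvalsh(np.eye(H.shape[0]) + H @ Q @ H.conj().T)
    return float(np.sum(np.log2(np.maximum(s.real, 1e-15))))

# Feasible (not necessarily optimal) point for the per-element problem:
# a diagonal covariance spending each element's budget exactly.
C_per_elem = logdet_cap(H, np.diag(budgets).astype(complex))

# Sum-power water-filling on the eigenmodes of H^H H for the same total P.
g = np.sort(np.maximum(np.linalg.eigvalsh(H.conj().T @ H), 1e-12))[::-1]
for k in range(n, 0, -1):
    mu = (P + np.sum(1.0 / g[:k])) / k
    if mu >= 1.0 / g[k - 1]:
        break
p = np.zeros(n)
p[:k] = mu - 1.0 / g[:k]
C_sum = float(np.sum(np.log2(1.0 + g * p)))
# diag(budgets) has trace P, so the sum-power optimum upper-bounds C_per_elem.
```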

4) Correlated Tensor Channel
Consider an order-N random tensor H ∈ C^(I_1×···×I_N) with i.i.d. zero mean and unit variance entries. Let Ψ^(n) ∈ C^(I_n×I_n) for n = 1, . . . , N be a sequence of Hermitian matrices such that Ψ^(n) = A^(n) A^(n)H, where A^(n) ∈ C^(I_n×I_n) is the square root matrix of Ψ^(n). The mode-n product of the tensor H across all the modes with these matrices (also known as the Tucker product) is expressed as [93]:

H_corr = H ×_1 A^(1) ×_2 A^(2) · · · ×_N A^(N).     (91)

Then the correlation matrix of the vectorized channel is given as:

E[vec(H_corr) vec(H_corr)^H] = (A^(N) ⊗ · · · ⊗ A^(1)) E[vec(H) vec(H)^H] (A^(N) ⊗ · · · ⊗ A^(1))^H     (95)
= (A^(N) ⊗ · · · ⊗ A^(1))(A^(N) ⊗ · · · ⊗ A^(1))^H     (96)
= Ψ^(N) ⊗ · · · ⊗ Ψ^(1),     (97)

where (95) uses vec(H_corr) = (A^(N) ⊗ · · · ⊗ A^(1)) vec(H), (95) to (96) follows since E[vec(H) vec(H)^H] = I, and (96) to (97) follow from matrix Kronecker product properties [95, Corollary 4]. Hence the correlation matrix of the vectorized tensor is given in terms of the Kronecker product of the mode-n factor correlation matrices Ψ^(n). Such a model is called separable, and it is considered for real random variables in [96]. The separable model implies that each element of the correlation tensor can be written as a product of elements of the factor matrices. The proof in [96] is for real tensors, but it can easily be extended to complex tensors as well.
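The vec/Kronecker identity underlying (95)-(97) can be checked numerically; the sketch below (ours) implements the mode-n product with np.tensordot and verifies that vec(H ×_1 A^(1) · · · ×_N A^(N)) = (A^(N) ⊗ · · · ⊗ A^(1)) vec(H) under column-major vectorization:

```python
import numpy as np

def mode_n_product(X, A, n):
    """Mode-n product X x_n A: multiplies mode n of X by the matrix A."""
    return np.moveaxis(np.tensordot(A, X, axes=(1, n)), 0, n)

rng = np.random.default_rng(2)
dims = (2, 3, 4)
X = rng.standard_normal(dims)
A = [rng.standard_normal((d, d)) for d in dims]

# Tucker product across all modes, as in (91).
Y = X
for n, An in enumerate(A):
    Y = mode_n_product(Y, An, n)

# Column-major vectorization turns the Tucker product into a Kronecker product:
# vec(X x1 A1 ... xN AN) = (AN kron ... kron A1) vec(X).
lhs = Y.reshape(-1, order='F')
rhs = np.kron(A[2], np.kron(A[1], A[0])) @ X.reshape(-1, order='F')
```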
The well-known MIMO matrix Kronecker correlation model forms a special case of (91) where the tensor H is of order 2 and the factor matrices A^(1) and A^(2) represent the square roots of the row and column correlation matrices respectively [97]. The MIMO Kronecker model may not be very accurate in all scenarios, but it has been widely used because of its tractable analytic form; see for example [98]-[101]. Now consider an order-4 tensor channel of size 3×3×3×3 corresponding to an order-2 input and an order-2 output. We generate such a channel H_corr with correlated elements using (91), where the entries of H are i.i.d. zero mean complex Gaussian with unit variance. For numerical illustration, we consider correlation matrices generated using the exponential model with correlation factor ρ_n, where the elements of the correlation matrix Ψ^(n) are defined as Ψ^(n)_{i,j} = ρ_n^|i−j| for ρ_n ∈ [0, 1) [102]. The four correlation matrices Ψ^(1), Ψ^(2), Ψ^(3) and Ψ^(4) are generated using the exponential model with correlation coefficients ρ_1, ρ_2, ρ_3 and ρ_4 respectively. Assuming that the channel realization is known at both transmitter and receiver, we find the capacity of such a channel with correlated elements under a sum power constraint. Figure 11 presents the capacity at 10 dB SNR for different values of the correlation coefficients, where the receive domains are correlated with ρ_1 = ρ_2 = ρ_R and the transmit domains are correlated with ρ_3 = ρ_4 = ρ_T. The plot shows that capacity decreases as ρ_T and ρ_R increase, and is lowest when ρ_T and ρ_R approach 1. Capacity is largest when both ρ_T and ρ_R approach zero, in which case the correlation matrices are identity matrices and the channel has uncorrelated entries across all the domains.
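For concreteness, the exponential-model factor matrices and their square roots can be generated as follows (an illustrative sketch; dimensions and coefficients follow the 3×3×3×3 example above):

```python
import numpy as np

def exp_corr(dim, rho):
    """Exponential correlation model: Psi[i, j] = rho ** |i - j|."""
    idx = np.arange(dim)
    return rho ** np.abs(idx[:, None] - idx[None, :]).astype(float)

def sqrt_psd(Psi):
    """Hermitian square root A with A @ A^H = Psi, via eigendecomposition."""
    w, V = np.linalg.eigh(Psi)
    return V @ np.diag(np.sqrt(np.maximum(w, 0.0))) @ V.conj().T

# Two receive-side and two transmit-side factor matrices for the 3x3x3x3 example.
rho_R, rho_T = 0.7, 0.4
Psis = [exp_corr(3, r) for r in (rho_R, rho_R, rho_T, rho_T)]
As = [sqrt_psd(Psi) for Psi in Psis]
```

Contracting these square-root factors with an i.i.d. tensor along each mode, as in (91), then yields a correlated channel realization.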
Next we investigate the impact of correlation on the tensor channel capacity when the correlation spans a variable number of domains. Figure 12 presents the capacity against SNR when only some of the domains of the channel are correlated and the remaining domains are uncorrelated. It can be observed that the capacity difference between the various cases presented in Figure 12 is more significant at higher SNR. It is seen that the capacity decreases as the number of domains having a non-zero correlation factor increases. Such a loss of capacity with an increase in the number of correlated domains is further illustrated in Figure 13 for various tensor channel orders. Figure 13 presents the capacity against SNR for order-2M correlated tensor channels with order-M input and output having individual dimensions of 3. The factor correlation matrices along each of the 2M modes are based on the exponential model with correlation factor ρ. The plot is presented for M = 2, 3, 4, 5 and ρ = 0.4, 0.7. Note that different values of M lead to different tensor channel sizes of order 4, 6, 8, and 10. Hence, for a meaningful comparison of the impact of correlation, the capacity plotted in Figure 13 is normalized with respect to the number of elements in the transmit tensor symbol, which is 3^M. As can be seen in Figure 13, for a fixed ρ, as the number of domains of the tensor channel with all correlated elements increases, the capacity per element decreases. Further, this loss is more significant for higher values of ρ (increased correlation), as can be seen by comparing the plots for ρ = 0.7 and 0.4.

B. TENSOR BASED MIMO GFDM SYSTEM
Generalized Frequency Division Multiplexing (GFDM) is a multi-carrier modulation scheme where each GFDM symbol consists of complex valued data symbols d_{k,m} distributed across K sub-carriers and M time slots known as sub-symbols. Thus each GFDM block consists of N = KM complex data symbols. Consider a data vector d ∈ C^(N×1) which contains the complex data symbols d_{k,m}. The transmitted signal x ∈ C^(N×1) is produced as x = Ad, where A ∈ C^(KM×KM) is the GFDM modulation matrix. The modulation matrix is defined as A = (g_{0,0}, . . . , g_{K−1,0}, g_{0,1}, . . . , g_{K−1,1}, . . . , g_{0,M−1}, . . . , g_{K−1,M−1}), with g_{k,m} denoting the version of the prototype filter impulse response g[n] shifted to the mth sub-symbol and modulated on the kth sub-carrier. A detailed description of the modulator structure can be found in [103]. After cyclic prefix addition to ensure no inter-block interference, the transmission through a wireless channel is modelled as y = Hx + n, where H ∈ C^(N×N) is the circular channel convolution matrix and y, n ∈ C^(N×1) represent the received signal and noise vectors respectively [103]. The extension to a MIMO GFDM system is considered in [24], [25], where the data vectors for different antennas are concatenated together. Hence the data corresponding to all the sub-carriers, sub-symbols and antennas at both transmitter and receiver are arranged in transmit and receive vectors, and the channel is arranged as a matrix to represent the system model. However, a more natural way of representing the signals and the channel in GFDM is using tensors, since such a model retains the distinction between the different domains, namely sub-carriers, sub-symbols and antennas, at each stage. Such a model has been considered in [37], [45]. Assume a MIMO GFDM system with N_R receive and N_T transmit antennas. Let a vector of N = KM complex data symbols be considered as a single GFDM data stream. Consider a case where S such independent streams of complex data symbols, denoted by d^(s) ∈ C^(N×1) for s = 1, . . .
, S, are each transmitted using K sub-carriers and M sub-symbols. Assume that S = N_T = N_R and that each of the S independent data streams is transmitted concurrently using a different antenna among the N_T transmit antennas. For instance, if S = N_T = 2, then data stream 1 is mapped to antenna 1 and data stream 2 is mapped to antenna 2. Let the transmit data symbol corresponding to the kth sub-carrier, mth sub-symbol and sth transmit stream, denoted d_{s,k,m}, be an element of a third order tensor D ∈ C^(S×K×M). Similarly, the received signal and noise can be represented using third order tensors D̂ ∈ C^(S×K×M) and N ∈ C^(S×K×M) respectively. Subsequently, the channel that couples the input D with the output D̂ can be seen as an order-6 tensor H ∈ C^(S×K×M×S×K×M). The tensor channel considered here is the equivalent channel obtained from cascading the transmit filter, the physical channel and the receive filter. More details on the representation of the signal and the channel model can be found in [37], [45]. The overall system model is represented as

D̂ = H *3 D + N,     (98)

and it is illustrated in Figure 14, which shows the system model for a MIMO GFDM system with 2 transmit and 2 receive antennas. In Figure 14 a matrix is shown as a square and a higher order tensor as a double-line square with its order written in the top right corner. A third order tensor is represented as a cube with staggered edges. The data corresponding to each antenna for all the K sub-carriers and M sub-symbols is represented as a K×M matrix, where the matrices corresponding to each antenna form slices of the third order input tensor. The covariance of the input tensor D can be written as a sixth order tensor Q of size S×K×M×S×K×M. We simulate such a system with two transmit and two receive antennas, considering three different cases. Case 1: the entire transmit tensor is under a sum power constraint P, i.e. tr(Q) ≤ P.
We obtain the optimal covariance Q^(opt1) in this case, which performs power allocation and precoding at the transmitter, using

Q^(opt1) = V_H *3 (µ I − D_H^(−1))^+ *3 V_H^H,

where V_H and D_H are obtained from the tensor EVD of (H^H *3 H), and I is an identity tensor of size S×K×M×S×K×M. The coefficient µ is calculated using tensor water-filling to ensure tr(Q^(opt1)) = P. Case 2: the two transmit antennas have different power budgets, P_1 = x·P and P_2 = (1−x)·P. Hence the constraints on the covariance can be written as Σ_{k,m} Q_{1,k,m,1,k,m} ≤ P_1 and Σ_{k,m} Q_{2,k,m,2,k,m} ≤ P_2. We obtain the input covariance Q^(opt2) in this case using Algorithm 1 with the channel tensor H, power budgets P_1 and P_2, and N_i^r = K·M as inputs. Case 3: the constraint is the same as in case 2, but now we perform per-antenna power allocation and precoding. The optimum covariance in this case is determined by independently calculating the covariance for the data transmitted on the two antennas. We find Q^(i) for i = 1, 2, where Q^(i) denotes the covariance of the data transmitted on antenna i. Notice that the constraints Σ_{k,m} Q_{1,k,m,1,k,m} ≤ P_1 and Σ_{k,m} Q_{2,k,m,2,k,m} ≤ P_2 can be equivalently written as tr(Q^(1)) ≤ P_1 and tr(Q^(2)) ≤ P_2 respectively. We find the optimal Q^(i) for i = 1, 2 using tensor water-filling as

Q^(i) = V^(i) *2 (µ_i I − D^(i)−1)^+ *2 V^(i)H,

where V^(i) and D^(i) are obtained from the tensor EVD of (H^(i)H *2 H^(i)).
The coefficient µ_i is calculated using tensor water-filling to ensure tr(Q^(i)) ≤ P_i.
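The *3 contraction in the system model (98) can be written compactly with np.einsum; the following sketch (ours, with random placeholder tensors in place of the actual filtered channel) also confirms that unfolding the order-6 channel reproduces the familiar matrix-vector form:

```python
import numpy as np

rng = np.random.default_rng(3)
S, K, M = 2, 8, 5                        # antennas/streams, sub-carriers, sub-symbols
shape = (S, K, M)

def crandn(*dims):
    """I.i.d. CN(0,1) tensor (placeholder entries for this sketch)."""
    return (rng.standard_normal(dims) + 1j * rng.standard_normal(dims)) / np.sqrt(2)

H = crandn(*shape, *shape)               # order-6 equivalent channel (placeholder)
D = crandn(*shape)                       # transmit tensor symbol
N = crandn(*shape)                       # noise tensor

# The *3 contraction of (98):
# Dhat[s,k,m] = sum_{t,l,n} H[s,k,m,t,l,n] D[t,l,n] + N[s,k,m].
Dhat = np.einsum('skmtln,tln->skm', H, D) + N

# Unfolding reproduces the matrix-vector form: vec(Dhat) = mat(H) vec(D) + vec(N).
Dhat_vec = H.reshape(S * K * M, S * K * M) @ D.reshape(-1) + N.reshape(-1)
```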
For the simulation, the input tensor D contains entries drawn from a 4QAM constellation. At the transmitter, a raised cosine pulse shaping filter with roll-off factor 1 is employed, as used in [25], [37]. The receive filter is matched to the transmit filter. The number of sub-symbols is M = 5 and the number of sub-carriers is K = 8. The channel between the transmit and receive filters is generated as complex Gaussian with i.i.d. zero mean and unit variance entries. The overall channel is normalized to ensure that the received power is the same as the transmit power P. Hence the received SNR per bit is calculated as P/b, where b is the number of bits in each received tensor symbol, obtained as b = (number of elements in each tensor) × (bits per element). The noise added is AWGN with zero mean and unit variance, so that the noise covariance is the identity tensor.
The capacity of the channel H from (98) under the three different cases of input covariance, with x = 0.1 for cases 2 and 3, is shown in Figure 15. As can be seen, the capacity achieved in case 1, i.e. under the sum power constraint, is always higher than in the other two cases. Case 2, where the individual antennas have different power budgets and joint-domain processing is used to generate the optimal covariance, outperforms case 3, where separate covariances are obtained for antennas 1 and 2 independently.
We also test the BER performance when the input is generated from a 4QAM constellation for the three different cases; the results are shown in Figure 16, where BER is plotted against received SNR per bit in dB. At the transmitter, the covariance of the input D ∈ C^(2×8×5) before precoding is given by an identity tensor. The input D is contracted with the square root of the optimum covariance as Q^(opt)1/2 *3 D. This ensures that the covariance of the input tensor after precoding, defined as E[(Q^(opt)1/2 *3 D) ◦ (Q^(opt)1/2 *3 D)*], is given by Q^(opt) ∈ C^(2×8×5×2×8×5). Hence, based on (98), we get D̂ = H *3 Q^(opt)1/2 *3 D + N. To assess the BER performance, we assume a multi-linear minimum mean square error (MMSE) receiver for which the mean square error of the estimate is minimized; the estimate is given by [14]:

D_est = (W^H *3 W + I)^(−1) *3 W^H *3 D̂,

where W = H *3 Q^(opt)1/2 ∈ C^(2×8×5×2×8×5) and I is an identity tensor of size 2×8×5×2×8×5. The estimate is passed through a QAM demodulator to recover the transmitted symbols. Figure 16 presents results based on Monte-Carlo simulations averaged over 100 different channel realizations. For evaluating the bit error rate (BER), at least 100 errors were collected at the receiver for each channel realization. It can be seen that the best performance is achieved in case 1, where there is no individual power constraint on domains. A comparison between cases 2 and 3 shows that with per-antenna power constraints, joint processing across the domains (as in case 2) performs better than per-antenna processing (as in case 3).
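In vectorized (unfolded) form, the multi-linear MMSE estimate reduces to the familiar matrix expression (W^H W + I)^(−1) W^H r; the sketch below (ours, with a random placeholder for the unfolded effective channel W, and unit-variance symbols and noise assumed in the filter) illustrates this:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2 * 8 * 5                            # elements per 2x8x5 tensor symbol

# Random placeholder for the unfolded effective channel W = H *3 Q^(1/2).
W = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
# 4QAM symbols with unit average energy, plus unit-variance complex noise.
d = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2), size=n)
noise = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
r = W @ d + noise

# Vectorized multi-linear MMSE: d_hat = (W^H W + I)^(-1) W^H r.
d_hat = np.linalg.solve(W.conj().T @ W + np.eye(n), W.conj().T @ r)
```

In the full tensor pipeline, d_hat would be reshaped back to a 2×8×5 tensor before QAM demodulation.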

VII. MULTI-USER MIMO CAPACITY
In this section, we consider the application of the tensor framework to multi-user MIMO channels. We present numerical examples comparing the tensor approach with other results found in the literature. We consider K-user Gaussian Multiple Access Channels and Interference Channels. Consider a multi-user MIMO network where a base station (BS) equipped with N_R antennas receives information from K users with N_T antennas each. If we denote the uplink channel matrix between the kth user and the base station by H^(k) ∈ C^(N_R×N_T), then the discrete-time received signal y ∈ C^(N_R×1) at the BS can be written as [42]:

y = Σ_{k=1}^{K} H^(k) x^(k) + n,     (100)

where x^(k) ∈ C^(N_T×1) is the signal transmitted by the kth user and n ∈ C^(N_R×1) is the received noise vector, assumed circularly symmetric complex Gaussian with identity covariance matrix. In such a multiple access channel (MAC), each user k is subject to an individual power constraint P_k. If the transmit covariance matrix of user k is denoted by Q^(k), then the constraint is represented as tr(Q^(k)) ≤ P_k for k = 1, . . . , K.
A multi-user channel with K users is characterized by a K-dimensional achievable rate region C_R, known as the capacity region [42], where each point (R_1, R_2, . . . , R_K) in the region represents achievable rates at which user k can send information with arbitrarily low error probability. We assume that all the channel matrices are known to the receiver and to all the transmitters. To denote the convex hull of the union of sets we use an overbar. With power constraints (P_1, P_2, . . . , P_K), the capacity region of the MIMO MAC is given by [42]:

C_R = ∪_{tr(Q^(k)) ≤ P_k} { (R_1, . . . , R_K) : Σ_{k∈S} R_k ≤ log det(I + Σ_{k∈S} H^(k) Q^(k) H^(k)H) for all S ⊆ {1, . . . , K} },     (101)

where S denotes a subset of the set of users. Each set of covariance matrices (Q^(1), . . . , Q^(K)) satisfying the power constraints corresponds to a K-dimensional polyhedron [104]. The capacity region is the convex hull of the union of all such polyhedrons. Note that for the Gaussian MIMO MAC, the capacity region is defined using only the union of rate regions and the convex hull is not needed [42]. It is shown in [105] that for the Gaussian MIMO MAC, the boundary points of the capacity region can be characterized by maximizing a weighted sum rate Σ_k ν_k R_k for all non-negative ν_k such that Σ_k ν_k = 1, and thus finding boundary points can be cast as a convex optimization problem. Essentially, (101) represents a set of bounds on the individual rates R_1, R_2, . . . , R_K and on combinations of rates such as R_1 + R_2, R_1 + R_3, R_2 + R_3, R_1 + R_2 + R_3 and so on, including the sum rate R_1 + R_2 + · · · + R_K. For the two-user case, this can be represented as:

R_1 ≤ log det(I + H^(1) Q^(1) H^(1)H),
R_2 ≤ log det(I + H^(2) Q^(2) H^(2)H),
R_1 + R_2 ≤ log det(I + H^(1) Q^(1) H^(1)H + H^(2) Q^(2) H^(2)H).     (102)

For a given choice of covariance matrices Q^(1) and Q^(2) which satisfy the power constraints, (102) represents upper bounds on R_1, R_2 and R_1 + R_2. The maximum of log det(I + H^(1) Q^(1) H^(1)H + H^(2) Q^(2) H^(2)H) is the sum capacity achievable when both users are transmitting. Note that the choice of covariance matrices Q^(1) and Q^(2) satisfying the power constraints which achieves the sum capacity may not achieve the individual capacities.
Similarly, the choice of Q^(1) and Q^(2) which achieves the individual capacities may not achieve the sum capacity. In general, different transmit choices lead to different pairs of covariance matrices (Q^(1), Q^(2)), so the capacity region is given by the convex hull of the union of an infinite number of rate regions, each corresponding to a different pair (Q^(1), Q^(2)). The optimal choice of Q^(1) and Q^(2) which achieves the sum capacity is found by an iterative water-filling approach [105], which sequentially finds the covariance of each user with the single-user classical water-filling method, treating the interference from other users as noise. A detailed step-by-step algorithm can be found in [105].
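The iterative water-filling procedure of [105] can be sketched as follows (our simplified NumPy rendering: each user in turn whitens the interference-plus-noise of the others and water-fills on the resulting effective channel; the dimensions and power budgets are arbitrary):

```python
import numpy as np

def waterfill(g, P):
    """Water-filling over eigen-gains g (sorted descending), total power P."""
    p = np.zeros(len(g))
    for k in range(len(g), 0, -1):
        mu = (P + np.sum(1.0 / g[:k])) / k
        if mu >= 1.0 / g[k - 1]:
            p[:k] = mu - 1.0 / g[:k]
            break
    return p

rng = np.random.default_rng(5)
K_users, NT, NR = 3, 2, 10
P_budget = 5.0                           # per-user power budget (linear scale)
H = [(rng.standard_normal((NR, NT)) + 1j * rng.standard_normal((NR, NT))) / np.sqrt(2)
     for _ in range(K_users)]
Q = [np.zeros((NT, NT), dtype=complex) for _ in range(K_users)]

def sum_rate(Q):
    A = np.eye(NR, dtype=complex)
    for Hk, Qk in zip(H, Q):
        A = A + Hk @ Qk @ Hk.conj().T
    return float(np.sum(np.log2(np.maximum(np.linalg.eigvalsh(A).real, 1e-15))))

rates = []
for sweep in range(20):
    for k in range(K_users):
        # Interference-plus-noise covariance seen by user k.
        Z = np.eye(NR, dtype=complex)
        for j in range(K_users):
            if j != k:
                Z = Z + H[j] @ Q[j] @ H[j].conj().T
        w, V = np.linalg.eigh(Z)
        Zih = V @ np.diag(w.real ** -0.5) @ V.conj().T   # Z^(-1/2), whitening
        Heff = Zih @ H[k]
        g, U = np.linalg.eigh(Heff.conj().T @ Heff)
        idx = np.argsort(g.real)[::-1]
        g, U = np.maximum(g.real[idx], 1e-12), U[:, idx]
        Q[k] = U @ np.diag(waterfill(g, P_budget)) @ U.conj().T
    rates.append(sum_rate(Q))
# The sum rate is non-decreasing across sweeps and converges.
```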
Such an approach, however, assumes that different users transmit independently. The iterative water-filling algorithm treats the information available about other users' interfering channels as noise. In the presence of complete channel state information, a better transmit strategy providing higher achievable rates is to allow users to coordinate their transmissions. Hence a joint transmit strategy can expand the capacity region. Using the tensor framework, we can find the capacity region assuming user coordination. First, let us represent (100) using the tensor system model. We define the multi-user MIMO tensor channel as a third order tensor H ∈ C^(N_R×N_T×K), where H_{:,:,k} = H^(k). The input signal can be defined as a matrix X ∈ C^(N_T×K), where each x^(k) of (100) forms a column of X. Hence the system model in (100) can be represented as:

y = H *2 X + n.     (103)

The input covariance is represented as an order-4 tensor Q ∈ C^(N_T×K×N_T×K). Assuming the noise to be circularly symmetric complex Gaussian with identity covariance matrix, denoted by I, the output covariance can be written as (H *2 Q *2 H^H + I). Subsequently, the sum capacity of such a system with user coordination can be calculated from the following optimization problem:

maximize over Q:   log det(H *2 Q *2 H^H + I)     (104)
subject to:   Σ_{n=1}^{N_T} Q_{n,k,n,k} ≤ P_k,  k = 1, . . . , K,     (105)
              Q ⪰ 0.     (106)
Here Σ_{n=1}^{N_T} Q_{n,k,n,k} ≤ P_k represents the individual power constraints of the different users. The optimal Q that achieves capacity can be approximated using Algorithm 1. Note that the difference between this tensor formulation and the iterative water-filling approach used in the vector formulation is that the latter assumes different users transmit independently despite having perfect channel state information. With independent transmissions, iterative water-filling maximizes the function log det(I + Σ_{k=1}^{K} H^(k) Q^(k) H^(k)H) subject to tr(Q^(k)) ≤ P_k and Q^(k) ⪰ 0 for k = 1, . . . , K [105]. Note that this objective function is the same as the upper bound on the sum rate from (101) for S = {1, . . . , K}. The vector-based iterative water-filling treats inter-user interference as noise, since it attempts to optimize the sum rate over all choices of a separate covariance matrix for each user, as shown in [105]. On the other hand, the tensor approach solves the problem in (104)-(106) and aims to find a joint covariance across all the users. Thereby, the tensor approach suggests a joint transmit scheme for all the users wherein the inter-user interference is not treated as noise, since the interference term also carries signal information. The maximum sum rate given in (104) is achieved by all K users transmitting jointly; it can similarly be found for a subset of users by considering only the channel and covariance of the users which are included in S.
Hence the tensor framework allows us to define a capacity region with user coordination as the convex hull of the union, over all covariance tensors Q satisfying the power constraints, of the rate regions whose bounds are

Σ_{k∈S} R_k ≤ log det(I + H^(S) *2 Q^(S) *2 H^(S)H) for every subset S of users,

where S contains the list of users being considered, and H^(S), Q^(S) denote the corresponding channel and covariance sub-tensors. The vector p^(S) contains the power budgets of the users included in S, with components p^(S)_k = P_k for k ∈ S. For a three-user case, the first condition in (108) corresponds to S = {1}: find the maximum rate that user 1 can transmit given the power constraint on user 1 and that all other users are silent. In this case, H^(1) ∈ C^(N_R×N_T×1) is essentially the matrix H^(1) between user 1 and the base station, and Q^(1) ∈ C^(N_T×1×N_T×1) is the covariance matrix Q^(1) of user 1. Hence this reduces to a single-user MIMO channel where the optimal Q^(1) achieving the maximum rate can be found using classical water-filling. Similarly, conditions two and three in (108) correspond to the bounds on the rates achieved by user 2 and user 3 (R_2 and R_3) respectively when all other users are silent. Hence the first three conditions are the same as those derived from the vector case (101). Condition four corresponds to S = {1, 2}, and thus gives a bound on the sum rate that users 1 and 2 can together achieve given that user 3 is silent. In this case H^(1,2) ∈ C^(N_R×N_T×2) is a sub-tensor of H given by H^(1,2) = H_{:,:,1:2}. The covariance tensor Q^(1,2) ∈ C^(N_T×2×N_T×2) is a sub-tensor of Q given by Q^(1,2) = Q_{:,1:2,:,1:2}. The power constraints are defined as Σ_n Q^(1,2)_{n,1,n,1} ≤ P_1 and Σ_n Q^(1,2)_{n,2,n,2} ≤ P_2, where P_1, P_2 are the power budgets of users 1 and 2 respectively. Note that the bound on the sum rate R_1 + R_2 achieved using this method assumes that users 1 and 2 perform a joint transmission. Hence the sum rate achieved using the tensor approach will differ from the one obtained from iterative water-filling, which assumes independent transmissions. Similarly, condition five represents the bound on the sum rate R_2 + R_3 that can be achieved when users 2 and 3 transmit together keeping user 1 silent. In this case H^(2,3) ∈ C^(N_R×N_T×2) is a sub-tensor of H given by H^(2,3) = H_{:,:,2:3}.
The covariance tensor Q^(2,3) ∈ C^(N_T×2×N_T×2) is a sub-tensor of Q given by Q^(2,3) = Q_{:,2:3,:,2:3}. The power constraints are defined as Σ_n Q^(2,3)_{n,1,n,1} ≤ P_2 and Σ_n Q^(2,3)_{n,2,n,2} ≤ P_3, where P_2, P_3 are the power budgets of users 2 and 3 respectively. Similarly, condition six represents the bound on the sum rate R_1 + R_3 that can be achieved when users 1 and 3 transmit together keeping user 2 silent. The last condition represents the bound on the sum rate when all three users are transmitting. Note that the covariance tensors satisfying the power constraints of all the conditions in (108) can be seen as sub-tensors of the covariance tensor Q. For instance, Q_{:,1,:,1} = Q^(1) represents the covariance matrix of user 1. However, the optimal covariance tensor Q that achieves the sum capacity under user cooperation cannot be obtained from the individual covariance tensors Q^(i) alone. For instance, the optimal Q^(1) that maximizes log det(I + H^(1) Q^(1) H^(1)H) may not be a sub-tensor of the optimal Q that maximizes log det(I + H *2 Q *2 H^H). The capacity region under user coordination is thus given by the convex hull of the union of all the rate regions over all choices of covariance tensors which satisfy the power constraints.
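The sub-tensor relationships described above are plain array slices once Q is stored as an N_T × K × N_T × K array; the sketch below (ours) extracts the per-user covariance matrices and a two-user sub-tensor from a randomly generated PSD joint covariance:

```python
import numpy as np

rng = np.random.default_rng(6)
NT, K = 2, 3                             # antennas per user, number of users

# A PSD joint covariance: build Qmat = B B^H and view it as an order-4 tensor.
B = (rng.standard_normal((NT * K, NT * K))
     + 1j * rng.standard_normal((NT * K, NT * K))) / np.sqrt(2)
Qmat = B @ B.conj().T
Q = Qmat.reshape(NT, K, NT, K)

# Per-user covariance matrices as sub-tensors: Q^(k) = Q[:, k, :, k].
Q_user = [Q[:, k, :, k] for k in range(K)]
# Per-user transmit power: sum_n Q[n, k, n, k] = tr(Q^(k)).
powers = [np.trace(Qk).real for Qk in Q_user]
# Sub-tensor for the subset S = {1, 2}: Q^(1,2) = Q[:, 0:2, :, 0:2].
Q12 = Q[:, 0:2, :, 0:2]
```

Each extracted Q^(k) is a principal submatrix of the matricized Q, so it is automatically Hermitian PSD, and the per-user powers sum to the total transmit power tr(Q).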
Next we present a few numerical examples to illustrate these concepts. We consider a multi-user MIMO MAC scenario where K users with N_T = 2 antennas each communicate with a base station having N_R = 10 antennas. The noise vector at the receiver is circularly symmetric complex Gaussian with zero mean and identity covariance matrix. The channel entries are realizations of circularly symmetric complex Gaussian random variables with zero mean and unit variance, and these realizations are known at the transmitters and the receiver. The results presented are averaged over 100 different channel realizations. Each user has an individual power constraint P_k, and the total transmit power is P = Σ_k P_k. We assume all users have the same power constraint, i.e. P_k = P/K, and plot the sum capacity obtained through the tensor approach, achieved by the K users, against the total power in Figure 17. The sum capacity can be found by solving the optimization problem in (104)-(106). We apply the proposed solution from Algorithm 1 to approximate the optimal input covariance tensor which achieves the sum capacity. The covariance tensor obtained from Algorithm 1 is then used to approximate the sum capacity given by log det(H *2 Q *2 H^H + I). This approach assumes that all users coordinate for transmission. It can be seen that for a fixed number of users, the sum capacity increases with an increase in the total transmit power. Furthermore, for a fixed total transmit power, the sum capacity increases when the number of users increases, especially at higher transmit power. The results of Figure 17 can be contrasted with the iterative water-filling approach from the literature, where the different users transmit independently despite having channel state information of the other users. Figure 18 presents the sum capacity obtained with coordinated users and with independent users against the number of users, for two different values of total power.
The sum capacity under independent users is calculated using the iterative water filling approach from [105, Algorithm 1]. It can be observed that as the number of users increases, there is a significant difference in achievable rate of coordinated users as compared to independent users. This shows that user cooperation captured in the input covariance tensor structure can improve the sum capacity substantially.
Further, the capacity region of a two-user MIMO MAC scenario is presented in Figure 19 with transmit power budgets P_1 = P_2 = 5 dB. We consider K = 2 users communicating with the base station. The capacity region obtained from the tensor formulation for two users can be written as:

R_1 ≤ log det(I + H^(1) *2 Q^(1) *2 H^(1)H),
R_2 ≤ log det(I + H^(2) *2 Q^(2) *2 H^(2)H),
R_1 + R_2 ≤ log det(I + H *2 Q *2 H^H).     (114)

For the two-user case, finding the optimum covariance that maximizes the achievable rate of user 1 assuming user 2 is silent reduces to a single-user MIMO scenario. Hence the bounds on the individual rates R_1 and R_2 in (114) and (102) are the same. Note however that the bounds on the sum rate R_1 + R_2 differ. With the additional constraint on the covariance from (109), the bound on the sum rate in (114) translates to:

log det(I + H *2 Q *2 H^H)     (115)
= log det(I + H^(1) Q^(1) H^(1)H + H^(2) Q^(2) H^(2)H),     (116)

where (115) to (116) follow from (113). Hence, with the additional constraint on the covariance Q defined by (109), the sum rate bound in (114) reduces to the sum rate bound in (102), and thus all three bounds in (114) depend only on Q^(1), Q^(2). The additional constraint corresponds to the transmit scheme where each user acts independently of the other. In such cases, the capacity region is characterized by a pair of covariance matrices (Q^(1), Q^(2)) satisfying the power constraints. In general, the capacity region of the two-user case corresponding to (114) is characterized by a triplet (Q^(1), Q^(2), Q), where Q is the transmit covariance tensor which prescribes a joint transmission scheme. The transmit covariance matrices of the individual users, Q^(1) and Q^(2), form sub-tensors of Q. However, the optimal Q^(1), Q^(2) which maximize R_1, R_2 respectively may not be the sub-tensors of the optimal Q which maximizes R_1 + R_2. The optimal Q^(1), Q^(2) are found assuming the other user is silent, whereas the optimal Q is found assuming joint transmission by both users. The rate regions in (114) and (102) form a pentagon on the two-dimensional (R_1, R_2) plane.
The capacity region is determined by the convex hull of the union of all such pentagons obtained through different choices of covariances satisfying the constraints. In Figure 19, the solid line (case 1) represents the pentagon corresponding to (102), where (Q^(1), Q^(2)) are obtained via iterative water-filling from [105, Algorithm 1] to maximize the sum rate with independent transmissions. The dotted line (case 2) represents the pentagon corresponding to (102), where (Q^(1), Q^(2)) are obtained via conventional water-filling for single-user MIMO to maximize R_1 and R_2 individually. Similarly, different pairs (Q^(1), Q^(2)) correspond to different pentagons based on (102). The capacity region with independent users is obtained as the convex hull of the union of all such pentagons associated with different covariances satisfying the constraints in (102). Thus the convex hull of the regions of cases 1 and 2 gives an inner bound to the capacity region with independent user transmissions. The capacity region signifies the bounds on the rate of transmission by each user.
The dashed line (case 3) represents the pentagon corresponding to (114) where Q is approximated using Algorithm 1 to maximize the sum rate, and Q(1), Q(2) are the sub-tensors of the Q obtained from Algorithm 1. However, such a choice of Q(1) and Q(2) may not maximize the individual rates of users 1 and 2. This is reflected in Figure 19, where the dotted line shows a larger bound on the individual rates as compared to the other cases in both the vertical and horizontal segments, A and B. Case 2 can be seen as another scenario for (114) where Q(1), Q(2) are chosen to maximize the individual rates and the joint covariance tensor Q has the structure Q:,i,:,j = Q(i) for i = j and 0 for i ≠ j, with i, j = 1, 2. Such a covariance tensor does not maximize the sum rate but only the individual rates, and still satisfies all the constraints in (114). The capacity region with user coordination is given by the convex hull of the union of the pentagons associated with cases 2 and 3, along with every pentagon associated with different covariances which satisfy the constraints in (114). Thus, the convex hull of the regions of cases 2 and 3 gives an inner bound to the capacity region with user cooperation. The bound on the achievable sum rate indicated by segment C in Figure 19 is lowest in case 2, followed by the case where the transmit covariances are chosen via iterative water-filling (case 1), and is largest when the covariance is obtained using the tensor approach assuming user coordination (case 3). Moreover, this difference increases as NT increases from 4 to 8, as observed in the figure. Hence the boundary of the capacity region expands in segment C with user cooperation as opposed to independent user transmissions.
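The inner bound formed by two pentagons can be computed by taking the convex hull of their combined vertex sets. A minimal sketch with illustrative (made-up) pentagon vertices, using Andrew's monotone-chain hull:

```python
def convex_hull(points):
    """Andrew's monotone-chain convex hull of 2-D points (as tuples)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# vertices of two (R1, R2) pentagons, e.g. from cases 2 and 3 (illustrative values)
pent_case2 = [(0, 0), (3, 0), (3, 2), (1, 4), (0, 4)]
pent_case3 = [(0, 0), (4, 0), (4, 1), (2, 3), (0, 3)]
region = convex_hull(pent_case2 + pent_case3)   # boundary of the inner bound
```

Vertices dominated by the hull (interior or on an edge) are dropped, leaving only the boundary of the combined rate region.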

B. MIMO INTERFERENCE CHANNEL (IC)
Let us now consider the general multi-user MIMO interference channel (IC), where both the transmit and receive sides have user separation. Consider K transmit devices with NT antennas each, communicating with their respective K receive devices with NR antennas each. Such a channel model assumes that all transmitting devices communicate with their respective receivers while generating interference at all other receivers. Finding the exact capacity region of a general K-user interference channel is still an ongoing effort [106], [107]. For the two-user scenario, capacity bounds have been discussed in [108], [109] and references therein, under various assumptions regarding the interference, such as Z interference channels (where one of the two receive users does not experience interference), strong interference, and noisy interference. The term 'noisy interference' refers to conditions under which the sum capacity can be achieved by treating interference as noise [110]. For a two-user MIMO IC, [110, Theorem 2] presents a set of conditions involving the channel and transmit covariance matrices which are sufficient for interference to be treated as noise. Such conditions ensure that the indirect links are much weaker than the direct links, and that the power allocation is such that the power of the message received via the indirect link at each user is very low compared to that received via the direct link. Further, it is shown in [111] that under strong interference, each receiver can jointly decode the signal and the interference to achieve the sum capacity. As an extension of the two-user case, [112] derives the conditions under which treating interference as noise can achieve the sum capacity for a K-user IC. Most of these works assume no coordination among the source transmitters or the destinations.
If the different transmitting users coordinate for transmission, and the receivers coordinate for reception, then the interference can be treated as an information-bearing entity which can be extracted using the tensor framework.
The system model for a MIMO interference network is given by [113]: for k = 1, . . . , K, x(k) ∈ C^{NT×1} is the vector transmitted by source k, and y(k), n(k) ∈ C^{NR×1} are the received signal and noise vectors at destination k. The matrix H(k,k) ∈ C^{NR×NT} is the direct channel between source k and destination k, and H(k,u) ∈ C^{NR×NT} is the cross-channel matrix between source u and destination k. For each transmitting source, there is an individual power constraint defined as tr(Q(k)) ≤ Pk, where Q(k) ∈ C^{NT×NT} is the covariance matrix of the vector x(k) and Pk denotes the power budget. Such interference networks can be thought of as a tensor communication link. The input can be represented as a matrix X ∈ C^{NT×K} where each x(k) forms a column of X. Similarly, the received signal and noise can be represented using matrices Y, N ∈ C^{NR×K} where each y(k) and n(k) forms a column of Y and N respectively. The overall channel between such an input and output can be represented as a fourth-order tensor H ∈ C^{NR×K×NT×K} with H:,k,:,u = H(k,u). Subsequently, the interference network system model can be represented in tensor form as in (118). Note that (118) differs from the MIMO MAC specified by (103) in the sense that in (103) the channel is a third-order tensor and thus the output is a vector, whereas in (118) the channel is a fourth-order tensor, to account for user separation at the receiver side as well, and thus the output is a matrix, or order-2 tensor. The power constraints on the input can be defined in a similar way as for (103). Assuming the noise covariance is an identity tensor I of size NR × K × NR × K, the tensor formulation can be used to specify the channel capacity as in (119). Note that the capacity obtained from such a tensor formulation assumes that all the sources cooperate for transmission and all the destinations cooperate for reception.
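The construction of the fourth-order channel tensor from the per-pair matrices, and its equivalence to the per-user matrix model, can be sketched numerically. This is an illustrative numpy check (real-valued entries for brevity), not the paper's code:

```python
import numpy as np

K, NT, NR = 2, 3, 3
rng = np.random.default_rng(1)
# per-pair channel matrices H^(k,u) between source u and destination k
Hmat = {(k, u): rng.standard_normal((NR, NT)) for k in range(K) for u in range(K)}

# fourth-order channel tensor with sub-tensors H[:, k, :, u] = H^(k,u)
H = np.zeros((NR, K, NT, K))
for k in range(K):
    for u in range(K):
        H[:, k, :, u] = Hmat[(k, u)]

X = rng.standard_normal((NT, K))     # column k is the transmit vector x^(k)
# tensor contraction over the two input modes (NT, K): Y[r,k] = sum_{t,u} H[r,k,t,u] X[t,u]
Y = np.einsum('rktu,tu->rk', H, X)

# agrees with the per-user matrix model y^(k) = sum_u H^(k,u) x^(u)
Yref = np.stack([sum(Hmat[(k, u)] @ X[:, u] for u in range(K))
                 for k in range(K)], axis=1)
```

The contraction produces the order-2 output Y whose column k is destination k's (noise-free) received signal.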
Now we consider an example with a two-user interference channel and compare the sum rate achieved using the tensor framework, which assumes user cooperation, with the upper bound on the rate suggested in [108] while treating interference as noise. We consider the same example from [108], consisting of a system composed of two transmitters and two receivers, equipped with NT antennas each. The model introduces a positive scalar a ≥ 0 to control the interference power. Such a system of equations can be equivalently represented using (118), where the channel H is a tensor of size NT × 2 × NT × 2 and the input, output and noise are matrices of size NT × 2 each. We find the capacity of such MIMO interference channels with user coordination using the tensor framework, where the optimal input covariance is approximated using Algorithm 1. For our example, in (121), (122) we take a = 1/√3, and the channel entries are i.i.d. zero-mean unit-variance Gaussian random variables as in [108]. The results are averaged over 100 different channel realizations. These channel realizations are known at the transmitters and receivers. We find the sum rate achieved via the tensor framework assuming user cooperation and denote it by RT. We compare RT with the capacity of K parallel non-interfering channels found using the standard water-filling approach, denoted by RU. In [108], RU has been used as an upper bound on the achievable sum rate while treating interference as noise. Figures 20-24 present a comparison between RT and RU. To calculate RU, we find the transmit covariance matrix Q(k) for each user input x(k) based on MIMO water-filling corresponding to the channel H(k,k), and set RU = Σk log det(I + H(k,k) Q(k) H(k,k)H). Also, RT is calculated using the objective function in (119), where the optimal covariance tensor Q is approximated using Algorithm 1.
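The two quantities being compared can be sketched as follows. Since Algorithm 1 is not reproduced here, the sketch uses a uniform transmit covariance as a stand-in for the optimized Q, and likewise uniform per-user power in place of water-filling for RU; the channel construction (direct links plus cross links scaled by a) follows the example above:

```python
import numpy as np

rng = np.random.default_rng(7)
NT, K, P = 4, 2, 10.0            # antennas per device, users, total power (linear)
a = 1 / np.sqrt(3)               # interference coupling scalar from [108]

def cgauss(shape):
    """i.i.d. zero-mean unit-variance complex Gaussian entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H11, H22 = cgauss((NT, NT)), cgauss((NT, NT))           # direct links H^(k,k)
H12, H21 = a * cgauss((NT, NT)), a * cgauss((NT, NT))   # scaled cross links

# block matricization of the fourth-order tensor channel (destinations x sources)
Hblock = np.block([[H11, H12], [H21, H22]])

q = P / (K * NT)                 # uniform power per antenna (stand-in for Algorithm 1)

# cooperative sum rate: joint transmission and reception over the full channel
R_T = np.log2(np.linalg.det(np.eye(K * NT) + q * Hblock @ Hblock.conj().T)).real

# R_U: K parallel interference-free direct links with the same uniform power
R_U = sum(np.log2(np.linalg.det(np.eye(NT) + q * Hk @ Hk.conj().T)).real
          for Hk in (H11, H22))
```

With the optimized covariances of the paper in place of the uniform stand-ins, averaging R_T and R_U over many realizations reproduces the comparisons of Figures 20-24.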
Figure 20 compares RT and RU for a two-user case with different values of the power constraint P1 = P2 = P. The achievable sum rates with the two approaches are plotted against the number of antennas NT. The sum rates in both cases increase as NT increases. The achievable rate with the tensor approach, RT, is higher than the upper bound on the sum rate, RU, from [108]. It can also be observed in Figure 20 that as P increases, the gap between RT and RU increases as well. The tensor approach shows that the presence of interference can in fact give higher achievable sum rates if the transmit and receive operations are performed jointly by all the transmitting and receiving users respectively. Hence the sum rate achieved via the tensor approach, which allows cooperation at the transmitter and receiver sides, can be higher than the sum rate achieved in the absence of interference.
This feature is further observed in Figure 21, which presents the sum rate against the number of users K for a multi-user MIMO interference scenario from (117) with NT = 2. Each user's received signal contains a desired signal and information from K − 1 interfering links whose power is controlled by a scalar factor a, as described in the two-user case. The results are presented for a = 1/√3 and for two different total power budgets P = 5, 10 dB, with individual power constraints Pk = P/K. It can be seen that RT is always larger than RU. Furthermore, as the number of users grows, the difference between RT and RU also widens, which shows the advantage of considering interference as an information-bearing term rather than noise, through the tensor framework.
To further elaborate on the role of interference, Figure 22 presents the sum rate RT of the two-user scenario achieved using the tensor framework for different interference powers and compares it with RU. The solid lines represent RT and the dashed lines represent RU. The result is plotted against the interference coupling power gain, defined as G1 = 10 log10 a^2 dB [108], for different numbers of antennas NT at each device, and with power constraints Pk = P/K for k = 1, 2, where K = 2 and P is set to 10 dB. It is clearly seen that higher interference leads to a higher sum rate RT using the tensor approach. Also, RT increases with NT, and the gap between the sum rates for different numbers of antennas widens with increasing interference. However, RU is always lower than RT and does not change with G1. Since RU is calculated assuming zero interference, it does not vary with the interference power. At very low interference power, we see that RT and RU are almost the same. The difference between RT and RU starts to be significant (more than 1 bit/channel use) at an interference power G1 of approximately -2 dB when NT = 2, and around -6 dB when NT = 8. Next we consider a three-user system. Such a system model can be represented using (118), with Y, X and N as NT × 3 matrices each and H as an NT × 3 × NT × 3 tensor. The number of antennas at each device is denoted by NT. Each destination user receives signals from a direct link and two interfering links, whose powers are controlled by the scalar factors a and b respectively. In Figure 23 we present RT and RU against the interference power with three users for NT = 2. The power constraint for each user is the same, i.e. Pk = P/K where K = 3, and P is set to 5 dB. In Figure 23 we have G1 = 10 log10 a^2 dB and G2 = 10 log10 b^2 dB, where G1, G2 represent the strengths of the two interfering links for each user. It can be seen that RT is low when the interference power of both links is weak.
With increasing strength of the interfering links, we get a higher sum rate RT. The curve for RU does not vary with changes in G1 and G2 and is always lower than RT. A similar observation can be made in Figure 24, which presents RT and RU for NT = 4. On comparing Figures 23 and 24, we see that RT and RU increase as NT increases from 2 to 4. Notice that the difference in RT between NT = 2 and NT = 4 is wider for large values of G1, G2, i.e. when the strengths of the interfering links increase. The tensor framework allows treating the interfering terms as information-bearing components, resulting in higher rates with increasing G1, G2 and NT.

C. COMPLEXITY COMPARISON
The matrix-based methods used for estimating the sum capacity of the MIMO MAC and IC often treat interference as noise. In most cases, the objective of such a simplification is a reduction in the computational complexity of the problem. Hence it is important to compare the complexity of the matrix-based methods with the tensor method presented in this paper to highlight the relevance of both approaches.
In section V-C, we presented the complexity analysis of employing Algorithm 1 for a given tensor channel. In this section, we will use the results from section V-C to compare the computational complexity of finding the capacity for the specific cases of MIMO MAC and IC channels using the tensor method with that of the matrix-based methods.

1) Comparison for MIMO MAC
Consider the system model for the K-user MIMO MAC from (100), where H(k) denotes the uplink channel matrix of size NR × NT between the BS equipped with NR antennas and the kth user, which has NT transmit antennas. The iterative water-filling approach for finding the sum capacity computes a separate covariance matrix for each user while treating the interference from all other users as noise. A standard water-filling approach is used to find the individual covariance matrices. Furthermore, all K covariance matrices are iteratively updated until a convergence criterion is met. The dominant step in a standard water-filling approach is the singular value decomposition of the channel matrix. The computational complexity of the SVD of an m × n matrix is O(m · n · min(m, n)) [114], [115]. Thus, the computational complexity of the water-filling corresponding to the matrix H(k) ∈ C^{NR×NT} is O(NR · NT^2) (assuming NR > NT). For the MIMO MAC, since the water-filling is computed for each of the K users iteratively, the complexity scales linearly with the number of users K and the number of required iterations per user, denoted by I. Thus the overall complexity is O(K · I · NR · NT^2). On the contrary, the tensor approach in Algorithm 1 is a non-iterative method to approximate the joint covariance of all the users, which also treats the interference as information-bearing. Consider the tensor model for the MIMO MAC from (103), where the channel is defined as a third-order tensor of size NR × NT × K. Based on the discussion in section V-C, the complexity of employing Algorithm 1 for the MIMO MAC channel is O((K · NT)^2 · NR + (K · NT)^3). Assuming that K · NT ≥ NR, we can write the computational complexity as O((K · NT)^3). The complexities of the matrix and tensor approaches are listed in Table 3 for a clear comparison.
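The two scalings can be captured as simple cost models (hypothetical helper names; constant factors are ignored, as in the O(·) expressions above):

```python
def matrix_mac_cost(K, NT, NR, I):
    """Iterative water-filling: one SVD of an NR x NT matrix per user
    per iteration, each costing on the order of NR * NT^2 (NR > NT)."""
    return K * I * NR * NT**2          # O(K * I * NR * NT^2)

def tensor_mac_cost(K, NT, NR):
    """Algorithm 1 on the NR x NT x K tensor channel, single pass."""
    return (K * NT)**2 * NR + (K * NT)**3   # O((K*NT)^2 * NR + (K*NT)^3)

# matrix cost grows linearly in K, tensor cost cubically
costs = {K: (matrix_mac_cost(K, 4, 8, 10), tensor_mac_cost(K, 4, 8))
         for K in (2, 4, 8)}
```

Doubling K doubles the matrix-approach cost but multiplies the dominant tensor-approach term by eight.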
It can be seen that the complexity of the tensor approach is cubic in the number of users, as opposed to the matrix-based iterative water-filling where the complexity is linear in the number of users. Note, however, that the matrix-based approach also depends on the number of iterations; within each iteration, a standard water-filling algorithm for a matrix channel is executed. In contrast, the tensor approach presented in this paper is non-iterative, i.e. a single iteration is required irrespective of the number of users and antennas.
Clearly, the advantage of the matrix approach should not be dismissed, as it offers a low-complexity solution. Despite its iterative nature, its complexity is linear in the number of users, as opposed to the tensor approach where it is cubic. However, as the number of users grows, the advantage of the tensor-based approach becomes significant in terms of the achievable sum rate made possible by user coordination. This is evident in Figure 18, where the sum capacity with coordinated users shows a substantial increase over that with independent users as the number of users grows. For instance, at a total transmit power of 10 dB, the sum capacity for 10 users is 40 bits/channel use, as opposed to around 32 bits/channel use for independent users, as seen in Figure 18. To better quantify this gap, for the example of Figure 18 we plot the difference between the sum rates achievable with the tensor framework and the matrix approach in Figure 25, against the number of users and for different values of NT at P = 10 dB. We can see that the difference increases substantially with an increasing number of users, and also with an increasing number of transmit antennas. To summarize, Table 3 shows the advantage of the matrix approach in terms of lower complexity, and Figure 25 shows the advantage of the tensor approach in terms of higher rates due to user coordination.

2) Comparison for MIMO IC
Now let us consider the case of the K-user MIMO IC from (117). To calculate RU, which is the upper bound on the achievable sum rate while treating interference as noise, K separate transmit covariance matrices have to be calculated, one per user. Thus, the complexity in this case is also linear in the number of users. Each user solves a water-filling problem corresponding to a matrix channel H(k,k) of size NR × NT. Assuming NR = NT = N, the complexity is O(K · N^3). The corresponding tensor model for the K-user MIMO IC is given by (118), where the channel is a fourth-order tensor of size N × K × N × K. Calculating RT, the achievable sum rate with user coordination, requires Algorithm 1 to be executed for the fourth-order tensor channel. Thus, based on the discussion in section V-C, its computational complexity is O((K · N)^3). The complexities of the matrix and tensor models are listed in Table 4 for comparison. Ignoring the interference leads to a degenerate situation where a higher-order channel is reduced to K non-interfering matrix channels. Thus, the computational complexity of calculating RU is lower when compared with the tensor method used to find RT, as seen in Table 4. However, as shown in Figure 21, the difference between RT and RU can become very large as the number of users increases. For instance, for 10 users at 10 dB total power, RT is around 19 bits/channel use, as opposed to RU which is only around 10 bits/channel use, as observed in Figure 21. To better quantify the difference between RT and RU, we consider the example from Figure 21 for P = 10 dB, and plot the difference RT − RU against the number of users for different values of N in Figure 26. It can be seen in Figure 26 that this difference increases significantly as the number of users or the number of antennas increases.
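The IC cost gap admits the same kind of sketch (hypothetical helper names, constant factors ignored); here the ratio between the two costs grows as K^2:

```python
def ru_cost(K, N):
    """R_U: K independent water-fillings on N x N direct channels."""
    return K * N**3                    # O(K * N^3)

def rt_cost(K, N):
    """R_T: Algorithm 1 on the N x K x N x K tensor channel."""
    return (K * N)**3                  # O((K*N)^3)

# (K*N)^3 / (K*N^3) = K^2: the extra cost of user coordination
ratio = rt_cost(4, 2) // ru_cost(4, 2)   # = 16 = 4^2
```

This quadratic-in-K overhead is the price of extracting the interference as information, per the performance/complexity trade-off discussed below.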
Thus, Figure 26 clearly shows the advantage of the tensor framework in providing higher achievable sum rates.

3) Advantage of the tensor approach
In both the MIMO MAC and the MIMO IC, the achievable sum rates are interference limited using the matrix approaches. Hence, as the number of users or antennas grows, leading to higher interference, the performance degrades in terms of achievable sum rates. It is clear that the tensor method should be preferred when user coordination is allowed, leading to a joint transmission across all users, whereas the matrix-based methods can be used when the users are constrained to act independently and transmit on a per-user basis. The tensor method, while admitting higher complexity as seen in Tables 3 and 4, also leads to much higher achievable sum rates, as seen in Figures 25 and 26. Thus there is an inherent performance/complexity trade-off where, at the cost of higher complexity, significant gains in sum rate can be achieved, especially in the presence of strong inter-domain interference.
The representation of such multi-domain systems using matrix channels and vector signals often becomes more complex as the number of domains, or of dimensions within a domain, increases. It is important to note that the tensor method does not necessarily yield lower complexity in such highly complex situations. Its main feature is that it provides a structured, mathematically accurate framework to treat highly complex systems, while having the capability to naturally scale back to lower complexity when the problem is of reduced complexity. For example, for single-user MIMO, the tensor approach scales back to a conventional matrix approach. For the MU MIMO MAC with the additional constraint in (109), the tensor framework reduces to the conventional MU MIMO MAC model. Hence the tensor method can be seen as an umbrella setup wherein several other low-complexity approaches act as specific cases, with additional constraints imposed to simplify the problem.

VIII. CONCLUSIONS
In this paper, we introduced a unified framework using tensors to represent a multi-domain communication system. The proposed framework can be used to model many of the currently used multi-domain approaches in communications by leveraging the multi-linear structure of the system. Several examples of such systems were included in this paper to demonstrate the usefulness of the proposed tensor framework. The information-theoretic analysis for a fixed tensor channel presented in this paper suggests an exponential increase in Shannon capacity with the number of domains, if the channel is known at both the transmit and receive sides. In particular, the tensor framework's ability to mathematically represent a family of power constraints is utilized. This makes it possible to characterize the capacity of multi-user MIMO systems with per-user power constraints for any number of users. A tensor representation of the channel and its information-theoretic analysis using the proposed framework leads to a joint transmission scheme across all the domains. An example of a MIMO GFDM system was presented to illustrate that such joint-domain processing can lead to better BER performance as compared to per-domain processing. It was also shown that the capacity of tensor channels decreases as the correlation among the channel components increases. Furthermore, for the same total power, as the number of domain elements under individual power constraints increases, the capacity decreases. In the case of multi-user MAC and IC channels, it was shown that the tensor approach leads to higher achievable sum rates with user coordination as compared to the sum rates achieved with independent users. The tensor framework captures the user cooperation in the form of a joint covariance tensor across all the domains.
Our results show that such an improvement in achievable rates through the tensor approach becomes even more significant as the number of users grows or the power of the interfering links increases. With independent user transmissions, the interference is treated as noise, as opposed to the tensor approach where the interference is treated as an information-bearing entity. The performance advantages resulting from the tensor framework often come at the expense of a complexity increase. However, tensors provide structured and mathematically accurate tools for handling this complexity increase.

APPENDIX A SOLVING THE EQUATIONS DERIVED FROM KKT CONDITIONS FOR THE OPTIMAL COVARIANCE TENSOR
In this appendix, we present the solution to the equations obtained through the KKT conditions for finding the optimal transmit covariance tensor Q. The results presented here are a generalization of Theorem 1 from [73] to a tensor setting. In this appendix, H represents the channel tensor, M represents the Lagrange multiplier corresponding to the semidefinite constraint on the covariance, and B represents the tensor whose pseudo-diagonal entries are the Lagrange multipliers corresponding to the other constraints on the transmit covariance (such as power). For the sum power constraint, B is an identity tensor and the Lagrange multiplier is a scalar µ > 0. So, for M ⪰ 0 and B ⪰ 0, our objective is to find Q ⪰ 0 that satisfies (126) and (127). Since V_A = V_P, we see that (135) represents the tensor EVD of P^H *_M P. Hence the middle term in (135) is pseudo-diagonal, so U_A^H *_N V_K must also be a pseudo-diagonal tensor. Let S ≜ U_A^H *_N V_K; then, since U_A and V_K are unitary, we get (136). Since S is pseudo-diagonal, (136) implies S = I_N, and hence U_A = V_K, proving the proposition.
From the tensor SVD of A and K, and Proposition A.1, the left-hand side of (131) can be written as in (137) and (138). Equations (137) and (138) represent the tensor EVD of the left- and right-hand sides of (131); hence, from the uniqueness of the tensor EVD, we get U = V_K and (139). Let the pseudo-diagonal elements of D_A, D_K and D_BM be a_{i1,...,iN}, k_{i1,...,iN} and m_{i1,...,iN} respectively. The pseudo-diagonal elements of D_A^H and D_K^H are also a_{i1,...,iN} and k_{i1,...,iN} respectively, as these are real valued. Since both sides of (139) are pseudo-diagonal, (139) can be written component-wise, where (z)^+ = max{0, z}. From (143) and Proposition A.1 we get (144), where V_K and D̃ are obtained from the tensor EVD of K^H *_M K = V_K *_N D̃ *_N V_K^H. Based on the tensor SVD of K in Proposition A.1, we have D̃ = D_K^H *_M D_K. Substituting (144) into (130), we can conclude the result.