Generalized Framework for Hybrid Analog/Digital Signal Processing in Massive and Ultra-Massive-MIMO Systems

The conventional fully-digital implementation of massive-MIMO systems is not efficient due to the large required number of radio-frequency (RF) chains. To address this issue, hybrid analog/digital (A/D) beamforming was proposed and to date remains a topic of ongoing research. In this paper, we explore the hybrid A/D structure as a general framework for signal processing in massive and ultra-massive-MIMO systems. To exploit the full potential of the analog domain, we first focus on the analog signal processing (ASP) network. We investigate a mathematical representation suitable for any arbitrarily connected feed-forward ASP network comprised of the common RF hardware elements in the context of hybrid A/D systems, i.e., phase-shifter and power-divider/combiner. A novel ASP structure is then proposed which is not bound to the unit modulus constraint, thereby facilitating the hybrid A/D systems design. We then study MIMO transmitter and receiver designs to exploit the full potential of digital processing as well. It is shown that replacing the linear transformation in the digital domain with a generic mapping can improve the system performance. In some cases, the performance of optimal fully-digital MIMO systems can be achieved without extra calculations compared to sub-optimal hybrid A/D techniques. An optimization model based on the proposed structure is presented that can be used for hybrid A/D system design. Specifically, precoding and combining designs under different conditions are discussed as examples. Finally, simulation results are presented which illustrate the superiority of the proposed architecture to the conventional hybrid designs for massive-MIMO systems.


I. INTRODUCTION
Massive multiple-input multiple-output (MIMO) and ultra-massive MIMO (UM-MIMO) systems operating in the millimeter wave (mmW)/Terahertz (THz) bands are prime candidates for fifth generation (5G) and beyond-5G cellular networks [1]-[4]. In fact, base-stations (BS) with 64 antennas have recently been deployed for commercial use in some countries [5]. Moreover, an extensive theory for massive MIMO has been developed in recent years, including capacity and spectral efficiency analysis, system design for high energy efficiency, pilot contamination, etc. However, implementation of such systems faces many technical difficulties, and to this day remains very challenging and costly [6], [7]. In conventional fully-digital (FD) MIMO systems, each antenna element requires a dedicated radio frequency (RF) chain. The direct FD implementation of massive-MIMO/UM-MIMO systems, however, is neither practical nor efficient due to the ensuing high production costs and, more importantly, the huge power consumption.
The associate editor coordinating the review of this manuscript and approving it for publication was Miguel López-Benítez.
Hybrid analog/digital (A/D) signal processing (HSP) is an effective approach to overcome this problem by cascading an analog signal processing (ASP) network to the baseband digital signal processor [8], [9]. While in conventional FD MIMO transmitters [10]- [12], each antenna element is directly controlled by the digital processor, in an HSP-based transmitter, the digital processor generates a low-dimensional RF signal vector, whose size is then increased by analog circuitry for driving the large-scale antenna array. Similarly, in an HSP-based receiver, the size of the large-dimensional vector of antenna signals is reduced by an ASP network, whose outputs are then converted to the digital domain for baseband processing by means of RF chains.
There are practical constraints in the implementation and design of ASP networks and only a few types of RF components are commonly used in practice. Specifically, the power-divider (splitter), power-combiner (adder), and phase-shifter are the key analog components of the ASP design [13]- [22]. In the existing hybrid beamforming structures, due to the particular configuration of the aforementioned analog components, a constant modulus constraint is imposed on the analog beamformer weights which turns the beamforming design into an intractable non-convex optimization problem [13], [14].

A. RELATED WORKS
In one of the earliest works in this field [8], it is shown that for a single data stream, two RF chains are required to achieve the performance of a FD combiner. This technique was extended to multiple stream beamforming (i.e., precoding/combining) where the required number of RF chains must be twice the number of the data streams [14], [15]. In [23], [24], we proposed a single RF chain FD precoding realization. Many researchers, however, focused on developing the hybrid beamformers directly by solving non-convex design optimization problems [13]- [21].
In [13], the beamformer design was formulated as the minimization of the Euclidean distance between the hybrid beamformer and the FD one. Then, by taking into account the sparse characteristics of the mmWave channels, compressed sensing (CS) techniques were presented to solve the underlying optimization problems. The same authors extended their results to wide-band systems in [25]. This approach was later used in [21] and [16] where, in the latter, manifold optimization algorithms as well as other low-complexity algorithms were used for hybrid beamformer design. Directly tackling the non-convex design optimization problems was attempted in [14], where the authors took advantage of orthogonalization techniques and exploited the sparsity of the channel for designing the hybrid beamformers. These results were then extended to wide-band systems in [26]. In [27], the Gram-Schmidt method was used specifically in the uplink multi-user (MU) scenario for designing robust low-complexity beamformers. Robust beamformers for the single-user (SU) case were studied in [18] by minimizing the sum-power of interfering signals. In [17], a simple non-iterative algorithm was proposed for hybrid regularized channel diagonalization, and in [27] the mean square error (MSE) was chosen as the cost function for designing the hybrid beamformers.
The majority of the above works consider a fully-connected architecture, i.e., each RF chain is connected to all of the antenna elements. Alternatively, in a sub-connected architecture, only a subset of RF chains is connected to each antenna [16], [28], [29]. Recently, a dynamic sub-connected hybrid architecture has been proposed in [29] for multi-user equalization in wideband millimeter-wave massive MIMO systems, based on the minimization of the sum of MSE over multiple subcarriers. Although sub-connected designs require fewer RF components, fully-connected ones can achieve a superior performance in theory. Hence, in this study, we investigate properties of fully-connected ASP networks.

B. CONTRIBUTIONS AND PAPER ORGANIZATION
In this paper, our goal is to investigate and exploit the full potential of HSP in massive-MIMO systems. To this end, our contributions can be summarized as follows:
• We first explore the degrees of freedom in the analog domain by developing a compact mathematical representation for any given feed-forward ASP network with arbitrary connections of any number of RF components, i.e., phase-shifters, power dividers and power combiners.
• Based on the above generalization, a simple and novel ASP architecture is conceived out of the above RF components, which is not bound to the constant modulus constraint. Removing this constraint facilitates system design as non-convex optimizations are difficult to solve and global optimality of the solutions cannot usually be guaranteed.
• The transmitter and receiver sides are then studied separately by exploiting the newly proposed ASP architecture and generalizing the digital processing. Specifically, the optimization problem for the HSP beamformer is reformulated within the new representation framework, which facilitates its solution under a variety of constraints and requirements for the massive MIMO system.
• The realization of optimal FD by HSP and the problem of RF chain minimization are presented as guideline examples to illustrate potential applications of the proposed theoretical framework.
• Simulation results of optimal beamformer designs with the proposed architecture are finally presented. The results demonstrate that the new designs can achieve the same performance as the corresponding optimal FD system and hence, outperform recently published hybrid beamformer designs.
The paper is organized as follows. In Section II, the system model is explained. We then study ASP networks in Section III, followed by transmitter and receiver design in Section IV. Simulation results are presented in Section V. We then conclude the paper in Section VI.
Notations: Throughout this paper we use bold capital and lowercase letters to represent matrices and vectors, respectively. Superscripts $(\cdot)^H$, $(\cdot)^t$, and $(\cdot)^*$ indicate Hermitian, transpose, and complex conjugation, respectively. $\mathbf{I}_n$ denotes an identity matrix of size $n \times n$, while $\mathbf{0}_{n \times m}$ denotes an all-zero matrix of size $n \times m$ and $\mathbf{1}_n$ is an all-one column vector of size $n$. The element on the $p$th row and the $q$th column of matrix $\mathbf{A}$ is denoted by $A_{p,q}$, while the $p$th element of vector $\mathbf{x}$ is denoted by $x_p$. $\mathrm{Tr}(\mathbf{A})$ and $\|\mathbf{A}\|_F$ denote the trace and Frobenius norm of matrix $\mathbf{A}$, respectively. $\mathbf{A} = \mathrm{bd}(\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_n)$ represents a block-diagonal matrix, in which $\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_n$ are the diagonal blocks of $\mathbf{A}$. The Kronecker product is denoted by $\otimes$. By $\mathbf{x}_1 \overset{\pi}{=} \mathbf{x}_2$, it is meant that there exists a permutation matrix $\mathbf{P}_\pi$ such that $\mathbf{x}_1 = \mathbf{P}_\pi \mathbf{x}_2$. The greatest (least) integer less (greater) than or equal to $x$ is denoted by $\lfloor x \rfloor$ ($\lceil x \rceil$). Moreover, $x = a \bmod n$ denotes the remainder of the division of $a$ by $n$. The absolute value and phase of a complex number $z = |z| \exp(j\angle z)$ are denoted by $|z|$ and $\angle z$. $\mathbb{C}$ stands for the complex field. A complex circular Gaussian random vector $\mathbf{x} \in \mathbb{C}^n$ with mean vector $\mathbf{m} = E\{\mathbf{x}\}$ and covariance matrix $\mathbf{R} = E\{\mathbf{x}\mathbf{x}^H\}$ is denoted by $\mathcal{CN}(\mathbf{m}, \mathbf{R})$, where $E\{\cdot\}$ stands for expectation.

II. SYSTEM MODEL
We consider a generic point-to-point massive-MIMO system where the transmitter and receiver are equipped with $N_T$ and $N_R$ antennas as well as $M_T$ and $M_R$ RF chains, respectively. In the context of HSP, due to practical constraints, it is further assumed that $M_T \ll N_T$ and $M_R \ll N_R$.
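A. CONVENTIONAL HYBRID BEAMFORMING
Fig. 1 illustrates a point-to-point massive-MIMO system with conventional hybrid beamforming implemented at both ends. The transmitted signal over one symbol duration $T_s$ can be formulated as
$$\mathbf{x} = \sqrt{\rho}\, \mathbf{P}_A \mathbf{P}_D \mathbf{s}, \tag{1}$$
where $\mathbf{s} = [s_1, s_2, \ldots, s_K]^t$ is the symbol vector with zero-mean random information symbols $s_k$ taken from a discrete constellation $\mathcal{A}$ (such as M-QAM or M-PSK), normalized such that $E\{\mathbf{s}\mathbf{s}^H\} = \mathbf{I}_K$, and $\rho$ is the average transmit power. Matrices $\mathbf{P}_D \in \mathbb{C}^{M_T \times K}$ and $\mathbf{P}_A \in \mathbb{U}^{N_T \times M_T}$ are the digital and analog precoders, respectively, where $\mathbb{U} = \{z \in \mathbb{C} : |z| = 1\}$ and, for normalization purposes, it is further assumed that $\|\mathbf{P}_A \mathbf{P}_D\|_F^2 = 1$. The received signal can then be written as
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \tag{2}$$
where $\mathbf{H} \in \mathbb{C}^{N_R \times N_T}$ is the MIMO flat fading channel matrix such that $E\{\|\mathbf{H}\|_F^2\} = N_T N_R$ and $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_{N_R})$ is an additive white Gaussian noise (AWGN) vector. The decoded symbols after hybrid processing can be expressed as
$$\hat{\mathbf{s}} = \mathbf{D}_D \mathbf{D}_A \mathbf{y}, \tag{3}$$
where $\mathbf{D}_D \in \mathbb{C}^{K \times M_R}$ and $\mathbf{D}_A \in \mathbb{U}^{M_R \times N_R}$ are the digital and analog combiners, respectively.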
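The conventional signal chain described above can be sketched numerically as follows. This is only an illustration, assuming the usual cascaded model (transmit signal $\sqrt{\rho}\,\mathbf{P}_A\mathbf{P}_D\mathbf{s}$, received signal $\mathbf{H}\mathbf{x}+\mathbf{n}$, estimate $\mathbf{D}_D\mathbf{D}_A\mathbf{y}$). The function names, the QPSK constellation, the i.i.d. Rayleigh channel, and the randomly drawn (not optimized) precoders and combiners are all placeholder assumptions that do not come from the paper.

```python
import numpy as np

def crandn(rng, *shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def conventional_hybrid_link(N_T=16, N_R=16, M_T=4, M_R=4, K=2,
                             rho=1.0, sigma2=0.1, seed=0):
    """One symbol period of a conventional hybrid link (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # ASSUMPTION: QPSK symbols with unit average power, so E{s s^H} = I_K.
    s = (rng.choice([-1.0, 1.0], K) + 1j * rng.choice([-1.0, 1.0], K)) / np.sqrt(2)
    # Analog precoder: every entry has unit modulus (phase shifters only).
    P_A = np.exp(1j * rng.uniform(0, 2 * np.pi, (N_T, M_T)))
    # Digital precoder: random placeholder, scaled so ||P_A P_D||_F^2 = 1.
    P_D = crandn(rng, M_T, K)
    P_D /= np.linalg.norm(P_A @ P_D, "fro")
    x = np.sqrt(rho) * P_A @ P_D @ s
    # ASSUMPTION: i.i.d. Rayleigh channel, which gives E{||H||_F^2} = N_T N_R.
    H = crandn(rng, N_R, N_T)
    n = np.sqrt(sigma2) * crandn(rng, N_R)
    y = H @ x + n
    # Analog and digital combiners (random placeholders, not a design).
    D_A = np.exp(1j * rng.uniform(0, 2 * np.pi, (M_R, N_R)))
    D_D = crandn(rng, K, M_R)
    s_hat = D_D @ D_A @ y
    return {"s": s, "P_A": P_A, "P_D": P_D, "x": x, "y": y, "s_hat": s_hat}
```

The unit-modulus restriction on `P_A` and `D_A` is the constraint that makes the conventional design problem non-convex.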

B. GENERALIZED HSP SYSTEM FORMULATION
In this work, we consider a more general formulation for HSP that extends the cascaded structure of analog and digital linear transformations presented in Subsection II-A.
We will see that this formulation can in fact bring simplifications to the conventional linear MIMO precoding/combining techniques.
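In the generalized HSP-based massive-MIMO transmitter, as shown in Fig. 2, the symbol vector $\mathbf{s}$ is first applied as input to the digital signal processor, whose output is a baseband signal vector expressed as
$$\mathbf{x}^{BB}_T = F_T(\mathbf{s}), \tag{4}$$
where $F_T : \mathcal{A}^K \rightarrow \mathbb{C}^{M_T}$ is the corresponding mapping from $\mathcal{A}^K$ to $\mathbb{C}^{M_T}$. Then, $M_T$ parallel RF chains convert the baseband signal vector $\mathbf{x}^{BB}_T$ into a bandpass modulated RF signal vector $\mathbf{x}^{RF}_T$. The latter is next input to the ASP network whose output is the transmit signal vector, which can be expressed as
$$\mathbf{x}_T = G_T(\mathbf{x}^{RF}_T), \tag{5}$$
where $G_T : \mathbb{C}^{M_T} \rightarrow \mathbb{C}^{N_T}$ is the corresponding mapping. As shown in Fig. 3, the received RF signal $\mathbf{y}$ following from the noisy MIMO transmission as in (2) is first applied as input to the ASP network, yielding
$$\mathbf{y}^{RF}_R = G_R(\mathbf{y}), \tag{6}$$
where $G_R : \mathbb{C}^{N_R} \rightarrow \mathbb{C}^{M_R}$ is the corresponding mapping. The $M_R$ RF chains then convert $\mathbf{y}^{RF}_R$ into the baseband signal vector $\mathbf{y}^{BB}_R$, from which the digital signal processor produces the symbol estimates
$$\hat{\mathbf{s}} = F_R(\mathbf{y}^{BB}_R), \tag{7}$$
where $F_R$ is the corresponding baseband mapping. While only a power constraint is imposed on the baseband mappings $F_R$ and $F_T$, the RF mappings $G_R$ and $G_T$ must be implemented by RF analog components, which constrain these transformations as discussed in the following section.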

III. ANALOG SIGNAL PROCESSING NETWORK
In this section, aiming at exploiting the full potential of the analog domain, we develop a mathematical formulation for the ASP network represented by the RF mappings G T and G R in the previous section. Specifically, instead of focusing on the conventional analog beamformer structure used in the recent literature [13]- [21], we consider an arbitrarily connected network of phase-shifters, power dividers and power combiners. In our developments, signal-flow graph concepts are used which provide valuable insights for analysis of linear networks [30], [31].
Let us start by formally introducing the individual RF components comprising the ASP networks. The input-output (I/O) relationship of a phase-shifter is given by $b = e^{j\theta} a$, where $a, b \in \mathbb{C}$ are the input and output, respectively, and $\theta \in [0, 2\pi]$ controls the phase difference between them. In this work, in order to explore the performance limits of ASP networks and find a compact representation for any arbitrarily connected ASP with common RF components, we consider infinite resolution phase shifters. The passive power combiner and power divider are implemented by the same RF multi-port network but their port configuration is different. For instance, the ideal µ-way Wilkinson power divider is a (µ+1)-port RF network which can act as an equi-power divider if the input signal is applied to its port 1 and the outputs are taken from ports 2 to µ + 1 [32]. Conversely, it acts as a combiner if the inputs are applied to ports 2 to µ + 1 and the output is taken from port 1.
To obtain a unified model for any possible ASP network with M input ports and N output ports using primary modules (i.e., phase-shifter, power divider and power combiner), we first present a convenient multi-port matrix representation of each component. We also include a permutation operation which does not require additional hardware and is used mainly for the sake of mathematical simplification. The I/O relationship of the components are defined below in terms of their input and outputs represented by vector a and b, respectively.
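• Single phase-shifter: As illustrated in Fig. 4a, for vectors $\mathbf{a}, \mathbf{b} \in \mathbb{C}^{\eta}$, the corresponding $\eta \times \eta$ matrix only changes the phase of the $\gamma$th element of the RF input signal $\mathbf{a}$, which can be expressed as
$$\mathbf{b} = \mathrm{bd}\left(\mathbf{I}_{\gamma-1},\, e^{j\theta},\, \mathbf{I}_{\eta-\gamma}\right) \mathbf{a}. \tag{8}$$
• Single power divider: For input vector $\mathbf{a} \in \mathbb{C}^{\eta}$ and output vector $\mathbf{b} \in \mathbb{C}^{\eta'}$, the corresponding $\eta' \times \eta$ matrix divides the $\gamma$th element of the input RF signal into $\mu$ equi-power signals and the remaining RF branches are not altered, and hence, $\eta' = \eta + \mu - 1$. As illustrated in Fig. 4b, this operation can be described by a block diagonal matrix
$$\mathbf{b} = \mathbf{Q}(\gamma, \mu, \eta)\, \mathbf{a}, \tag{9a}$$
$$\mathbf{Q}(\gamma, \mu, \eta) = \mathrm{bd}\left(\mathbf{I}_{\gamma-1},\, \tfrac{1}{\sqrt{\mu}} \mathbf{1}_{\mu},\, \mathbf{I}_{\eta-\gamma}\right). \tag{9b}$$
• Single power combiner: This transformation can be represented by the transpose of the single power divider matrix $\mathbf{Q}(\gamma, \mu, \eta)$. Consequently, for input vector $\mathbf{a} \in \mathbb{C}^{\eta'}$ and output vector $\mathbf{b} \in \mathbb{C}^{\eta}$, the corresponding matrix combines $\mu$ adjacent RF signals into the $\gamma$th output signal and the rest of the RF branches are not altered. As seen from Fig. 4c, we can write
$$\mathbf{b} = \mathbf{Q}(\gamma, \mu, \eta)^t\, \mathbf{a}. \tag{10}$$
• Permutation matrix: This transformation, shown in Fig. 4d, corresponds to a rearrangement of the elements of vector $\mathbf{a} \in \mathbb{C}^{\eta}$ into vector $\mathbf{b} \in \mathbb{C}^{\eta}$ according to a permutation $\pi : \{1, \ldots, \eta\} \rightarrow \{1, \ldots, \eta\}$. This can be expressed as
$$\mathbf{b} = \mathbf{P}_\pi \mathbf{a}, \tag{11}$$
where $\mathbf{P}_\pi = [\mathbf{e}_{\pi_1}, \ldots, \mathbf{e}_{\pi_\eta}]^t$, and $\mathbf{e}_i$ denotes a column vector of zeros except for its $i$th element, which is one.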
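The four basic transformations listed above can be written down directly as small matrices. The sketch below is one possible NumPy rendering, assuming ideal lossless components and the $1/\sqrt{\mu}$ equi-power scaling referred to in (9). The function names are hypothetical, and the indices $\gamma$ and $\pi_i$ are 1-based to match the text.

```python
import numpy as np

def phase_shifter(gamma, theta, eta):
    """eta x eta matrix that rotates only the gamma-th branch by theta."""
    Phi = np.eye(eta, dtype=complex)
    Phi[gamma - 1, gamma - 1] = np.exp(1j * theta)
    return Phi

def power_divider(gamma, mu, eta):
    """(eta + mu - 1) x eta matrix Q(gamma, mu, eta).

    Splits branch gamma into mu equi-power copies (scaling 1/sqrt(mu))
    and passes the other eta - 1 branches through unchanged.
    """
    Q = np.zeros((eta + mu - 1, eta))
    Q[:gamma - 1, :gamma - 1] = np.eye(gamma - 1)         # branches before gamma
    Q[gamma - 1:gamma - 1 + mu, gamma - 1] = 1 / np.sqrt(mu)  # the divided branch
    Q[gamma - 1 + mu:, gamma:] = np.eye(eta - gamma)      # branches after gamma
    return Q

def power_combiner(gamma, mu, eta):
    """eta x (eta + mu - 1) combiner: transpose of the divider matrix."""
    return power_divider(gamma, mu, eta).T

def permutation(pi):
    """P_pi whose i-th row is e_{pi_i}^t, so that (P_pi a)_i = a_{pi_i}."""
    eta = len(pi)
    P = np.zeros((eta, eta))
    P[np.arange(eta), np.asarray(pi) - 1] = 1
    return P
```

With these definitions, `power_combiner(g, m, e) @ power_divider(g, m, e)` is the identity, i.e., an ideal divider followed by the matching combiner leaves the signal unchanged and conserves power.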
Having introduced a matrix representation of the RF components, we can now seek the mathematical formulation for any given ASP in terms of these matrices. Proposition 1: Any given RF network, with N input and M output ports, implemented by arbitrary feed-forward connections of T RF components (i.e., phase-shifters, power combiners and power dividers) can be modeled as follows where a ∈ C N and b ∈ C M are the input and output RF signals, respectively, and θ i is a 3-tuple containing the parameters of the i th RF component.
Proof: See Appendix A. To illustrate the application of this result, consider the ASP network example in Fig. 5. By using the indexing scheme introduced in the Proof of Proposition 1, this network can be reorganized as a product of basic RF transformations as shown in Fig. 6. Note that in the latter figure, permutation matrices only appear before the 7 th and 15 th RF components; for the remaining components the permutation is an identity matrix (not shown for simplicity). It is worth mentioning that the indexing is not unique and parallel components can be swapped, for instance, the order of u 2 , u 3 and u 4 does not affect the I/O relationship of the ASP network.
Theorem 1: For each one of the following products of two basic RF component matrices on the left, there exists an equivalent matrix factorization as given on the right of the The definitions of the parameters appearing on the right hand side of these identities are given in the proof.
Proof: See Appendix B. Next, we introduce three ASP sub-networks and their compact equivalent representations; these will play a key role in establishing our main results in Theorems 2 and 3.
• Phase-shifter network: This sub-network is obtained by cascading J basic phase shifter matrices (with accompanying permutations) of common size N p , i.e.
where, as illustrated in Fig. 7 E with • Power divider network: By cascading J power divider matrices of compatible sizes, we obtain where, as illustrated in Fig. 7b, which is equivalent to an RF network that divides N d RF signals into a total of M d signals. The presence of the identity matrix in (21) accounts for branches that are not divided.
• Power combiner network: By cascading J power combiner matrices, we obtain, where, as illustrated in Fig. 7c, which is equivalent to an RF network that combines N c RF signals into M c signals. The validity of the identities in (18), (20) and (22) is demonstrated in Appendix C. We can now derive a mathematical expression for the representation of any given ASP network.
Theorem 2: Any arbitrarily connected feed-forward ASP network with M inputs and N outputs, implemented by a total number of T phase-shifters, power dividers, and power combiners can be modeled as where a ∈ C M and b ∈ C N are the input and output signals, respectively, and Ȗ = {z ∈ C : |z| ≤ 1}. That is, all the entries of matrix A have magnitude less than or equal to 1.
VOLUME 8, 2020
Proof: See Appendix D.
Going back to our previous example in Fig. 5 and Fig. 6, the ASP network in the latter figure can be transformed into that of Fig. 8, for which the 2 × 2 transformation matrix A satisfies the condition of the theorem. Now, we investigate whether any matrix in the convex set Ȗ N ×M can be realized by an ASP.
Theorem 3: Any given matrix A ∈Ȗ N ×M can be realized by an ASP network with a total number of T = 2MN + M + N RF components, i.e., N dividers, M combiners, and 2NM (unit-modulus) phase shifters, as shown in Fig. 9.
Proof: The output of the ASP in Fig. 9, corresponding to the input vector $\mathbf{a}$, can be expressed as
$$b_i = \frac{1}{\sqrt{2M}} \sum_{k=1}^{M} \left(e^{j\phi^1_{k,i}} + e^{j\phi^2_{k,i}}\right) \tilde{a}_k, \tag{25a}$$
$$\tilde{a}_k = \frac{1}{\sqrt{2N}}\, a_k. \tag{25b}$$
In (25a), since $b_i$ is the output of a $2M$-way combiner, the normalization factor $1/\sqrt{2M}$ appears from (9). Similarly, the $k$th input, i.e., $a_k$, is divided into $2N$ branches, which according to (9) introduces a normalization factor of $1/\sqrt{2N}$. Any complex number of magnitude at most one can be written as the average of $L$ unit-modulus terms, where the minimum possible value of $L$ is two, i.e., $A_{ki} = \frac{1}{2}(e^{j\phi^1_{k,i}} + e^{j\phi^2_{k,i}})$, with $A_{ki}$ denoting the entry of $\mathbf{A}$ that links the $k$th input to the $i$th output. Therefore, we have:
$$b_i = \frac{1}{\sqrt{MN}} \sum_{k=1}^{M} A_{ki}\, a_k, \tag{26}$$
where $\mathbf{A} \in Ȗ^{N \times M}$. Moreover, $2M$ phase-shifters are required for each element of $\mathbf{b}$ and consequently, a minimum of $2MN$ phase-shifters are needed.
Remark 1: The significant result of Theorem 3 is that any $\mathbf{A} \in Ȗ^{N \times M}$ can be implemented with an ASP structure using conventional RF components, i.e., combiners, dividers and phase shifters, whose input-output relationship is not bound to the unit modulus constraint. That is, while the individual phase-shifter components satisfy this constraint, the overall transformation matrix implemented by the proposed structure in Fig. 9 is no longer restricted to the unit modulus constraint. Thus, the troubling non-convex constraint found in the literature on hybrid beamforming can be lifted from the design optimization problems.
Remark 2: According to the above proof, non-unique solutions for the phase-shifters may exist. This additional degree of freedom can be considered when designing the ASP network based on the requirements and constraints of the analog system. By writing $A_{p,q} = |A_{p,q}| \exp(j\angle A_{p,q})$, one possible solution for the two phase shifts $\phi^1$ and $\phi^2$ that realize the entry $A_{p,q}$ is given by
$$\phi^1 = \angle A_{p,q} + \cos^{-1}|A_{p,q}|, \qquad \phi^2 = \angle A_{p,q} - \cos^{-1}|A_{p,q}|, \tag{27}$$
since $\frac{1}{2}(e^{j\phi^1} + e^{j\phi^2}) = e^{j\angle A_{p,q}} \cos(\cos^{-1}|A_{p,q}|) = A_{p,q}$. It is worth noting that in the conventional hybrid structure $T = MN + M + N$ RF components are required [13]-[21]. In contrast, the proposed ASP structure requires $MN$ additional phase-shifters, for a total of $T = 2MN + M + N$ RF components. These additional components, when employed as in Fig. 9, make it possible to lift the constant modulus constraint for the overall transformation.
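The two-phase-shifter idea behind Theorem 3 and Remark 2 can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation. It uses the arccosine-based phase pair (one valid choice among the non-unique solutions), ideal dividers and combiners with $1/\sqrt{\mu}$ scaling, and the convention that `A[i, k]` links input `k` to output `i`. The function names are hypothetical. Under these assumptions the simulated network returns $\mathbf{A}\mathbf{a}$ scaled by $1/\sqrt{MN}$. That overall gain is a property of this sketch's ideal-component model.

```python
import numpy as np

def phase_pairs(A):
    """Return (phi1, phi2) with (exp(j*phi1) + exp(j*phi2)) / 2 == A entrywise.

    Requires |A[i, k]| <= 1; uses phi = angle(A) +/- arccos(|A|).
    """
    A = np.asarray(A, dtype=complex)
    mag = np.abs(A)
    if np.any(mag > 1 + 1e-12):
        raise ValueError("all entries must satisfy |A| <= 1")
    beta = np.arccos(np.clip(mag, 0.0, 1.0))
    alpha = np.angle(A)
    return alpha + beta, alpha - beta

def asp_output(A, a):
    """Signal-level model: dividers -> phase-shifter pairs -> combiners.

    A has shape (N, M): N outputs, M inputs (ASSUMED index convention).
    """
    A = np.asarray(A, dtype=complex)
    N, M = A.shape
    phi1, phi2 = phase_pairs(A)
    # Each input is split into 2N equi-power branches by an ideal divider.
    branch = np.asarray(a, dtype=complex) / np.sqrt(2 * N)
    # Two unit-modulus phase shifters act on the two branches of each
    # (output i, input k) pair; their outputs feed combiner i.
    shifted = (np.exp(1j * phi1) + np.exp(1j * phi2)) * branch[None, :]
    # Each output is produced by an ideal 2M-way combiner.
    return shifted.sum(axis=1) / np.sqrt(2 * M)
```

Every individual phase shifter in `asp_output` has unit modulus, yet the end-to-end map reproduces an arbitrary matrix with entries inside the unit disc, which is the point of Remark 1.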
Remark 3: It is worth mentioning that since for wide-band systems it is desirable to have a common ASP network for the entire band [25]- [27] the proposed structure can be used for MIMO-OFDM systems. Particularly, since the proposed ASP structure is not bound to constant modulus constraint, it simplifies the design of hybrid MIMO-OFDM beamformers.

IV. TRANSMITTER AND RECEIVER DESIGN WITH GENERALIZED HSP
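While the previous section focused on the realization of the RF mappings $G_R$ and $G_T$, as defined in (5) and (6), using basic RF components, in this section we turn our attention to the baseband mappings $F_R$ and $F_T$ as defined in (7) and (4), respectively. To this end, we consider the ASP network in Fig. 9 for $G_T$ and $G_R$ and consequently, (5) and (6) are replaced by:
$$G_T(\mathbf{x}^{RF}_T) = \mathbf{A}_T \mathbf{x}^{RF}_T, \quad \mathbf{A}_T \in Ȗ^{N_T \times M_T}, \tag{28a}$$
$$G_R(\mathbf{y}) = \mathbf{A}_R \mathbf{y}, \quad \mathbf{A}_R \in Ȗ^{M_R \times N_R}. \tag{28b}$$
We first focus on the transmitter and then on the receiver design.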

A. HSP DESIGN AT THE TRANSMITTER
Considering (4), (5), and (28a), the transmitted signal of the generalized HSP can be written as follows:
$$\mathbf{x}_T = \mathbf{A}_T F_T(\mathbf{s}). \tag{29}$$
In the literature on hybrid beamforming, $F_T$ is usually a linear transformation, i.e., $\mathbf{x}_T = \sqrt{\rho}\, \mathbf{A}_T \mathbf{P} \mathbf{s}$, where $\mathbf{P} \in \mathbb{C}^{M_T \times K}$ is the precoding matrix. We first explore the properties and implementation of $F_T$, and then discuss the design of $F_T$ and $\mathbf{A}_T$ at the HSP-based transmitter.
Let $D_T(\mathbf{s})$ denote the transformation that generates the desired transmitted signal from the given symbol vector $\mathbf{s}$. In effect, this function can represent a generic communication technique at the transmitter side. For instance, the optimal eigen-mode precoding is obtained by solving the following problem: The solution is given by
$$\mathbf{P} = \mathbf{V}\boldsymbol{\Upsilon}, \tag{31}$$
where the diagonal weight matrix $\boldsymbol{\Upsilon}$ is calculated via water filling [33] and $\mathbf{V}$ is a unitary matrix obtained from the singular value decomposition of the channel matrix, i.e.,
$$\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H. \tag{32}$$
Consequently, for this particular precoding scheme we have
$$D_T(\mathbf{s}) = \mathbf{V}\boldsymbol{\Upsilon}\mathbf{s}. \tag{33}$$
Note that nonlinear beamforming, channel estimation, space-time coding and many other techniques can also be represented by $D_T(\mathbf{s})$.
From (29), in order to generate the same transmit signal as a given $D_T(\mathbf{s})$ via an HSP-based transmitter, we need to find $\mathbf{A}_T$ and $F_T(\cdot)$ such that
$$\mathbf{A}_T F_T(\mathbf{s}) = D_T(\mathbf{s}) \tag{34}$$
holds for all symbol vectors $\mathbf{s}$. Hence, since $D_T(\mathbf{s})$ is given, $F_T(\mathbf{s})$ can be defined as the following set, or multi-valued function:
$$F_T(\mathbf{s}) = \{\mathbf{x} \in \mathbb{C}^{M_T} : \mathbf{A}_T \mathbf{x} = D_T(\mathbf{s})\}. \tag{35}$$
Note that while it might be very difficult to explicitly construct the mapping $F_T(\cdot)$, obtaining its output, i.e., $F_T(\mathbf{s})$, is simple because the value of $D_T(\mathbf{s})$ is available. In other words, since the output of the HSP-based transmitter, i.e., $D_T(\mathbf{s})$, is given, it is sufficient to calculate the desired output of $F_T(\cdot)$ rather than implementing the mapping itself.
From (4) and (35), we can rewrite (34) as
$$\mathbf{A}_T \mathbf{x}^{BB}_T = D_T(\mathbf{s}), \tag{36}$$
which means that in general the HSP objective is to find $\mathbf{A}_T$ and $\mathbf{x}^{BB}_T$ such that (36) is satisfied for the given $D_T(\mathbf{s})$. This objective guarantees that the HSP-based system achieves the same performance as the FD one, i.e., $D_T(\mathbf{s})$. However, many variations can be derived according to the conditions and constraints of the system, which opens new avenues for investigation in this area.
In practice, depending on the system constraints, one may wish to design A T , x BB T and possibly some other system parameters represented by vector p on the basis of some optimization criterion. For instance, the following generic optimization problem can be used for obtaining the HSP parameters, where C(A T , x BB T , p) represents the system constraints. Alternatively, this could be formulated as where f (.) is the chosen cost function based on the objectives of the system. Note that the power constraint is not necessary as it can be taken into account when designing D T (s). One obvious choice is f (A T , x BB T , p) = 1, in which case A T must be designed such that for some set S ⊂ A K , we have D T (s) ∈ span(A T ), ∀s ∈ S, where span(A T ) denotes the span of A T . Consequently, the baseband signal is obtained from In what follows, we present different cost functions for designing precoding matrices with HSP.

1) UNCONSTRAINED FD PRECODING FOR M T ≥ K
For M T ≥ K it is possible to realize any given FD precoder.
As an example, we explore optimal eigen-mode precoding, although any other precoding matrix can be obtained in the same fashion. We first consider the case M T = K and subsequently discuss the modifications needed for M T > K .
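From (33) and (36), both $\mathbf{A}_T$ and $\mathbf{x}^{BB}_T$ must be designed such that
$$\mathbf{A}_T \mathbf{x}^{BB}_T = \mathbf{V}\boldsymbol{\Upsilon}\mathbf{s}.$$
Since $\mathbf{A}_T$ is of size $N_T \times M_T$, this problem for $M_T = K$ has the following simple solution
$$\mathbf{A}_T = \frac{1}{p_0} \mathbf{V}\boldsymbol{\Upsilon}, \tag{40a}$$
$$\mathbf{x}^{BB}_T = p_0\, \mathbf{s}, \tag{40b}$$
where $p_0 = \|\mathrm{vec}(\mathbf{V}\boldsymbol{\Upsilon})\|_\infty$.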
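A short numerical sketch of this $M_T = K$ realization follows. It assumes the split described above, i.e., the FD precoder divided by its largest entry magnitude $p_0$ becomes the analog matrix, and the symbols scaled by $p_0$ become the baseband signal. The helper names are hypothetical, and equal power allocation stands in for water filling purely to keep the example short.

```python
import numpy as np

def realize_fd_precoder(P_fd):
    """Split an N_T x K FD precoder into (A_T, p0) with max |A_T| entry = 1."""
    P_fd = np.asarray(P_fd, dtype=complex)
    p0 = np.max(np.abs(P_fd))          # p0 = ||vec(P_fd)||_inf
    if p0 == 0:
        raise ValueError("precoder must be non-zero")
    return P_fd / p0, p0

def eigenmode_precoder(H, K):
    """Placeholder FD precoder built from the K dominant right singular vectors.

    ASSUMPTION: equal power allocation replaces water filling here.
    """
    _, _, Vh = np.linalg.svd(H)        # singular values come sorted, largest first
    V = Vh.conj().T[:, :K]
    Upsilon = np.eye(K) / np.sqrt(K)
    return V @ Upsilon
```

Because every entry of `A_T` has magnitude at most one, it lies in the set realizable by the structure of Fig. 9, and `A_T @ (p0 * s)` equals the FD transmit signal `P_fd @ s` for every symbol vector `s`. Only the baseband vector changes every symbol period.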
In the case M T > K , one possible solution that achieves the same performance as the FD precoding is to append M T − K zeros to the solution x BB T in (40b) and set the corresponding columns of A T in (40a) to zero.
Note that no constraint is enforced on the system and similar to existing hybrid solutions in the literature, A T must be updated according to the channel coherence time, denoted as T c in the sequel. Since s changes after every symbol duration T s , x BB T is also updated every T s .

2) UNCONSTRAINED FD PRECODING FOR M T < K
In this case, by using either (37) or (38), it is possible to obtain various hybrid beamformer designs depending on the system requirements. Here, we aim at minimizing the Euclidean distance between the eigen-mode FD precoder in (31) and the hybrid beamforming matrix A T . However, since the former has size N T × K while the latter has size N T × M T , we first find a beamforming matrix Â T of size N T × M T subject to a rank M T constraint, i.e., we can write the solution for the above problem as Now by defining we can obtain x BB T by solving min which yields

3) MINIMUM NUMBER OF RF CHAINS WITH FAST PHASE-SHIFTERS
If we do not have a constraint on the update rate of the analog components, we can reduce the number of RF chains by solving the following problem In [23], this problem is shown to have a non-unique solution for M T = 1 where D T (s) = Vϒs, but essentially the same solution is valid for any other transmit function D T (s). Note that in this case the ASP must be updated after every symbol duration T s .

B. HSP DESIGN AT THE RECEIVER
Similar to the previous subsection, let us assume that the ideal FD decoder that maps the received RF signal $\mathbf{y}$ into the detected symbols $\hat{\mathbf{s}}$, represented by the mapping $D_R(\mathbf{y})$, is known. Since in massive-MIMO systems beamforming and multiplexing are key techniques, linear detection is of great interest due to its simplicity. In this case, which is considered in our discussion, $D_R(\mathbf{y}) = \mathbf{Z}\mathbf{y}$ where $\mathbf{Z} \in \mathbb{C}^{K \times N_R}$ is the FD combiner matrix. However, at the price of increased computational complexity, $D_R(\mathbf{y})$ can be extended to more sophisticated detectors such as maximum likelihood or sphere decoding. By substituting (6) and (28b) in (7), the estimated signal at the receiver is written as:
$$\hat{\mathbf{s}} = F_R(\mathbf{A}_R \mathbf{y}). \tag{47}$$
Clearly, the same approach used in Subsection IV-A for realizing the transformation $F_T(\cdot)$ cannot be applied here because the desired output of $F_R(\cdot)$ is unknown, i.e., we need this mapping to implement the decoding function. Ideally, we want to find a mapping $F_R(\cdot)$ and $\mathbf{A}_R$ such that
$$F_R(\mathbf{A}_R \mathbf{y}) = D_R(\mathbf{y}) \tag{48}$$
for all $\mathbf{y}$. Similar to the HSP literature [13]-[21], we consider a linear transformation for the baseband processing, i.e., $F_R(\mathbf{A}_R \mathbf{y}) = \mathbf{W}\mathbf{A}_R \mathbf{y}$, where $\mathbf{W} \in \mathbb{C}^{K \times M_R}$ is the corresponding transformation matrix; however, extension to other types of transformations is straightforward by using (48). Consequently, the following generic optimization problem can be considered for obtaining the HSP parameters: where $C(\mathbf{A}_R, \mathbf{W}, \mathbf{p})$ represents the system constraints. Alternatively, this could be formulated as where $f(\cdot)$ is a cost function designed to satisfy the requirements of the system. In what follows, FD combining for point-to-point MIMO is presented as an example.

1) UNCONSTRAINED FD COMBINING FOR M R ≥ K
We first consider the case where $M_R = K$ and subsequently discuss the case $M_R > K$. The optimal FD combiner for a point-to-point MIMO can be obtained from From (32), the solution is given by where $\mathbf{U} = [\mathbf{U}_a, \mathbf{U}_b]$ and $\mathbf{U}_a$ contains the first $K$ columns of $\mathbf{U}$, corresponding to the $K$ dominant singular values of the channel matrix $\mathbf{H}$. Thus, $\mathbf{A}_R$ and $\mathbf{W}$ must be jointly designed such that
$$\mathbf{W}\mathbf{A}_R = \mathbf{Z}, \tag{53}$$
where $\mathbf{A}_R \in Ȗ^{M_R \times N_R}$. Note that if $M_R = K$, for any FD combiner $\mathbf{Z} \in \mathbb{C}^{K \times N_R}$, this problem has the following solution
$$\mathbf{A}_R = \frac{1}{p_1} \mathbf{Z}, \qquad \mathbf{W} = p_1 \mathbf{I}_K, \tag{54}$$
where $p_1 = \|\mathrm{vec}(\mathbf{Z})\|_\infty$. The above design can be extended to the case $M_R > K$, although here including more RF chains adds to the cost and complexity of the system while no improvement is gained. One trivial solution that guarantees the same performance as the FD solution is to set the additional $M_R - K$ columns of $\mathbf{W}$ to $\mathbf{0}$, i.e., using $\mathbf{W} = p_1 [\mathbf{I}_K, \mathbf{0}_{K \times (M_R - K)}]$. The FD realization for the multi-user case can be similarly obtained. First, (51) must be replaced by the desired optimization problem for finding the FD combiner. Analog and digital combiners are then calculated by (54).
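The receiver-side realization can be sketched in the same way. The code below assumes the construction described above, i.e., the FD combiner scaled by its largest entry magnitude $p_1$ forms the first $K$ rows of the analog matrix, and the digital matrix is $p_1$ times an identity padded with zero columns. The function name is hypothetical, and filling the unused rows of the analog matrix with zeros is an arbitrary choice made for this illustration.

```python
import numpy as np

def realize_fd_combiner(Z, M_R):
    """Return (A_R, W) with W @ A_R == Z and every |A_R| entry <= 1.

    Z   : K x N_R fully-digital linear combiner.
    M_R : number of receive RF chains (must satisfy M_R >= K).
    """
    Z = np.asarray(Z, dtype=complex)
    K, N_R = Z.shape
    if M_R < K:
        # W @ A_R has rank at most M_R, so a rank-K combiner cannot be matched.
        raise ValueError("linear combining needs at least K RF chains")
    p1 = np.max(np.abs(Z))             # p1 = ||vec(Z)||_inf
    if p1 == 0:
        raise ValueError("combiner must be non-zero")
    A_R = np.zeros((M_R, N_R), dtype=complex)
    A_R[:K] = Z / p1                   # extra rows are unused (set to zero here)
    W = np.zeros((K, M_R), dtype=complex)
    W[:, :K] = p1 * np.eye(K)          # W = p1 [I_K, 0]
    return A_R, W
```

In contrast to the transmitter, both `A_R` and `W` depend only on the combiner `Z`, so they need updating only when the channel changes.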

2) FD COMBINING FOR M R < K
In the case of linear decoding, there must be at least K independent equations to recover K transmitted symbols. Hence, the minimum number of required RF chains is M R = K . Consequently, combiner design for M R < K is not practical in this case.

3) MINIMUM NUMBER OF RF CHAINS WITH FAST PHASE-SHIFTERS
Even with the same assumption as in Subsection IV-A.3, i.e., the phase-shifters can be updated every T s , at least K RF chains are required. Since only the channel matrix, which changes every T c , is known at the receiver, a faster update rate of the phase-shifters does not provide any extra degrees of freedom and hence does not help in reducing the number of RF chains at the receiver. Consequently, the minimum possible number of RF chains for digital linear combining is M R = K .

V. SIMULATION RESULTS
In this section, we present simulation results for different scenarios and compare the FD system with our proposed hybrid architecture as well as existing hybrid designs in the literature.
The following channel model is used for all the simulations, where N c = 5 is the number of clusters, and N ray = 10 is the number of rays in each cluster. Similar to [14], [27], the path gains are independently generated as α ij ∼ CN(0, 1). The receive and transmit antenna responses are denoted by a r (θ r ij ) and a t (θ t ij ), respectively, where
Simulation results are presented for the optimal FD precoder and combiner, our proposed hybrid precoder and combiner realization of FD in Subsections IV-A.1 and IV-B.1, as well as selected hybrid designs from [14], [27]. For M RF chains and N antennas, the proposed and the conventional structures require T = 2MN + M + N and T = MN + M + N RF components, respectively.

A. BIT ERROR RATE (BER) PERFORMANCE
BER performance versus SNR (SNR $= \rho/\sigma^2$) for three different setups is shown in Figs. 10 to 12. Fig. 10 presents the results for a massive-MIMO system with $N_T = N_R = 64$ antennas and $M_T = M_R = 2$ RF chains. The downlink BER performance of a massive-MIMO BS with $N_T = 64$ antennas transmitting to a single user with $N_R = 2$ antennas is shown in Fig. 11, while the uplink BER performance for the same system is shown in Fig. 12. In all the simulated scenarios, the proposed hybrid realization matches the performance of the FD systems while outperforming the existing hybrid designs. The FD systems require $M_T = M_R = 64$ RF chains, whereas the proposed design achieves the same performance with only 2 RF chains. Consequently, the proposed design can generate the same signals as the FD system with a limited number of RF chains by employing the proposed ASP network. In particular, since the RF output of the proposed structure is identical to that of the desired FD system, the same performance as the optimal FD beamforming can be achieved.
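To put the hardware cost of the 64-antenna, 2-RF-chain setup in numbers, the component-count expressions stated with the simulation setup, $T = 2MN + M + N$ and $T = MN + M + N$, can be evaluated directly. The helper name below is hypothetical.

```python
def rf_component_count(num_rf_chains, num_antennas, proposed):
    """Number of RF components for M RF chains and N antennas.

    proposed=True  -> T = 2*M*N + M + N
    proposed=False -> T =   M*N + M + N  (conventional structure)
    """
    m, n = num_rf_chains, num_antennas
    mn_term = (2 if proposed else 1) * m * n
    return mn_term + m + n

proposed_total = rf_component_count(2, 64, proposed=True)       # 2*2*64 + 2 + 64 = 322
conventional_total = rf_component_count(2, 64, proposed=False)  # 2*64 + 2 + 64 = 194
```

For this setup the proposed structure uses 322 RF components against 194 for the conventional one, while both use 2 RF chains instead of the 64 needed by the FD system.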

B. SPECTRAL EFFICIENCY
The spectral efficiency (in bits/s/Hz) of the optimal FD beamforming, the proposed hybrid realizations of FD, and the hybrid designs from [14], [22], [27] for a massive-MIMO system with $N_T = N_R = 64$ antennas is shown in Fig. 13. The spectral efficiency of an uplink connection for a single user with $N_T = 16$ antennas and a massive-MIMO BS with $N_R = 64$ antennas is presented in Fig. 14. Furthermore, Fig. 15 shows the spectral efficiency of a downlink connection for a massive-MIMO BS with $N_T = 64$ antennas and a single user with $N_R = 4$ antennas. As expected, the proposed ASP-based realizations achieve the same rate as their FD counterparts and therefore outperform existing hybrid designs. In order to evaluate the performance of the proposed ASP structure as the number of antennas grows, simulations are performed for ultra-massive MIMO system configurations. Spectral efficiency versus the number of transmitter antennas $N_T$ is plotted in Fig. 16 for different numbers of receive antennas. For the FD system, the number of RF chains is equal to the number of transmitter antennas, i.e., $M_T = N_T$, whereas for the proposed hybrid structure the number of RF chains is kept equal to the number of transmitted symbols, i.e., $M_T = K$. In all cases, the hybrid design with the proposed ASP architecture achieves the same performance as the corresponding FD system. For instance, for an ultra-massive MIMO transmitter with $N_T = 1024$ antennas and a receiver with $N_R = 2$ antennas, the FD structure requires $M_T = 1024$ RF chains while the proposed structure guarantees the same performance with $M_T = 2$ RF chains.
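For reference, the spectral efficiency of an eigen-mode FD beamformer can be evaluated with an SVD followed by water-filling. The sketch below is a generic textbook implementation, not the authors' simulation code. The normalization (unit total power, per-stream gain equal to SNR times the squared singular value) and the function names are assumptions.

```python
import numpy as np

def water_filling(gains, total_power):
    """Allocate total_power over parallel channels with positive gains g_k.

    Maximizes sum_k log2(1 + p_k * g_k) subject to sum_k p_k = total_power, p_k >= 0.
    """
    gains = np.asarray(gains, dtype=float)
    order = np.argsort(gains)[::-1]          # strongest channel first
    g = gains[order]
    powers = np.zeros_like(g)
    for n in range(len(g), 0, -1):           # try using the n strongest channels
        level = (total_power + np.sum(1.0 / g[:n])) / n
        p = level - 1.0 / g[:n]
        if p[-1] > 0:                        # weakest active channel still gets power
            powers[:n] = p
            break
    out = np.zeros_like(g)
    out[order] = powers                      # restore the original ordering
    return out

def fd_spectral_efficiency(H, K, snr):
    """Rate in bits/s/Hz of eigen-mode transmission over the K strongest modes of H."""
    sigma = np.linalg.svd(H, compute_uv=False)[:K]
    gains = snr * sigma**2
    p = water_filling(gains, 1.0)
    return float(np.sum(np.log2(1.0 + p * gains)))

rng = np.random.default_rng(2)
H = (rng.standard_normal((4, 16)) + 1j * rng.standard_normal((4, 16))) / np.sqrt(2)
se_low = fd_spectral_efficiency(H, K=2, snr=10 ** (0 / 10))
se_high = fd_spectral_efficiency(H, K=2, snr=10 ** (20 / 10))
```

Because the proposed hybrid realization is stated to reproduce the FD output, the same routine would give its spectral efficiency.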

C. COMPUTATIONAL COMPLEXITY
The proposed ASP architecture is implemented with the same RF components as the conventional hybrid structures [13]- [21]. Moreover, since the constant modulus constraint is not imposed on the entries of the resulting analog transformation matrix with our approach, the computational complexity of designing the analog and digital beamformers can be reduced. Compared to the FD system design, the additional computations required for the proposed ASP approach lie in the calculation of the phase-shifter parameters as given in (27). In the case of an eigen-mode FD beamformer, for instance, the calculations in (27) are dominated, in terms of complexity order, by the SVD and water-filling algorithm needed for the FD design, as represented by (31). Moreover, existing hybrid designs use sophisticated optimization or reconstruction techniques to handle the constant modulus constraint. For instance, the iterative algorithms in [14] and [27] require a matrix inversion in each iteration. Consequently, the computational complexity of the proposed FD realizations with ASP is lower than that of a single iteration of these hybrid designs.

VI. CONCLUSION
In this paper, we investigated the hybrid A/D structure as a general framework for signal processing in massive and ultra-massive-MIMO systems. We first explored the ASP network in detail by developing a mathematical representation for any arbitrarily connected feed-forward ASP network comprised of phase-shifters, power dividers and power combiners. Then, a novel ASP structure was proposed which is not bound to the unit modulus constraint. Subsequently, we focused on the transmitter and receiver sides by exploiting the newly proposed ASP architecture and generalizing the digital processing. Specifically, the optimization problem for the HSP beamformer was reformulated within the new representation framework, which facilitates its solution under a variety of constraints and requirements for the massive-MIMO system. Finally, simulation results were presented illustrating the superiority of the proposed architecture to the conventional hybrid designs for massive-MIMO systems.

Proof of Proposition 1:
The matrix representations of the RF components in (8)- (10) are introduced such that the input and output signals can be of any size and can thus include RF branches that are not affected by the RF component. Consequently, we can sort the RF components such that the input of each RF component, except for the first one, is the output of another RF component. Let us denote the input and output of the $i$th RF component as $a_i$ and $b_i$, respectively. Consequently, we have $b_{i-1} = a_i$, $a_1 = a$ and $b = b_T$. To be more precise, the following algorithm is used to assign the index $i$, for $i = 1, 2, \ldots, T$, to each RF element:
for $i \leftarrow 1$ to $T$ do
  1. Find an RF component whose input is $a_i$;
  2. Assign index $i$ to that RF component;
  3. Denote the output as $b_i$;
Note that step 1 always has a solution because of how $a_i$ and $b_i$ are defined. Moreover, it is possible that more than one RF component satisfies the condition in step 1. In these cases, the components are parallel, i.e., the signals enter them simultaneously, and any ordering of these components is acceptable. Now, for $i = 1, 2, \ldots, T$, we can write $b_i$ in terms of $a_i$. If the $i$th RF component is a phase-shifter, a power divider, or a power combiner, then $b_i$ is obtained by applying to $P_\pi a_i$ the phase-shifter matrix with parameters $(\gamma, \phi, \eta)$, the divider matrix $Q(\gamma, \mu, \eta)$, or the combiner matrix $Q^t(\gamma, \mu, \eta)$, respectively. Note that, if the order of the signals is not changed before the $i$th component, we have $P_\pi = I$. Hence, the given ASP can be expressed as in (12).
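The cascade described in this proof can be sketched for a small network: one divider, two phase-shifters and one combiner, with $P_\pi = I$ throughout. The matrix forms below are assumptions made for illustration, since (8)-(10) are not reproduced here: a divider is taken as $\mathrm{bd}(I_\gamma, \frac{1}{\sqrt{\mu}} 1_\mu, I_\eta)$, a combiner as its transpose, and a phase-shifter as an identity matrix with one diagonal entry replaced by $e^{j\phi}$. All function names are hypothetical.

```python
import numpy as np
from functools import reduce

def bd(*blocks):
    """Block diagonal matrix built from 2-D blocks (possibly non-square or empty)."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols), dtype=complex)
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

def divider(gamma, mu, eta):
    """Assumed form: pass gamma branches, split one branch into mu, pass eta branches."""
    return bd(np.eye(gamma), np.ones((mu, 1)) / np.sqrt(mu), np.eye(eta))

def combiner(gamma, mu, eta):
    """Assumed form: transpose of the divider (mu branches merged into one)."""
    return divider(gamma, mu, eta).T

def phase_shifter(branch, phi, num_branches):
    """Assumed form: identity except for e^{j phi} on the selected branch."""
    d = np.ones(num_branches, dtype=complex)
    d[branch] = np.exp(1j * phi)
    return np.diag(d)

phi1, phi2 = 0.4, -1.3
# Components listed in signal-flow order, i.e., index i = 1, ..., T
components = [divider(0, 2, 0),
              phase_shifter(0, phi1, 2),
              phase_shifter(1, phi2, 2),
              combiner(0, 2, 0)]

a = np.array([1.0 + 0.0j])
b = a
for C in components:            # b_i = C_i a_i with a_{i+1} = b_i
    b = C @ b

# Equivalent single matrix: later components multiply from the left
overall = reduce(lambda acc, C: C @ acc, components)
```

Under these assumed forms, the four-component chain collapses to the scalar gain $(e^{j\phi_1} + e^{j\phi_2})/2$, whose modulus is not restricted to one.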
APPENDIX C
4) PROOF OF (18)
To show that $E_v$ is a diagonal matrix, we use induction and the fact that, for a diagonal matrix $D$ and a permutation matrix $P_\pi$, the matrix $\hat{D} = P_\pi D P_\pi^t$ is also diagonal. For $J = 1$ the statement is true, so it remains to prove it for $J = K + 1$. Assuming that for $J = K$ the matrix $E_v$ is diagonal and $P_\pi$ is a permutation matrix, the product for $J = K + 1$ equals $E_v P_\pi$ multiplied on the right by the $(K+1)$th phase-shifter matrix, with parameters $(\gamma_{K+1}, \phi_{K+1}, N_p)$, and by $P_{\pi_{K+1}}$. Using the aforementioned property of permutation matrices, the matrix obtained by multiplying this phase-shifter matrix by $P_\pi$ on the left and by $P_\pi^t$ on the right is diagonal. We further know that $\hat{P}_\pi = P_\pi P_{\pi_{K+1}}$ is a permutation matrix. The product for $J = K + 1$ can therefore be written as $\hat{E}_v \hat{P}_\pi$, as in (81), where $\hat{E}_v$ is the product of $E_v$ and the diagonal matrix above. Since both factors are diagonal, so is $\hat{E}_v$. Furthermore, since all the diagonal entries are unit-modulus complex numbers, their products are also on the unit circle and thus $v = [e^{j\phi_1}, e^{j\phi_2}, \ldots, e^{j\phi_{N_p}}]^t \in \mathcal{U}^{N_\phi}$.

5) PROOF OF (20)
Induction can be used to prove this statement. For $J = 1$, one can easily find a permutation matrix $P_{\pi_1}$, and accordingly a permutation matrix $P_{\pi_2}$, such that, using the fact that permutation matrices are orthogonal, we can write $Q(1, \mu, \eta) = P_{\pi_1} \mathrm{bd}(\frac{1}{\sqrt{\mu}} 1_\mu, I_\eta) P_{\pi_2}$. Now, let us assume that the statement holds for $J = K$ and write the product for $J = K + 1$ as in (85). According to the $J = 1$ case, there exist $P_{\pi_3}$ and $P_{\pi_4}$ that bring $Q(\gamma_{K+1}, \mu_{K+1}, \eta_{K+1})$ into the above block diagonal form, which gives (87). Let us first define $P_{\pi_5} = P_\pi P_{\pi_3}$. Then, considering $D_d P_{\pi_5}$, the permutation matrix $P_{\pi_5}$ rearranges the columns of $D_d$. Therefore, there exists a permutation matrix $P_{\pi_6}$ that rearranges the rows of $D_d P_{\pi_5}$ to make a block diagonal matrix, as in (88). It is possible that $\delta_i = 1$ for an individual $i$ or for several consecutive indices, which results in a diagonal block of identity matrices $I$. From (87) and (88), the product of the $J = K$ factorization and $Q(\gamma_{K+1}, \mu_{K+1}, \eta_{K+1})$ can be written as a permutation matrix multiplied by two block diagonal matrices, which can be further simplified as in (89). Therefore, by defining $P_{\pi_7} = P_\pi P_{\pi_6}$, and from (85) and (89), we obtain the factorization for $J = K + 1$, which proves the statement.

6) PROOF OF (22)
For $J = 1$, we have to show that $Q^t(\gamma, \mu, \eta) P_{\pi_1}$ can be written as a matrix $C_d$ with the block diagonal structure of (23), multiplied by a permutation matrix on each side. According to (83), $Q^t(\gamma, \mu, \eta) P_{\pi_1}$ equals a block diagonal matrix of the form $\mathrm{bd}(\frac{1}{\sqrt{\mu}} 1_\mu^t, I)$ multiplied by a permutation matrix on each side and then by $P_{\pi_1}$. Since the product of two permutation matrices is also a permutation matrix, the two right-most factors combine into a single permutation matrix. To continue the proof by induction, we assume that for $J = K$ there exist two permutation matrices and a matrix $C_d$ of this structure whose product equals $\prod_{j=1}^{J} Q^t(\gamma_j, \mu_j, \eta_j) P_{\pi_j}$. For $J = K + 1$, the product $\prod_{j=1}^{K+1} Q^t(\gamma_j, \mu_j, \eta_j) P_{\pi_j}$ then equals $Q^t(\gamma_1, \mu_1, \eta_1) P_{\pi_1}$ multiplied by this factorization, as stated in (91). According to the $J = 1$ case, there exist $P_{\pi_1}$ and $P_{\pi_2}$ that bring $Q^t(\gamma_1, \mu_1, \eta_1)$ into block diagonal form. By defining a new permutation matrix $P_{\pi_3} = P_{\pi_2} P_{\pi_1} P_\pi$, we can write the left-hand side of (91) as in (93). Considering $P_{\pi_3} C_d$, the permutation matrix $P_{\pi_3}$ rearranges the rows of $C_d$. Therefore, there exists a permutation matrix $P_{\pi_4}$ that rearranges the columns of $P_{\pi_3} C_d$ to make a block diagonal matrix, as in (94). Note that it is possible that $\delta_i = 1$ for an individual $i$ or for several consecutive indices, which results in a diagonal block of identity matrices $I$. From (93), (94) and the fact that $P_{\pi_4} P_{\pi_4}^t = I$, we obtain (95), which can be simplified further. To take the last step, there exist permutation matrices $P_{\pi_5}$ and $P_{\pi_6}$ that bring the result into the structure of $C_d$. From this and (95) we have $P_{\pi_1} \mathrm{bd}(\frac{1}{\sqrt{\mu_1}} 1_{\mu_1}^t, I) P_{\pi_3} C_d P_\pi = P_{\pi_1} P_{\pi_5} C_d P_{\pi_6} P_{\pi_4}^t P_\pi$. (98) Now, by defining the permutation matrices $P_{\pi_7} = P_{\pi_1} P_{\pi_5}$ and $P_{\pi_8} = P_{\pi_6} P_{\pi_4}^t P_\pi$, and from (91) to (98), we arrive at $\prod_{j=1}^{K+1} Q^t(\gamma_j, \mu_j, \eta_j) P_{\pi_j} = P_{\pi_7} C_d P_{\pi_8}$, which proves the statement.
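The permutation property that drives the proof of (18), namely that $P_\pi D P_\pi^t$ stays diagonal and that products of unit-modulus diagonal entries stay on the unit circle, can be checked numerically. This is a small illustrative check with arbitrary sizes and a random permutation. The variable names are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5

perm = rng.permutation(n)
P = np.eye(n)[perm]                              # permutation matrix: row i picks entry perm[i]
d = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, n))  # unit-modulus diagonal entries
D = np.diag(d)

D_hat = P @ D @ P.T                              # conjugation by a permutation matrix
off_diagonal = D_hat - np.diag(np.diag(D_hat))

# Product of two diagonal unit-modulus matrices, as in the induction step
E = np.diag(np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, n)))
E_hat = E @ D_hat
```

Conjugation by $P$ only reorders the diagonal entries, so `D_hat` equals `np.diag(d[perm])`, and every diagonal entry of `E_hat` still has modulus one.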

APPENDIX D
Proof of Theorem 2: Without loss of generality, let us assume there are a total of $T$ RF components, i.e., $P$ combiners, $R$ dividers, and $Q$ phase-shifters, so that $T = P + Q + R$. According to properties (14), (15) and (17) in Theorem 1, we can rewrite (12) by commuting the combiner matrices to the left-hand side, which leaves $T - P$ matrices in the remaining product. Similarly, the divider matrices can be moved to the right-hand side using properties (13), (15) and (17), which leaves $T - P - R$ matrices in the middle, as in (101). In (101), only the permutation and single phase-shifter matrices are in the middle of the expression. Therefore, without loss of generality, and because permutation and single phase-shifter matrices can be identity matrices, the overall transformation can be written as a product of combiner, phase-shifter and divider factors, each interleaved with permutation matrices. Now, using (18), (20), (22), and the fact that the product of permutation matrices is another permutation matrix, it follows that $b = P_{\pi_1} C_d P_{\pi_2} E_v P_{\pi_3} D_d P_{\pi_4} a$.