Low-Complexity Equalisers for Offset Constellations in Massive MIMO Schemes

Massive multi-input–multi-output (m-MIMO) schemes require low-complexity implementations at both the transmitter and the receiver side, especially for systems operating at millimeter wave (mmWave) bands. In this paper, we consider the use of offset constellations in m-MIMO systems operating at mmWave frequencies. These signals are designed either to have an almost constant envelope or to be decomposed as the sum of constant-envelope signals, making them compatible with strongly nonlinear power amplifiers, which can have low implementation complexity and high amplification efficiency and are therefore particularly interesting for mmWave communications. We design and evaluate low-complexity frequency-domain receivers for offset signals. It is shown that the proposed receivers can have excellent performance/complexity trade-offs in m-MIMO scenarios, making them particularly interesting for future wireless systems operating at mmWave bands.


I. INTRODUCTION
The evolution towards the next wireless communications systems (5th Generation (5G) and beyond) faces multiple challenges. These new systems should be able to cope with applications as diverse as the Internet of Things (IoT), autonomous driving cars, remote surgery or augmented reality, while improving the data rate and the availability of the previous generations [1]. In fact, a massive growth is expected in user bit rates (a 10 to 100 times increase) and in overall system throughput (about a 1000 times increase) [2], which means a substantial spectral efficiency increase. At the same time, the power efficiency should be maintained or even improved, not only to have greener communications, but also to cope with the billions of sensors that will populate every place and that will require long battery lifetimes [1], [3]. To accomplish these requirements, one needs to employ new transmission techniques, with the most promising ones being based on the massive Multiple-Input Multiple-Output (m-MIMO) concept, together with transmission at millimeter wave (mmWave) frequencies [3], [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Shuai Han.
The adoption of mmWave transmission is interesting not only due to the vast bandwidths available, but also because of the small wavelength. In fact, with wavelengths in the range of 1 to 10 millimeters, the antennas become smaller, allowing small-sized transmitters and receivers with a very high number of antenna elements and, therefore, enabling m-MIMO implementations. In turn, m-MIMO can be employed to exploit spatial multiplexing and beamforming gains, enabling the service of multiple users with high bit rates while reducing interference and/or increasing coverage [5]. However, mmWave frequencies present considerable challenges regarding propagation (high free-space losses, small diffraction effects and almost total absorption losses due to obstacles) and implementation difficulties, both in the analogue and digital domains (e.g., Digital-to-Analogue Converter (DAC) and Analogue-to-Digital Converter (ADC) design, efficient amplification, signal processing requirements for equalization and user separation, etc.), which can be particularly challenging for m-MIMO systems [6]. Besides that, power and spectral efficiencies can be conflicting, and different techniques must be employed to achieve each one of them, which makes combining them successfully a significant challenge. One way to increase the spectral efficiency is by employing dense and large constellations, such as 64-Quadrature Amplitude Modulation (QAM) or 256-QAM. However, not only do larger constellations have higher power requirements, but the corresponding signals also have larger envelope fluctuations, requiring the use of amplifiers with higher backoff, which further reduces the power amplifier efficiency.
VOLUME 7, 2019. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
By employing single carrier schemes, such as Single Carrier with Frequency Domain Equalization (SC-FDE) [7], [8], instead of the commonly used Orthogonal Frequency Division Multiplexing (OFDM) schemes, we can reduce the amplifier's backoff, improving amplification efficiency. This is mainly because SC-FDE signals have lower envelope fluctuations than OFDM signals based on similar constellations. Nonetheless, SC-FDE signals still present substantial envelope fluctuations and a relatively high Peak-to-Average Power Ratio (PAPR), especially for large constellations and/or when the signals are filtered to have a compact spectrum. This means that a quasi-linear amplifier is required (e.g., a class A or B amplifier), which is more difficult to implement and has much lower amplification efficiency than strongly nonlinear amplifiers (such as class D amplifiers). It is known that a general QAM constellation can be decomposed as the sum of appropriate Binary Phase Shift Keying (BPSK) or Quadrature Phase Shift Keying (QPSK) components [9], [10], whose signals present a reduced dynamic range and can be separately amplified with reduced distortion by different amplifiers [11], allowing a more efficient amplification while maintaining the same spectral efficiency.
As an alternative, we can employ offset modulations. In this case, Offset QAM (OQAM) signals are decomposed as the sum of Offset QPSK (OQPSK) components, which is a more interesting case in terms of power amplification, since OQPSK signals do not present zero crossings, reducing not only the envelope fluctuations but also the dynamic range. This means that they can be compatible with highly efficient, strongly nonlinear amplifiers. For this reason, they were proposed for the multilayer m-MIMO system at mmWave bands presented in [12], [13]. In this type of system, two or more layers of antennas are implemented at the transmitter side: the first for the transmission of each OQPSK component, with the components being combined at the channel to form the intended OQAM signals, and the remaining ones for beamforming and/or multiuser multiplexing. This multilayer concept is suitable for both non-offset and offset constellations, although it is particularly interesting for the latter case.
Since we are considering SC-FDE schemes, any FDE can be employed at the receiver side, although better performances can be achieved if a linear FDE is replaced by the powerful Iterative Block Decision Feedback Equalizer (IB-DFE) receivers [7]. However, conventional IB-DFEs were designed for non-offset constellations, and their performance is rather poor with offset constellations mainly due to the residual In-phase-Quadrature Interference (IQI). To overcome this, the IB-DFE concept was modified for offset constellations [14], [15]. Although IB-DFE receivers were successfully extended to MIMO scenarios [8], as far as the authors know, the work on FDE receivers for offset signals in MIMO scenarios is limited.
Since offset signals are usually intended for strongly nonlinear amplifiers, they are designed to have very low envelope fluctuations. In general, this means employing a pulse shape whose band is above the minimum Nyquist band, unless sophisticated techniques are employed to reduce the envelope's dynamic ranges such as magnitude filtering [16], [17].
In this paper, we consider offset signals with reduced envelope fluctuations, suitable for strongly nonlinear power amplifiers, combined with MIMO schemes, and we design appropriate FDE receivers. Conventional IB-DFE receivers are modified to cope with offset signals in MIMO scenarios, while the pragmatic receivers presented in [14], [15] are also extended to these scenarios, leading to improvements in Bit Error Rate (BER) performance in comparison with conventional IB-DFE while reducing complexity. Notwithstanding their excellent performance, approaching the Matched Filter Bound (MFB) with only a few iterations, they become too complex for m-MIMO schemes due to the required inversion of very large channel matrices for each subcarrier and each iteration. Therefore, receivers that do not require matrix inversions must be used to reduce system complexity while still achieving good performance. Iterative receivers based on Maximum Ratio Combining (MRC) and Equal Gain Combining (EGC) [18]-[21] concepts are interesting because they do not require matrix inversions. Although the residual interference levels (both interference between different transmitted streams and Inter-Symbol Interference (ISI)) can be high with such low-complexity receivers, they can achieve very good performance when $N_R \gg N_T$, which can be ensured in m-MIMO systems. Therefore, these receivers are also studied in this paper, and BER performance and complexity analyses of the mentioned receivers are performed to show the benefits of using low-complexity receivers.
This paper is organized as follows: the adopted system and its characterization are presented in section II. Then, IB-DFE, pragmatic and low-complexity receivers are presented and evaluated in sections III, IV and V, respectively. In section VI, a complexity analysis of the studied receivers is performed, followed by conclusions in section VII. A list of symbols, which follows the conclusions section, has also been added to the paper as a reference for easier reading.

II. SYSTEM CHARACTERISATION
In this work, we consider an uplink scenario employing a multilayer scheme like the one presented in [12], [13]. This scheme is presented in Fig. 1 and it uses large and dense constellations allowing high spectral efficiency with reduced power requirements for a given constellation size [22] and block-based SC-FDE schemes to cope with severely time-dispersive channels [7], [10], with receivers designed taking into account signals' and mmWave channel characteristics [2], [15].
This multilayer approach is a promising technique for mmWave bands, since the small wavelength allows a large number of antenna elements in a small space. This large number of antennas allows the use of up to three antenna layers at the transmitter [12], [13], where:
• the 1st layer is designed to efficiently amplify the different OQPSK-type components into which a given multilevel constellation can be decomposed [9], [10], by employing multiple nonlinear amplifiers and antennas, with the combination of the signals performed at the wireless channel [11];
• the 2nd layer is used for beamforming purposes, to separate users and multipath components and/or to increase coverage;
• the 3rd layer is employed for spatial multiplexing, to allow multiuser support without directional constraints.
Thus, the transmission employs block-based SC-FDE, with transmitted blocks of $N_b$ symbols, between $N_T$ transmission antennas and a base station employing $N_R$ reception antennas. Between the two ends of the link, the mmWave channel model proposed in [21], illustrated here in Fig. 2, is used. In this paper, we focus on 60 GHz without loss of generality, because similar results are obtained for other frequencies at mmWave bands. This channel is based on a clustered model and assumes that an antenna transmits to a base station a signal with unitary power, which is split into $N_{\mathrm{ray}}$ rays that can be grouped in clusters with an equal number of elements, similar delays and similar Angles of Arrival (AoA). Then, $N_{\mathrm{ray}} = N_{\mathrm{ray\_clu}} \times N_{\mathrm{ch\_clu}}$, where $N_{\mathrm{ray\_clu}}$ is the number of rays in each cluster and $N_{\mathrm{ch\_clu}}$ is the number of clusters. These rays are received at a base station that uses two layers of antennas, one for beamforming with $R_b$ antennas and the other for spatial multiplexing, composed of $R_u$ antennas with correlation factor $\rho_u$ between each adjacent pair.
The alphabet $\mathcal{G}$ can be seen as the Cartesian product of two subsets $\mathcal{G}_I$ and $j\mathcal{G}_Q$ which, disregarding the imaginary unit, are equal and, for a square constellation,² are composed by
Thus, a symbol of a generic $M$-QAM, or $M$-OQAM, constellation can be represented as
where $s_{n_p}$ denotes the $n_p$-th symbol of the constellation, with $n_p = 0, \ldots,$
Returning to the general case, since there are $\sqrt{M}$ constellation symbols as well as $\sqrt{M}$ coefficients $g_i$, based on
$M$, and the coefficients $g_i$ can be obtained from the inverse Hadamard transform of the vector of constellation points. In practice, $g_0 = 0$, since it is the centre of mass of the constellation; moreover, several other $g_i$ can also be 0 [10], depending on the chosen mapping between the $\mu_p$-bit tuples and the symbols of the constellation. Denoting by $N_p$ the number of nonzero $g_i$ coefficients, it is clear that a given constellation can be decomposed as the sum of $N_p \leq \sqrt{M}$ polar components [23]. When considering a uniform $\sqrt{M}$-PAM constellation (the case considered from now on), the only nonzero coefficients are $g_1, g_2, g_4, \cdots, g_{\sqrt{M}/2}$ (i.e., the coefficients $g_{2^m}$, $m = 0, 1, \cdots, \mu_p - 1$). Moreover, for a natural binary mapping, $g_{2^m} = 2^m$, with (3) becoming
while, to obtain a Gray mapping,³ (3) becomes
² For simplicity, and as is the common case, consider that constellations are square, i.e. $\log_2(M)$ is even, and that the bit-mapping along the in-phase and quadrature axes is the same.

2) DISCRETE TIME POLAR REPRESENTATION OF M-QAM AND M-OQAM
As $M$-QAM and $M$-OQAM constellations can be represented as the combination of two PAM constellations (since symbols are uniformly spaced along both the in-phase and quadrature axes), their discrete time representation as a sum of polar components follows straightforwardly from combining (2) and (3).
When the constellation is rectangular and the bit-mapping along the in-phase and quadrature axes is the same, this results in
Each of the $N_p$ polar components can thus be modulated as a BPSK signal [24], thereby enabling efficient implementations in the massive-MIMO context. Notwithstanding, there are particular advantages in considering OQAM signals, as will become clear next.
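The decomposition for the natural binary mapping case, where the only nonzero coefficients are $g_{2^m} = 2^m$, can be illustrated with a minimal sketch. The function names and the convention that polar bits take values in {-1, +1} are assumptions made for illustration only; the sketch treats the in-phase and quadrature PAM components separately and is not the authors' implementation.

```python
from itertools import product

def pam_from_polar_bits(bits):
    """Natural binary mapping (assumed form): s = sum_m 2^m * b_m, b_m in {-1, +1}."""
    return sum((2 ** m) * b for m, b in enumerate(bits))

def qam_polar_components(bits_i, bits_q):
    """BPSK-like polar components whose sum is the QAM/OQAM symbol."""
    comps = [(2 ** m) * b for m, b in enumerate(bits_i)]
    comps += [1j * (2 ** m) * b for m, b in enumerate(bits_q)]
    return comps

mu_p = 2  # bits per PAM dimension, i.e. sqrt(M) = 4 (16-QAM or 16-OQAM)

# Enumerating all polar bit tuples yields the uniform 4-PAM alphabet.
pam_levels = sorted(pam_from_polar_bits(b) for b in product((-1, 1), repeat=mu_p))
print(pam_levels)  # [-3, -1, 1, 3]

# One symbol written as the sum of its scaled polar components.
components = qam_polar_components((1, -1), (-1, -1))
symbol = sum(components)
print(components, symbol)
```

Each entry of `components` is a scaled two-level signal, which is what allows every component to be amplified separately before the sum is formed.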

3) CONTINUOUS TIME POLAR REPRESENTATION OF M-QAM AND M-OQAM
Although $M$-ary QAM and OQAM share a common polar decomposition in the discrete time domain, their continuous-time counterparts differ considerably due to the time shift of half a symbol period (i.e. $T_s/2$) between the in-phase and quadrature components. Thus, the complex equivalent baseband signal is, for the QAM case, given by
while for OQAM it is
where $\tau$ is the temporal index and $p(\tau)$ is the Nyquist supporting pulse for bandwidth-limited transmission, with the passband signal being given for both cases by
with $f_c$ denoting the carrier frequency. Equation (10) shows that an $M$-ary OQAM signal can be seen as a serial representation of OQPSK signals [24], which can be specially designed to have a constant envelope or acceptable trade-offs between reduced envelope fluctuations and compact spectrum upon proper choice of the pulse shape $p(\tau)$ (e.g., a Gaussian Minimum Shift Keying (GMSK) signal [25]). The $N_p$ OQPSK components can thus be separately amplified and transmitted by $N_p$ antennas, with their combination to form the corresponding OQAM signal being performed on the air upon MIMO transmission [11], [13]. In addition, due to the controlled envelope nature of OQPSK signals, highly efficient, low-cost, strongly nonlinear amplifiers can be employed in this case, making clear the advantages of using OQAM signals. This can be particularly interesting at mmWave, where large aggregate antennas can be employed and a spectrum occupancy above the minimum Nyquist band is not a constraint.
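The effect of the $T_s/2$ offset on the envelope can be checked numerically with a minimal sketch. The half-cosine pulse of duration $T_s$ used below is an assumed MSK-type pulse, and the function names, the cyclic block and the bit patterns are illustrative choices rather than the authors' setup.

```python
import math

def half_cosine_pulse(d, L):
    """Assumed pulse: cos(pi*t/T_s) for |t| <= T_s/2, sampled at L samples per T_s."""
    return math.cos(math.pi * d / L) if abs(d) <= L / 2 else 0.0

def shaped_block(bits_i, bits_q, L, q_delay):
    """Cyclic block of pulse-shaped samples; q_delay is the quadrature delay in samples."""
    total = len(bits_i) * L
    samples = []
    for k in range(total):
        re = im = 0.0
        for n, (b_i, b_q) in enumerate(zip(bits_i, bits_q)):
            # wrapped distance from sample k to each pulse centre
            d_i = (k - n * L + total // 2) % total - total // 2
            d_q = (k - n * L - q_delay + total // 2) % total - total // 2
            re += b_i * half_cosine_pulse(d_i, L)
            im += b_q * half_cosine_pulse(d_q, L)
        samples.append(complex(re, im))
    return samples

L = 8  # samples per symbol period T_s (even, so that T_s/2 is L/2 samples)
bits_i = [1, -1, -1, 1, 1, 1, -1, 1]
bits_q = [-1, -1, 1, 1, -1, 1, 1, -1]

env_offset = [abs(x) for x in shaped_block(bits_i, bits_q, L, q_delay=L // 2)]
env_plain = [abs(x) for x in shaped_block(bits_i, bits_q, L, q_delay=0)]

print(min(env_offset), max(env_offset))  # both close to 1: constant envelope
print(min(env_plain), max(env_plain))    # swings between about 0 and sqrt(2)
```

With the offset, the in-phase and quadrature pulses behave as a cosine and a sine of the same argument, so the squared envelope sums to one; without it, both components vanish together at the symbol boundaries.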

B. MULTIRATE PROCESSING OF OQAM SIGNALS
Due to the time shift between the in-phase and quadrature components of offset signals, their digital processing upon reception requires sampling above the minimum Nyquist rate. Consider the scenario presented in section II and let $\tilde{x}^{(t)}(\tau)$ denote the baseband complex equivalent $M$-OQAM signal that is transmitted by the $t$-th antenna ($t = 1, \ldots, N_T$), with the corresponding sequence of transmitted $M$-OQAM symbols being $s_n^{(t)}$ as given by (8). Let $\tilde{x}_n^{(t)}$ denote the sequence resulting from sampling $\tilde{x}^{(t)}(\tau)$ at a rate $L/T_s$, with $L$ being the oversampling factor above the Nyquist rate, which is restricted to be even. According to (10), it is straightforward to prove that
where '$*$' denotes the discrete time convolution operation and $p_n$ is the version of the pulse shaping filter sampled at the rate $L/T_s$, i.e.
For an SC-FDE transmission employing blocks of N b symbols the discrete Fourier Transform (DFT) of sequences s (t,j) n and s (t,j) n can be related 4 Given the periodic nature of the DFT, these relate as does meaning that the spectrum S An equivalent representation of symbol s (t) n at oversampling rate L/T s can thus be obtained based on (12), being given bys It is important to note thats (t) n embeds physical nature of OQAM signals by having in-phase and quadrature components shifted by L/2 samples.
According to (15) and the time-shifting property of the DFT, it results that the DFT of $\tilde{s}_n^{(t)}$ is given by
with $q \in \mathbb{Z}$, meaning an alternation in the sign of the quadrature component of each replica. Also, from (12) it results
Analysing (17) and (19), important conclusions can be drawn on the digital processing of OQAM signals. Although, according to (15), and considering (17)
$k$ (which corresponds to processing the signal at the Nyquist rate), there is a sort of diversity effect that is created by processing the OQAM signal at a higher rate, where this information is repeated⁵ every $N_b$ samples of $\tilde{X}_k^{(t)}$. This can be very useful to improve the BER performance for linear equalizers when non-offset constellations are used, particularly for the case of low-envelope fluctuation offset signals. In this case, the pulse shape $p(\tau)$ typically has a bandwidth considerably above the minimum Nyquist band. So, $P_k$ samples for $k \geq N_b$ can have non-negligible values, and consequently the corresponding $X_k$ as given by (17), and letting $S_k$, we can thus make an average over the $L$ replicas, instead of considering only its first $N_b$ values. It can be shown that
⁵ In fact, these are not true replicas, since they are affected by the phase shift factor given by (18), which is known and can therefore be compensated.
where the new notation $\tilde{S}_{(k,l)} = \tilde{S}_{k+lN_b}$ has been adopted to refer to the samples of $\tilde{S}_k$ related to $S_k$.⁶
Proof: The proof is made by computing each of the average terms of the summation (22). By considering (17), the left average term becomes
where the last equality of (23) results from the fact that, for consecutive values of $l$, the phase shifts are symmetric, i.e. the phase shift factor at $(k,l)$ is the negative of that at $(k,l+1)$, and the oversampling factor $L$ has been restricted to be an even number.
Consider now the right average term of (22). Similarly, by using (17) and (18), it results
And so, substituting (23) and (24) in (22), it results that
as was to be proved. In order to simplify the analysis that follows on the equalization of OQAM signals under multirate signal processing, the average defined in (22) will be hereafter denoted as
or, in an equivalent manner, using (20), as
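The replica structure and the averaging over the $L$ replicas can be checked numerically with a small sketch. It assumes that the oversampled symbol sequence is the zero-inserted sequence with the in-phase symbol at sample $nL$ and the quadrature symbol at sample $nL + L/2$, and that the known phase shift is $(-1)^l e^{-j\pi k/N_b}$; these are one reading of the text around (15) to (25), and the variable names are illustrative.

```python
import cmath

def dft(x):
    """Plain O(N^2) DFT, sufficient for a small illustrative block."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

N_b, L = 4, 4            # block size and (even) oversampling factor
s_i = [1, -1, -1, 1]     # in-phase symbols
s_q = [-1, -1, 1, 1]     # quadrature symbols

# Zero-inserted oversampled sequence: I at n*L, jQ delayed by L/2 samples.
s_over = [0j] * (N_b * L)
for n in range(N_b):
    s_over[n * L] += s_i[n]
    s_over[n * L + L // 2] += 1j * s_q[n]

S_i, S_q, S_over = dft(s_i), dft(s_q), dft(s_over)
S_sym = dft([a + 1j * b for a, b in zip(s_i, s_q)])  # symbol-rate spectrum

recovered = []
for k in range(N_b):
    reps = [S_over[k + l * N_b] for l in range(L)]   # the L replicas of bin k
    avg_i = sum(reps) / L                            # quadrature terms cancel (L even)
    comp = cmath.exp(1j * cmath.pi * k / N_b)        # undo the known phase shift
    avg_q = sum(((-1) ** l) * comp * reps[l] for l in range(L)) / L
    recovered.append(avg_i + avg_q)

# Second replica: the quadrature part appears with the opposite sign.
replica_1 = [S_i[k] - 1j * cmath.exp(-1j * cmath.pi * k / N_b) * S_q[k]
             for k in range(N_b)]
print(max(abs(recovered[k] - S_sym[k]) for k in range(N_b)))
```

The printed value is at the level of floating-point rounding, which is consistent with the statement that the symbol-rate spectrum can be obtained by averaging the compensated replicas.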

1) LINEAR FREQUENCY DOMAIN EQUALISATION OF OQAM SIGNALS UPON MULTIRATE PROCESSING
Let us start by expressing (19) in an equivalent matrix form as
For an m-MIMO system using SC-FDE with offset constellations, the received signals under multirate digital processing are given by
where each entry of the received vector denotes the signal received by the $r$-th antenna, the noise vector is the additive white Gaussian noise (AWGN) component, and $H_k^{eq} = P_k H_k$ is the equivalent MIMO channel frequency response to be equalised, which includes the channel frequency response and the pulse shaping filter, and where
with $H_k^{(r,t)}$ denoting the channel frequency response between the antenna pair $(r, t)$.
At the receiver, one obtains an estimation of the oversampled transmitted symbol using the linear equaliser where F k denotes the matrix of feedforward coefficients and the estimation of the block of OQAM transmitted symbols S k (i.e. at symbol rate) is obtained through averaging of all replicas for a given frequency k as defined in (26), i.e.
Looking at (32), one can see that there is an averaging sum that does not exist in the non-offset case. Therefore, the existing equalizers must be modified accordingly.

III. IB-DFE RECEIVER FOR OFFSET SIGNALS
IB-DFE algorithms for multiuser/spatial multiplexing upon MIMO SC-FDE transmissions using non-offset constellations have been proposed and discussed in [7], [8], [26]. The main principle consists in detecting one stream at a time while cancelling the interference from the already detected streams. The streams are ranked according to a quality measure (e.g., the average received power) and detected from best to worst, ensuring that the stronger ones are not interfering when the weaker ones are being detected. This detection is done by performing iterative frequency domain equalization with both feedback and feedforward filters, as shown in Fig. 3. Moreover, IB-DFE is iterative because it works on a per-block basis, meaning that the effectiveness of the feedback in cancelling the interference is limited by the reliability of the data detected at previous iterations.
Although IB-DFE presents a high complexity, it approaches the Matched Filter Bound (MFB) even in scenarios with high correlation between reception antennas. Nevertheless, it is still not able to cope with signals based on offset constellations. Thus, in this section, the IB-DFE equaliser is derived for offset signals. Here, it is considered that the equaliser tries to reverse the pulse shaping filter and the channel frequency response simultaneously.

A. IB-DFE WITH HARD DECISIONS
The frequency domain estimations associated with the $i$-th iteration at the output of the equaliser are given by
where the oversampled feedforward and the feedback matrices are
respectively. According to the Bussgang theorem [27], the hard estimations $\hat{S}_k^{(i-1)}$ can be written as the sum of two uncorrelated components: one related to $S_k$ and a distortion term. Hence, $\hat{S}_k^{(i-1)}$ can be expressed as
where the distortion term represents the zero-mean quantisation error for the $t$-th transmitter at iteration $(i-1)$, and where $\rho^{(t,i-1)}$ is the correlation factor of the $t$-th transmitter at iteration $(i-1)$, which is expressed by
The correlation factors supply a block-wise reliability measure of the estimates employed in the feedback loop, which is used to control the receiver's performance. This control is done taking into account the hard decisions for each block plus the overall block reliability, which reduces error propagation effects. Therefore, for the first iteration, the correlation factors are zero, i.e.,
The corresponding matrix is written as a diagonal matrix because it is assumed that the signals of the multiple transmitters are independent. This independence, allied to
holds assuming that the transmitters are emitting the same power $\sigma_s^2$. The feedback and the feedforward coefficients are chosen to minimise the Mean Square Error (MSE). For an m-MIMO system, the MSE of the $t$-th transmitter at iteration $i$ and frequency $k$ is given by (42). Thus, to minimise the MSEs of all transmitters simultaneously, their sum should be minimised, i.e.,
in order to ensure the correct recovery of the transmitted signals.
Using the method of Lagrange multipliers [28], it is possible to solve the problem defined in (43) and (44) as follows. We define the Lagrange function as
where $\lambda_k^{(i)}$ corresponds to the Lagrange multiplier at iteration $i$ and frequency $k$, and the coefficients $F_{(k,l)}$ and $B_k$ that minimise the MSE can be obtained by solving the system of equations given by
After solving (46), it can be shown that the feedforward coefficients for iteration $i$ are given by
where $\kappa$ is a normalisation matrix⁷ and the remaining $(k,l)$-indexed matrix is given by
⁷ Usually, $\kappa$ is a diagonal matrix of size $N_T \times N_T$, with the value at position $(t, t)$ given by the inverse of the Left Hand Side (LHS) of (44).
In turn, the feedback coefficients for iteration $i$ are given by
Throughout the IB-DFE iterations, in general, the correlation coefficients become higher while the deviations become lower, and the estimations are improved, enhancing the system BER performance. It must also be noted that, at the first IB-DFE iteration, (48) simplifies, corresponding to a linear Minimum Mean Square Error (MMSE)-based equaliser.

B. IB-DFE WITH SOFT DECISIONS
A way to improve the IB-DFE receiver is to use soft instead of hard decisions [10]. This improvement is achieved with the use of a different correlation factor for each symbol component instead of one factor that remains constant throughout the block. In this case, (33) becomes where S (i−1) k denotes the average symbol values conditioned to the output of the equaliser at iteration i−1.
To obtain the values of the average symbols
with
where β
containing a symbol $s$ with $\beta_n^{(t,m,i-1)} = 0$ or $1$, respectively, and
Assuming uncorrelated bits and using (3), each of the components of the soft decision at iteration $i-1$ can be written as
Then, the reliability of one component of the estimates to be used in the feedback loop at iteration $i-1$ is expressed by
where $\rho_n^{(t,m,i-1)}$ is the reliability of the $m$-th estimated bit of the $n$-th symbol transmitted by the $t$-th transmitter at iteration $i-1$, and it is given by
When using soft decisions, the reliability is already included in $\bar{S}_k^{(i-1)}$. Therefore, in this case, (49) does not need to include it in its calculation and becomes
On the other hand, the feedforward coefficients are still obtained by (47), but (48) becomes
where the reliability matrix at iteration $i-1$ is a diagonal matrix with each element obtained using (55) applied to the in-phase and quadrature components, respectively.⁹
⁸ Here, it is not specified whether these are in-phase or quadrature bits, because the analysis is the same for both components. However, the reader should be aware that formulas (51) to (56) refer to only one component (i.e., the BPSK case), being applied to both the in-phase and quadrature components, with their results being combined in (59).
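As an illustration of how soft decisions and per-bit reliabilities can be formed for one PAM component, the following minimal sketch uses the standard identity that a polar bit with log-likelihood ratio $\lambda$ has mean $\tanh(\lambda/2)$. The function names, the use of polar bits in {-1, +1}, and the natural binary mapping are assumptions made for illustration; the exact expressions (51) to (56) of the source are not reproduced here.

```python
import math

def soft_bit(llr):
    """Mean of a polar bit b in {-1, +1} with llr = ln(P(b=+1) / P(b=-1))."""
    return math.tanh(llr / 2.0)

def soft_pam(llrs):
    """Soft PAM value for natural binary mapping with uncorrelated bits:
    sum_m 2^m * E[b_m] (assumed form)."""
    return sum((2 ** m) * soft_bit(llr) for m, llr in enumerate(llrs))

def bit_reliability(llr):
    """Reliability of the bit estimate, taken here as |E[b]| (assumed form)."""
    return abs(soft_bit(llr))

# An uninformative equaliser output gives a zero soft value and zero reliability.
print(soft_pam([0.0, 0.0]), bit_reliability(0.0))

# Confident bits push the soft value towards a constellation point.
print(soft_pam([50.0, -50.0]), bit_reliability(50.0))
```

Unlike a single block-wise correlation factor, this gives one reliability per bit, so unreliable symbols contribute little to the feedback while reliable ones are cancelled almost fully.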

C. BER PERFORMANCE WITH IB-DFE FOR OFFSET SIGNALS
In this subsection, the BER performance with IB-DFE for offset signals is evaluated. We consider a system with $N_T = 16$ transmitters, each with one antenna, $N_R = R_b \times R_u = 4 \times 16 = 64$ reception antennas with the $R_u$ groups uncorrelated (i.e., $\rho_u = 0$), and a clustered mmWave channel [21] with 4 clusters of 3 rays each. The block size is $N_b = 256$ and the constellation is 4-OQAM. Since we want to employ grossly nonlinear amplifiers, requiring constant or almost constant envelope signals, a sine arcade has been used as pulse shape, producing a Minimum Shift Keying (MSK) signal, i.e.,
This is not a problem at mmWave, where there is a huge bandwidth, allowing the bandwidth constraint to be relaxed to obtain signals with constant envelope that can be amplified with nonlinear amplifiers. Fig. 4 presents IB-DFE results at the 1st and 4th iterations. It is possible to see that after 4 iterations the results are closer to the MFB, but for the linear FDE (i.e., when only one iteration is used), the performance is poor due to the high IQI levels. In that sense, equalizers with better performance in the first iteration should be developed, ensuring that the receiver converges in a few iterations and reducing its complexity. It can also be observed that the diversity effect created by oversampling, which enhances the results for the first iteration, almost vanishes after a few iterations.

IV. PRAGMATIC RECEIVER FOR OFFSET SIGNALS
Since the problem of equalizing offset signals lies in the IQI, ensuring a perfect match in the pulse shaping may be fundamental to minimise it. In conventional IB-DFE receiver design, pulse shaping effects are assumed to be part of the channel response, and they are estimated and recovered at the same time. As the pulse shape is chosen to satisfy the first Nyquist criterion, it is known a priori. Therefore, in [15], a pragmatic approach was suggested for SISO, where the pulse shaping is assumed to be perfectly matched and the receiver only tries to equalize the channel response. Here, a receiver based on that concept is derived for MIMO and its BER performance is also evaluated.
The pragmatic receiver is also iterative and can also use hard or soft decisions.¹⁰ However, instead of equalizing and recovering the signal at the same time, i.e., taking into account the contribution of the multiple replicas of the signal created by the diversity effect introduced by oversampling, the pragmatic approach equalizes the oversampled signals. Hence, the output of the equaliser is given by
where the feedforward filter $F_{(k,l)}^{(i)}$, at each iteration $i$, is pragmatically considered as the product of the pulse shaping matched filter $P_{(k,l)}^{*}$ and the filter $E_{(k,l)}^{(i)}$ that tries to equalize the channel. This results in
Only after the equalization process is the signal downsampled using (32), i.e. the estimated signal is given by
Using an MMSE criterion as in the previous section, it can be shown that the feedforward coefficients, excluding the pulse shaping matched filter, are given by
with
and $\kappa$ being a diagonal normalisation matrix ensuring
The feedback coefficients for iteration $i$ are given by
⁹ Here, the superscripts $(t, i-1)$ have been omitted to lighten the notation.
¹⁰ Without loss of generality, from now on, only soft decisions will be considered. The analysis for soft decisions for the remaining equalizers is similar to the IB-DFE case and follows the lines of subsection III-B.

A. BER PERFORMANCE WITH PRAGMATIC RECEIVER
In this subsection, the same system tested in section III-C is used. By observing Fig. 5, we can see that the first iteration of the pragmatic receiver presents a very good performance, with results closer to the MFB, contrary to the first IB-DFE iteration, which cannot converge. Clearly, the first iteration presents a good estimation, and in the second iteration the pragmatic receiver continues to be better than a conventional IB-DFE. However, in the fourth iteration the performance becomes similar for both. Although the results are similar when several iterations are used, and it is almost indifferent which receiver is used, when trying to reduce the complexity by using fewer iterations the pragmatic receiver is the best choice, presenting a good performance even in its linear form. Nevertheless, it still requires matrix inversions, and therefore high complexity.

V. LOW COMPLEXITY RECEIVERS FOR OFFSET SIGNALS
As matrix inversions could be a problem in m-MIMO schemes, receivers based on Maximum Ratio Combining (MRC) and Equal Gain Combining (EGC) concepts [18], [20] present lower complexity than conventional IB-DFE receivers or pragmatic approaches.

A. MOTIVATION
The MRC and EGC techniques are appropriate when $N_R/N_T \gg 1$ (which is a reasonable assumption for the uplink of m-MIMO systems) and the channels between different transmit and receive antennas have a small-to-medium correlation. In fact, for next generation systems, these conditions can be verified, and MRC and EGC based receivers could be a low-complexity solution for equalization, presenting very good performance.
These low-complexity approaches take advantage of the fact that the cross-correlation between the columns of the channel matrix is relatively low, which means that the corresponding Gramian matrix $H_{(k,l)}^H H_{(k,l)}$ is almost diagonal for MRC, as is the matrix $e^{j \arg(H_{(k,l)})} H_{(k,l)}$ for EGC. Fig. 6 shows a color map of the absolute value of the Gramian matrix $H_{(k,l)}^H H_{(k,l)}$ and of the matrix $e^{j \arg(H_{(k,l)})} H_{(k,l)}$ for different correlation values and two different systems with $N_T = 16$ and
In the first column of Fig. 6, it is shown that the most significant values are always in the main diagonal, with values outside the main diagonal increasing slightly when the correlation becomes high, showing that the MRC principle is valid when the correlation is low. However, in this case, we are considering the same number of transmit antennas and low-correlated groups $R_u$, and a ratio $N_R/N_T = 64/16 = 4$.
When the ratio is decreased, as for a system with $N_T = 16$ and $N_R = R_b \times R_u = 4 \times 8 = 32$, the difference between the main diagonal and the remaining values is reduced even for low correlation values, and for high correlation values the two are almost indistinguishable, as can be seen in the middle column of Fig. 6. Therefore, to obtain better results with MRC, there should be at least as many low correlated antennas at the receiver as at the transmitter. To cope with scenarios with high correlation between receive antennas, their number should be increased further.
Finally, the last column of Fig. 6 shows the matrix $e^{j\arg(H_{(k,l)})} H_{(k,l)}$, confirming that the conclusions drawn for the MRC approach are also valid for the EGC approach.
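The near-diagonal structure discussed above can be checked numerically. The following is a minimal sketch, not the setup behind Fig. 6: it assumes i.i.d. Rayleigh channel entries instead of the clustered mmWave model with correlated antenna groups, and it takes the EGC product as $A^H H$ with $A = e^{j\arg(H)}$ applied element-wise. The helper names (`offdiag_to_diag_power`, `combining_products`, `rayleigh_channel`) are hypothetical.

```python
import numpy as np

def offdiag_to_diag_power(M):
    """Total off-diagonal power divided by total main-diagonal power."""
    power = np.abs(M) ** 2
    diag_power = np.trace(power)
    return (power.sum() - diag_power) / diag_power

def combining_products(H):
    """Return (H^H H, A^H H), with A = exp(j arg(H)) taken element-wise."""
    gram_mrc = H.conj().T @ H
    A = np.exp(1j * np.angle(H))      # unit-modulus matrix holding the phases of H
    prod_egc = A.conj().T @ H
    return gram_mrc, prod_egc

def rayleigh_channel(n_r, n_t, rng):
    """i.i.d. unit-variance complex Gaussian channel (illustrative assumption)."""
    real = rng.standard_normal((n_r, n_t))
    imag = rng.standard_normal((n_r, n_t))
    return (real + 1j * imag) / np.sqrt(2)

rng = np.random.default_rng(0)
n_t = 16
for n_r in (32, 64, 256):
    g_mrc, g_egc = combining_products(rayleigh_channel(n_r, n_t, rng))
    print(n_r, offdiag_to_diag_power(g_mrc), offdiag_to_diag_power(g_egc))
```

Under these assumptions the printed ratios shrink as the number of receive antennas grows, but the off-diagonal power remains a sizeable fraction of the diagonal power when the number of receive antennas is only a small multiple of the number of transmit antennas.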

B. ITERATIVE MRC AND EGC EQUALISERS
It should be noted that, although the off-diagonal elements of the Gramian matrix converge to zero as the number of receive antennas increases, their total power can still be similar to the power of the elements on the main diagonal when $N_T$ is similar to $N_R$. For this reason, MRC or EGC receivers are only appropriate when $N_T \ll N_R$. Since next generation communication systems can be designed to have high $N_R/N_T$ ratios and many antennas that can be placed at distances of multiple wavelengths in a small space, especially when operating at mmWave frequencies, the conditions for using low complexity MRC and EGC based receivers will be met. To adapt these receivers to offset signals, the pragmatic approach can be employed, i.e., equalizing only the oversampled signal, instead of equalizing and recovering the symbols at the same time, and assuming perfect pulse shaping matching. The equaliser output is then averaged according to (32), with the result at the end of the $i$-th iteration being computed as in (64). The main difference to the previous approach is that, instead of obtaining the feedforward coefficients through high complexity equations like (65) and (66), which involve the inversion of huge matrices for each frequency, MRC or EGC schemes use feedforward coefficients that are simpler to determine, namely the Hermitian of the channel and the phase of the channel elements, and that do not depend on the iteration. As in the MRC and EGC receiver of [20], the iterations are still required to cancel the residual inter-user interference, but the feedforward and feedback filters are kept unchanged along the iterations. Hence, the feedforward coefficients for both equalizers are defined in terms of a matrix $E_{(k,l)}$, which varies according to the chosen method.
For the MRC equaliser, $E_{(k,l)}$ is the Hermitian of the channel matrix and $\kappa$ denotes a diagonal normalisation matrix. For the EGC receiver, $E_{(k,l)}$ is obtained from a matrix $A_k$ whose elements carry only the phases of the channel elements, with $\kappa$ again denoting a diagonal normalisation matrix. From these definitions, the optimum values of $B_k$ can easily be derived. The remaining process is equal to the one presented for the conventional IB-DFE or the pragmatic receivers. Therefore, iterative receivers based on MRC and EGC concepts are very similar to the IB-DFE and pragmatic receivers, but with the advantage of having fixed $F_{(k,l)}$ and $B_{(k,l)}$ matrices across iterations and not requiring complex matrix inversions, while obtaining almost the same BER performance for scenarios with $N_T \ll N_R$ and low correlation between antennas, as will be shown in the next subsection.
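To make the structure of such a receiver concrete, the sketch below implements an iterative MRC/EGC equaliser with fixed feedforward and feedback matrices. It is a simplified illustration under several assumptions that do not come from the paper: plain QPSK with hard decisions instead of offset constellations, no oversampling or pulse shaping, an i.i.d. Rayleigh channel per frequency, feedforward $F_k = \kappa E_k$, feedback $B_k = \kappa E_k H_k - I$, and $\kappa$ chosen so that the average diagonal gain over the block is one. The names `iterative_mrc_egc` and `demo` are hypothetical.

```python
import numpy as np

def iterative_mrc_egc(Y, H, method="mrc", n_iter=4):
    """Y: (Nb, NR) frequency-domain block, H: (Nb, NR, NT) channel.
    Returns hard QPSK decisions in the time domain, shape (Nb, NT)."""
    n_b, n_r, n_t = H.shape
    if method == "mrc":
        E = H.conj().transpose(0, 2, 1)                     # Hermitian of the channel
    else:
        E = np.exp(-1j * np.angle(H)).transpose(0, 2, 1)    # conjugate phases only
    EH = E @ H                                              # (Nb, NT, NT)
    # Assumed normalisation: unit average diagonal gain over the block.
    kappa = n_b / np.real(np.einsum("ktt->t", EH))          # (NT,)
    F = kappa[None, :, None] * E                            # fixed feedforward
    B = kappa[None, :, None] * EH - np.eye(n_t)             # fixed feedback
    FY = np.einsum("ktr,kr->kt", F, Y)                      # computed only once
    S_bar = np.zeros((n_b, n_t), dtype=complex)             # no feedback at first
    for _ in range(n_iter):
        S_tilde = FY - np.einsum("ktu,ku->kt", B, S_bar)
        s_tilde = np.fft.ifft(S_tilde, axis=0)
        s_hat = (np.sign(s_tilde.real) + 1j * np.sign(s_tilde.imag)) / np.sqrt(2)
        S_bar = np.fft.fft(s_hat, axis=0)
    return s_hat

def demo(method, n_b=64, n_t=4, n_r=128, noise_var=0.1, seed=0):
    """Count symbol errors for one random block (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    shape = (n_b, n_r, n_t)
    H = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    bits = rng.choice([-1.0, 1.0], size=(2, n_b, n_t))
    s = (bits[0] + 1j * bits[1]) / np.sqrt(2)
    S = np.fft.fft(s, axis=0)
    noise = np.sqrt(noise_var / 2) * (
        rng.standard_normal((n_b, n_r)) + 1j * rng.standard_normal((n_b, n_r))
    )
    Y = np.einsum("krt,kt->kr", H, S) + np.fft.fft(noise, axis=0)
    s_hat = iterative_mrc_egc(Y, H, method=method)
    return int(np.count_nonzero(~np.isclose(s_hat, s)))

for method in ("mrc", "egc"):
    print(method, "symbol errors:", demo(method))
```

Note that `F`, `B` and the product `FY` are computed once, outside the loop, so each additional iteration only costs one matrix-vector product with `B`, a subtraction and the transforms needed for the decisions.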

C. BER PERFORMANCE ANALYSIS WITH LOW COMPLEXITY RECEIVERS
In this subsection, the system presented in Section II is used to compare the BER performance of the receivers previously presented. We considered a system with $N_T = 16$ transmitters, each with one antenna, and multiple configurations of the receive antennas. The mmWave channel already described was considered, with $N_{ch\_clu} = 4$ clusters of $N_{ray\_clu} = 3$ rays each. The block size is $N_b = 256$ and different constellation sizes are tested. As before, the pulse shape is a sine arcade.
We start by testing the scenario with $N_R = R_b \times R_u = 4 \times 16 = 64$ receive antennas with the $R_u$ groups uncorrelated, i.e., $\rho_u = 0$. These results are presented in Fig. 7. We can see that, when using 4 iterations, the low complexity receivers, especially the MRC, present a very good performance, close to or even better than that of the IB-DFE and pragmatic approaches for 4-OQAM and 16-OQAM. This fact, allied to their low complexity, makes them a suitable choice for m-MIMO systems like the one herein described. However, for larger constellations such as 64-OQAM, their performance becomes poor. Since the $R_u$ groups were uncorrelated in this scenario, the only way to improve the BER performance for 64-OQAM is to increase the number of receive antennas. Therefore, we studied the BER performance when varying the correlation factor $\rho_u$ for a given $E_b/N_0$ and for different numbers of receive antennas. The chosen $E_b/N_0$ values correspond to the MFB at a BER of $10^{-4}$ and are presented in Table 2.
In Fig. 8, the BER results for the scenario previously presented, with $N_R = R_b \times R_u = 4 \times 16 = 64$ receive antennas, are depicted only for $\rho_u \geq 0.4$ because below this value there are no further gains. It is shown that for 4-OQAM the BER is constant until the correlation reaches about 0.8 for all receivers. This limit is similar for 16-OQAM and 64-OQAM when using the IB-DFE or pragmatic receivers. However, as expected, MRC and EGC are more sensitive to the correlation factor, with MRC starting to be affected at $\rho_u = 0.5$ for 16-OQAM and neither of them achieving a BER below $10^{-2}$ for 64-OQAM.
On the other hand, when the number of receive antennas is increased to $N_R = R_b \times R_u = 4 \times 32 = 128$, the performance of MRC and EGC improves substantially, even for 64-OQAM constellations, being affected only when $\rho_u \geq 0.8$, as for the more complex methods (see Fig. 9). Therefore, it is shown once more that, for an m-MIMO scenario at mmWave frequencies, MRC is a low complexity alternative to the other methods, presenting the same or even better performance.
We have also performed simulations for an even harder case, a system with $N_R = R_b \times R_u = 8 \times 32 = 256$ receive antennas and 64-OQAM. The results obtained were similar to the ones presented in Fig. 9(c), reinforcing the conclusions previously drawn, and for that reason they are not presented here. Simulation tests to study the impact of the diversity effect created by oversampling have also been conducted. It was seen that, as long as $L \geq 2$, the diversity effect created by oversampling does not affect the BER performance when iterations are used, and once more the results are not presented.
These BER results show that low complexity receivers present performance very close to the MFB, with MRC being the best receiver tested, but that they are more sensitive to correlation between antennas. However, under favorable conditions, i.e., for m-MIMO scenarios with hundreds of antennas, they behave like the IB-DFE or pragmatic approaches, being affected only for $\rho_u \geq 0.8$. Hence, considering their low complexity, they are a suitable choice for next generation communication systems.

VI. COMPLEXITY ANALYSIS
In this section, a complexity analysis for the different equalizers presented in this work is performed. The equalizers analyzed here are: conventional IB-DFE, pragmatic, and iterative MRC and EGC. The analysis for the conventional IB-DFE and pragmatic approaches is only done for their linear part, which in the case of IB-DFE corresponds to an MMSE equaliser. The reason is that the iterative MRC and EGC with just 4 iterations present better results and much lower complexity (as will be shown) than the linear conventional IB-DFE and pragmatic approaches, while performing very close to the MFB. Therefore, it is not worthwhile to run the iterative conventional IB-DFE and pragmatic receivers, since their results would be similar to those of the iterative MRC and EGC but with much higher complexity.
This analysis is performed per frequency $k$ (i.e., at symbol rate $1/T_s$) to ensure a fair comparison when oversampling is used, since the various methods deal with it in different ways.
Also, the number of FLOPs is used as the comparison metric and only the computations directly related to the MIMO equalization procedure are included. We consider, as in [29], that the operations +, −, ×, / and square root in the real domain require one FLOP each. The number of FLOPs required for the remaining matrix and scalar operations used in this analysis is given in Table 3. It is considered that $c$ is a real scalar, $w$ and $z$ are complex numbers, $A$, $B$ and $C$ are arbitrary matrices of complex coefficients with dimensions $N \times P$, $P \times T$ and $P \times P$, respectively, $D$ is a diagonal matrix of complex coefficients with dimensions $P \times P$, $I$ is the $P \times P$ identity matrix and $v$ is an arbitrary vector of complex coefficients with size $P \times 1$. $A^H$ represents the Hermitian of the matrix $A$, whose computation is considered not to require any FLOP, and $L$ is the oversampling factor.
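As an illustration of how such counts follow from the one-FLOP-per-real-operation convention, the sketch below derives the cost of complex scalar operations and of a schoolbook complex matrix product. These are standard counts stated here as assumptions: Table 3 is not reproduced, and its entries may differ, for instance by exploiting the Hermitian symmetry of a Gramian matrix. The function names are hypothetical.

```python
def complex_add_flops():
    """(a + jb) + (c + jd): two real additions."""
    return 2

def complex_mul_flops():
    """(a + jb)(c + jd): four real multiplications and two real additions."""
    return 6

def matmul_flops(n, p, t):
    """Schoolbook product of complex (n x p) and (p x t) matrices.

    Each of the n*t output entries needs p complex multiplications
    and p - 1 complex additions, i.e. 8p - 2 real FLOPs.
    """
    per_entry = p * complex_mul_flops() + (p - 1) * complex_add_flops()
    return n * t * per_entry

# Example: a (16 x 64) by (64 x 16) complex product.
print(matmul_flops(16, 64, 16))
```

For large inner dimension $P$, the product of an $N \times P$ matrix by a $P \times T$ matrix therefore costs about $8NPT$ FLOPs under these assumptions.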
The derivation of the values in Table 3 is straightforward, with the exception of the first two properties. The number of FLOPs needed to obtain the product of a matrix by its Hermitian, i.e., the Gramian matrix $A^H A$, follows from [30]. For the inversion of a matrix, the Gauss algorithm is considered, with the complexity presented in [31]. Table 4 presents the number of FLOPs of each algorithm stage, considering the equations presented in the previous sections. As this analysis is made per frequency $k$ and $\kappa$ is equal for every frequency $k$, it only needs to be computed once and its complexity is divided by $N_b$.
Note that for the iterative MRC and EGC equalizers, when estimating $S_k$, the first iteration corresponds to the linear equaliser (31), and in the following iterations only the product $B_{(k,l)} S^{(i-1)}_{(k,l)}$ and its subtraction from the result of (31) have to be computed, since the matrices $F_{(k,l)}$ and $B_{(k,l)}$ are fixed; thus each additional iteration adds just a small computational burden. Table 4 shows that the iterative MRC and EGC reduce the computational complexity of the overall system in comparison with the first iteration of the conventional IB-DFE and pragmatic receivers. Moreover, we see that the conventional IB-DFE is more dependent on the oversampling factor, while the remaining methods only depend on it in the computation of $\kappa$ and when employing the ϒ function to recover the original symbols.
An important result that can be taken from Table 4 is the asymptotic complexity reduction when $N_R/N_T \gg 1$. From the table, it can be concluded that MMSE and pragmatic present an asymptotic complexity of $\left(\frac{8}{3} + (4L + 8)(N_R/N_T)\right)N_T^3$ and $\left(\frac{8}{3} + 12(N_R/N_T)\right)N_T^3$, respectively, while that of the iterative MRC and EGC is $8(N_R/N_T)N_T^3$. By taking the ratios of the asymptotic complexities, one gets

$$\lim_{N_R/N_T \to \infty} \frac{8\,(N_R/N_T)}{\frac{8}{3} + 12\,(N_R/N_T)} = \frac{2}{3}$$

and

$$\lim_{N_R/N_T \to \infty} \frac{8\,(N_R/N_T)}{\frac{8}{3} + (4L + 8)\,(N_R/N_T)} = \frac{2}{L + 2},$$

which means that the complexity reduction converges asymptotically to 33% compared with the pragmatic approach and to at least 50% (considering $L = 2$) compared with MMSE. Note that for moderate values of $N_R$ and $N_T$, this may not seem a substantial reduction in the number of computed FLOPs. However, for m-MIMO scenarios, where it is necessary to deal with high dimension matrices, this reduction is noticeable, and it may correspond to savings of hundreds of thousands of FLOPs, as shown in Table 5.
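The asymptotic expressions quoted above can be evaluated directly to see how quickly the reduction approaches its limit. The sketch below only encodes those three expressions; it does not reproduce the stage-by-stage counts of Table 4 or the values of Table 5, and the function names are hypothetical.

```python
def asymptotic_flops(n_r, n_t, oversampling=2):
    """Asymptotic FLOPs per frequency for each equaliser, as quoted in the text."""
    ratio = n_r / n_t
    cube = n_t ** 3
    return {
        "mmse": (8 / 3 + (4 * oversampling + 8) * ratio) * cube,
        "pragmatic": (8 / 3 + 12 * ratio) * cube,
        "mrc_egc": 8 * ratio * cube,
    }

def reduction(n_r, n_t, reference, oversampling=2):
    """Fractional complexity reduction of MRC/EGC relative to a reference."""
    flops = asymptotic_flops(n_r, n_t, oversampling)
    return 1.0 - flops["mrc_egc"] / flops[reference]

n_t = 16
for n_r in (64, 128, 256, 4096):
    print(
        n_r,
        round(reduction(n_r, n_t, "pragmatic"), 4),
        round(reduction(n_r, n_t, "mmse"), 4),
    )
```

Because the matrix inversion term $\frac{8}{3}N_T^3$ appears only in the MMSE and pragmatic expressions, the reduction at finite $N_R/N_T$ is slightly larger than the limiting values and decreases towards them as the ratio grows.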

VII. CONCLUSIONS
In this paper, we considered the use of offset constellations in m-MIMO systems operating at mmWave frequencies. The transmitted signals either have an almost constant envelope (as in the OQPSK case) or can be decomposed as the sum of constant-envelope OQPSK components, making them compatible with strongly nonlinear power amplifiers.
To equalize this type of signal, we proposed low complexity frequency-domain receivers. In m-MIMO scenarios, it is shown that the proposed receivers can have performance close to the MFB, while achieving a complexity at least 33% lower than conventional methods that employ matrix inversions, making them particularly interesting for future wireless systems operating at mmWave bands.

He is also a Researcher with the Instituto de Telecomunicações (IT) and performs different collaborations under his research and development activities with other institutions and universities worldwide. His main research interests include wireless digital communications, general signal processing for communications and ultrasound systems, error control coding and physical layer security, electronics and SDRs, FPGAs, and DSPs. He is also a member of the IEEE Communications Society and the IEEE Vehicular Technology Society.

LIST OF SYMBOLS
VITOR SILVA received the Graduate Diploma and Ph.D. degrees in electrical engineering from the University of Coimbra, Portugal, in 1984 and 1996, respectively, where he is currently an Assistant Professor with the Department of Electrical and Computer Engineering. He is also a Lecturer of signal processing and information and coding theory with the Department of Electrical and Computer Engineering, University of Coimbra. He is also the Coordinator of the Instituto de Telecomunicações (IT), Coimbra, and a member of the Board of IT Directors. He has published over 150 papers in top journals and conferences. His research interests include signal processing for communications, video coding, error correcting codes, and parallel computing.