Performance Analysis of MIMO-NOMA Iterative Receivers for Massive Connectivity

The Fifth Generation (5G) of wireless networks introduced support for Machine-Type Communications (MTC), which is the wireless connectivity solution for Internet of Things (IoT) applications. MTC is split into two categories: massive MTC (mMTC) and critical MTC (cMTC). Current 5G standards and technologies cannot fully satisfy the requirements of both mMTC and cMTC use cases, thus industry and academia have already started developing solutions for MTC in beyond-5G and 6G networks. In some mMTC use cases, receivers might not be equipped with a large number of antennas owing to cost, size or power limitations, thus the number of active devices in a time slot may surpass the number of antennas. Owing to the limited spatial multiplexing capabilities, multi-antenna techniques alone are not enough to provide connectivity to a massive number of devices in such scenarios. In this paper, we propose and evaluate the performance of iterative linear receivers that can address this issue. By combining Multiple-Input Multiple-Output (MIMO) techniques with Non-Orthogonal Multiple Access (NOMA) exploiting Successive Interference Cancellation (SIC) or Parallel Interference Cancellation (PIC) decoding, the proposed receivers are capable of performing dynamic-ordering SIC/PIC decoding of multiple overlapping signals even when the number of active devices surpasses that of receive antennas. The performance of the receivers is studied in terms of outage probability and computational complexity. Simulation results show that, among all the receivers studied in this paper, the PIC-based Minimum Mean Square Error (MMSE) receiver presents the best performance while at the same time reducing the number of complex signal processing operations such as matrix inversions.


I. INTRODUCTION
Traditional wireless communications systems from the First Generation (1G) to the Fourth Generation (4G) Long-Term Evolution (LTE) were mostly designed and optimized to support Human-Type Communication (HTC), e.g. voice calls, text messages and mobile internet. Meanwhile, the latest releases of 4G LTE and Fifth Generation (5G) New Radio (NR) have also included support for Machine-Type Communications (MTC), which is a key enabler for the Internet of Things revolution [1].
The associate editor coordinating the review of this manuscript and approving it for publication was Adao Silva.
5G NR supports MTC in two main categories [2], [3]: massive MTC (mMTC) and critical MTC (cMTC), also known as Ultra-Reliable Low-Latency Communications (URLLC). The former aims at providing wireless connectivity to a massive number of low-power and low-complexity devices, such as wireless sensor networks for smart cities, smart industries and smart agriculture. Meanwhile, the latter aims at enabling applications with very stringent requirements in terms of latency and reliability, as required by autonomous driving and critical industrial automation, for example.
Current 5G standards and technologies are not yet capable of fully satisfying the requirements of mMTC and cMTC use cases, thus industry and academia have already started research activities aimed at developing robust, scalable and efficient Sixth Generation (6G) wireless networks that can address the limitations of current systems. Moreover, the requirements of MTC networks are expected to become even more stringent in the coming years. In mMTC use cases, it is expected that the density of connected devices might reach the order of hundreds per cubic meter [4].
Several communication techniques have been proposed as enablers for mMTC and cMTC in beyond-5G and 6G networks, including massive Multiple-Input Multiple-Output (MIMO), Non-Orthogonal Multiple Access (NOMA) and network densification. The first one consists in the use of a very large number of transmit/receive antennas at the Base Station (BS). Multiple antennas allow exploiting multiple channel observations for the transmission and/or reception, mitigating channel and noise impairments, thus huge gains on spectral efficiency can be achieved. Nevertheless, channel estimation in massive MIMO is critical and a main source of limitations [5]. Besides enhancing the spectral efficiency, which is required by applications with extremely high data rates, massive MIMO can also improve the performance of mMTC [6] and cMTC [7].
When massive connectivity is required, traditional Orthogonal Multiple Access (OMA) techniques are not suitable because the number of active devices may be much higher than the number of available orthogonal radio resources. For this reason, NOMA has been considered a main enabler for mMTC use cases [8], [9]. By using NOMA, multiple active devices can share the same time/frequency resource during simultaneous transmissions. The combination of MIMO and NOMA techniques, denoted as MIMO-NOMA, is an important trend for beyond-5G and 6G networks [10]. When NOMA is applied to a single orthogonal resource block, a spectrally efficient way to realize multiple access is by adopting power domain NOMA. However, due to the interference imposed by the non-orthogonality, NOMA requires an inter-user interference cancellation mechanism.
Interference cancellation for multi-user systems is traditionally split into two categories [11]: Successive Interference Cancellation (SIC) and Parallel Interference Cancellation (PIC). In the case of SIC decoding, only one user is decoded in a given iteration. The strongest received signal is detected and decoded first, then the next strongest signal, and so on. After each successful decoding attempt, the received signal for that user can be reconstructed and subtracted from the composite received signal. On the other hand, in the case of PIC decoding, the signals transmitted by multiple users can be detected and decoded in a given iteration, and this process can be repeated over multiple iterations. Nevertheless, the concept of interference cancellation relies on the premise that the received signal can be reliably estimated. Reconstructing the received signal requires accurate estimation of the transmitted symbols and also of the users' channels. Imperfect Channel State Information (CSI) is a problem for both SIC and PIC [11]. Another drawback of SIC and PIC decoding is that both require buffering of received signals, which increases the computational complexity at the receiver [12].
After the CSI acquisition phase, a data detection phase is performed at the BS. In traditional MIMO setups, a linear processing technique like Maximum Ratio Combining (MRC), Zero Forcing (ZF) or Minimum Mean Square Error (MMSE) is used to detect the data symbols transmitted by all the devices active in a given time slot [13]. On the other hand, non-linear techniques like NOMA with SIC decoding in the uplink achieve the sum capacity under perfect CSI [12]. Senel et al. [12] showed that NOMA outperforms MIMO when the number of active users is comparable to the number of transmit/receive antennas. When the number of antennas is much higher than the number of active users, MIMO presents the best performance.
Another promising solution for massive connectivity, closely related to the concept of network densification, is data aggregation. Instead of having a huge number of MTC devices connected to a common BS, the devices could organize themselves locally, exploiting short range communication technologies and creating small area MTC networks. Under this approach, a smaller number of MTC devices located nearby would be connected to a common gateway that also acts as a data aggregator. The main task of the data aggregator is to decode the packets transmitted by the MTC devices located nearby, perform some processing tasks (e.g. data compression), and then forward information to a BS, thus reducing congestion and power consumption at the devices side [14], [15]. Owing to physical size, cost and power limitations, such gateways usually cannot be equipped with a large number of antennas, leading to the potential issue of having more active users than receive antennas.

VOLUME 10, 2022

A. RELATED WORKS
Several works studied the performance of MIMO systems for massive connectivity, but generally with the assumption that the number of active devices in a given time slot does not exceed the number of antennas at the BS, which might not hold in mMTC use cases where data aggregation is performed. For instance, Liu and Yu [16], [17] studied the performance of an uplink scenario where a massive number of devices is connected to a BS that is equipped with a greater number of antennas. In [16], active device detection and channel estimation are addressed, while in [17] the authors studied the achievable rate in the uplink when adopting either MRC or MMSE receive beamforming. As a result, it is shown that, for massive connectivity applications, MMSE outperforms MRC. Nevertheless, [16] and [17] do not consider a power control scheme, although the authors point out that such an approach could ensure a fair rate distribution among active devices.
The performance of dynamic-ordering SIC decoding based on instantaneous received signal power and channel gains for uplink NOMA was studied in [18]-[21]. In [18], the authors derived a closed-form expression for the outage probability of a three-user uplink NOMA system, but assuming perfect CSI. The impact of imperfect CSI is considered in [20], where a closed-form expression and numerical results for a two-user uplink NOMA system are presented. The results of [18] and [20] show that NOMA systems with dynamic-ordering SIC outperform those with fixed-ordering SIC. An uplink pairwise NOMA system considering imperfect CSI was studied in [22]. In [19], the authors derived a closed-form expression of the outage probability for an arbitrary number of users, considering the impact of imperfect CSI. However, they presented numerical results for only a three-user NOMA system. In [21], the authors studied the optimal SIC decoding ordering and power allocation strategies in both the downlink and the uplink of a NOMA system, but considering only two users. Moreover, in all the aforementioned works [18]-[21], the receivers are equipped with only a single antenna.
In [23], the authors compared the performance of ZF and MMSE receivers with SIC decoding in a cooperative communication system. In their system, two receivers are served by one transmitter and two relays, all equipped with only two antennas. Nevertheless, they proposed an optimal SIC decoding ordering based on the Signal-to-Interference-plus-Noise Ratio (SINR). Their results showed that the MMSE-SIC scheme outperforms ZF-SIC, and also that SIC with optimal ordering shows a performance improvement when compared to SIC without ordering.
Popovski et al. [24] proposed a communication theoretic framework for the coexistence between different 5G services in the same Radio Access Network (RAN). In the case of coexistence between enhanced Mobile Broadband (eMBB) and mMTC, they also studied the use of NOMA with SIC decoding, but in a simple model that considers perfect CSI and a single-antenna BS. Their framework was extended to the case of a BS equipped with multiple antennas in our previous work [25], but still considering perfect CSI. In [25], we also dealt with the case where the number of users may be higher than the number of antennas at the BS, but only considering MRC reception with SIC decoding. Considering a heterogeneous network where one eMBB device shares the same radio resource with several MTC devices, we showed that the use of MIMO-NOMA techniques with MRC-SIC decoding is a key technique to satisfy the requirements of different classes of services simultaneously.

B. NOVELTY AND CONTRIBUTION
Previous works that studied MIMO schemes for massive connectivity normally assume that the number of active devices is lower than the number of receive antennas, which might not be true in mMTC use cases for beyond-5G and 6G networks where data aggregation is employed. Besides, works that studied dynamic-ordering SIC decoding schemes considered systems with a very limited number of NOMA users, while assuming a single-antenna receiver. In this work we propose and evaluate the performance of different SIC-aided iterative linear receivers that can decode multiple overlapping signals in the uplink when the number of transmitting devices may be higher than the number of receive antennas at the BS. Such decoders can perform dynamic-ordering SIC decoding for arbitrary numbers of active devices and receive antennas.
First, we study linear MIMO receivers that have been extensively studied in the literature, i.e. MRC, ZF and MMSE receivers. Next, we introduce SIC and PIC decoding mechanisms that enhance the performance of the linear receivers by combining MIMO and NOMA techniques. We also present the dynamic SIC/PIC ordering scheme based on the decreasing order of Signal-to-Noise Ratios (SNRs) of the active devices. We then introduce the main novelty of this paper: a set of iterative linear receivers that combine the linear data detection with SIC and PIC and that can be utilized to decode multiple overlapping signals even in the case where the number of active users is higher than the number of receive antennas. Besides, the computation complexity of all the decoding schemes studied in this paper is evaluated in terms of the big-O notation. Resorting to Monte Carlo simulations, the performance of all the schemes is compared in terms of outage probability, average number of matrix inversions and average number of SIC operations in two different scenarios: first, when the number of active users equals the number of receiving antennas, and then when the number of active users is higher than the number of receiving antennas. The impact of imperfect CSI is considered throughout the paper. The results reveal that the iterative linear receiver that utilizes the MMSE filter for data detection with PIC decoding presents the best performance in terms of outage probability among all the receivers studied in this paper, while at the same time reducing the required number of complex signal processing operations such as matrix inversions.
The rest of this paper is organized as follows. In Section II, we present the system model. In Section III, we describe the SIC decoding procedures, while in Section IV, we introduce the iterative SIC-aided decoding schemes. The computational complexity of the different schemes is presented in Section V. Numerical performance results based on Monte Carlo simulations are presented and discussed in Section VI. Finally, we draw the main conclusions of this work in Section VII.
Notation: lowercase boldface letters denote column vectors, while uppercase boldface letters denote matrices. $a_i$ is the i-th element of the column vector $\mathbf{a}$, while $\mathbf{A}_i$ is the i-th row of the matrix $\mathbf{A}$. $[\mathbf{A}]_{ij}$ is the element in the i-th row and j-th column of the matrix $\mathbf{A}$. $\mathbf{I}_M$ is the identity matrix of size $M \times M$. The superscripts $(\cdot)^T$ and $(\cdot)^H$ denote the transpose and the conjugate transpose operators, respectively. The magnitude of a scalar quantity or the cardinality of a set is denoted by $|\cdot|$. The Euclidean norm of a vector is denoted by $\|\cdot\|$. The circularly symmetric complex Gaussian distribution with mean $\mathbf{a}$ and covariance $\mathbf{B}$ is denoted by $\mathcal{CN}(\mathbf{a}, \mathbf{B})$. The list of acronyms is presented in Table 1.

II. SYSTEM MODEL
We consider a scenario where K single antenna MTC devices transmit independent packets and are randomly distributed around a common multi-antenna receiver as illustrated in Fig. 1. We assume that neighboring receivers and their corresponding served devices are located far enough away (or in different indoor locations), and/or they transmit on different radio resource blocks, such that the interference they cause to each other is negligible. The K MTC devices share the same radio resource composed of one time slot in a single frequency channel. We assume quasi-static fading, where the coherence time interval is greater than the time slot. We assume that the number of active devices in a given time slot may be greater than the number N of antennas at the receiver, i.e., K > N .
In this paper, we only study the data transmission phase. We assume that there was a previous phase in which the active devices were admitted to the network, e.g., by using some grant-free random access scheme [26], [27] or fast-uplink grant protocol [28]. The packets transmitted by the devices are composed of two parts: the first part is the pilot sequence used for channel estimation and activity detection, and the second part is the payload. We assume the length of the pilot sequence to be equal or greater than the number of devices, thus each device is assigned an orthogonal pilot sequence.
The $N \times 1$ baseband received signal vector is given by
$$\mathbf{y} = \mathbf{G}\mathbf{P}^{1/2}\mathbf{x} + \mathbf{w}, \tag{1}$$
where $\mathbf{P} \in \mathbb{R}^{K \times K}$ is a diagonal matrix containing the average transmit SNRs of the K MTC devices, $[\mathbf{P}]_{kk} = p_k$, $\mathbf{G} \in \mathbb{C}^{N \times K}$ is the matrix of channel gains between the K devices and the BS, $\mathbf{x} \in \mathbb{C}^{K \times 1}$ is the vector of symbols simultaneously transmitted by the K devices, and $\mathbf{w} \in \mathbb{C}^{N \times 1}$ is the vector of Additive White Gaussian Noise (AWGN) samples at the BS, such that $\mathbf{w} \sim \mathcal{CN}(\mathbf{0}_{N \times 1}, \mathbf{I}_N)$ (that is, we assume that the noise power is normalized to one). Following the Shannon capacity framework, the symbols $\sqrt{p_k}\, x_k$ transmitted by the k-th MTC device follow a Gaussian distribution with variance $p_k$.

A. CHANNEL MODEL
The wireless channel gain from the k-th device to the n-th receive antenna is given by the element in the n-th row and k-th column of $\mathbf{G}$ as $g_{nk} = \sqrt{\beta_k}\, h_{nk}$, where $\beta_k < 1$ corresponds to the power attenuation due to distance, and the wireless channel coefficients $h_{nk}$ are independent and identically distributed (i.i.d.), following a complex Gaussian distribution with zero mean and unit variance, i.e., $h_{nk} \sim \mathcal{CN}(0, 1)$. Thus, the matrix of channel gains is
$$\mathbf{G} = \mathbf{H}\,\mathrm{diag}\left(\sqrt{\beta_1}, \ldots, \sqrt{\beta_K}\right), \tag{2}$$
where $\mathbf{H}$ is the $N \times K$ matrix of wireless channel coefficients between the K devices and the receiver antennas.

We assume that the large-scale fading coefficients $\beta_k$, $\forall k$, are known by the receiver. In order to avoid unnecessary power consumption and the near-far effect, the system adopts a channel inversion power control such that all devices have the same per-antenna average received SNR $\rho$ [29], thus
$$p_k = \frac{\rho}{\beta_k}. \tag{3}$$

By considering the presence of channel estimation errors, the estimated $N \times K$ channel matrix between the K devices and the receiver can be written as
$$\hat{\mathbf{G}} = \mathbf{G} + \tilde{\mathbf{G}}, \tag{4}$$
where $\tilde{\mathbf{G}} \in \mathbb{C}^{N \times K}$ corresponds to the matrix of estimation errors, whose entries are i.i.d. and follow a complex Gaussian distribution with zero mean and variance $\sigma_e^2$, i.e., $[\tilde{\mathbf{G}}]_{ij} \sim \mathcal{CN}(0, \sigma_e^2)$. The receiver utilizes a Maximum Likelihood (ML) channel estimation algorithm, in which the channel estimates are obtained by using pilot sequences that are known to the receiver. In order to guarantee that all the pilot sequences are orthogonal to each other, their length must be at least equal to the number of served devices, i.e., $L \geq K$. The variance of the channel estimation errors associated with the channel estimates of the k-th device is given by (5), following [30].
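As a small numerical illustration of the channel inversion power control described above, the following sketch computes transmit SNRs so that the product of transmit SNR and large-scale fading coefficient equals the target $\rho$ for every device. The function name `channel_inversion_powers` and the numerical values of the fading coefficients are hypothetical and serve only as an example.

```python
def channel_inversion_powers(betas, rho):
    """Return transmit SNRs p_k such that p_k * beta_k == rho for every device.

    betas: large-scale fading coefficients (0 < beta_k < 1), assumed known
    rho:   target per-antenna average received SNR (linear scale)
    """
    if any(b <= 0 for b in betas):
        raise ValueError("large-scale fading coefficients must be positive")
    return [rho / b for b in betas]


# Hypothetical example: three devices at different distances from the receiver.
betas = [0.5, 0.1, 0.02]
rho = 4.0
powers = channel_inversion_powers(betas, rho)

# Average received SNR per antenna after power control (should equal rho).
received = [p * b for p, b in zip(powers, betas)]
```

Under this rule the device with the weakest large-scale gain transmits with the highest power, which is why all devices arrive with similar average received powers.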

B. LINEAR FILTERS FOR INFORMATION DECODING
Let $\hat{\mathbf{A}}$ be an $N \times K$ linear detector matrix at the receiver that is a function of the matrix of estimated channel gains $\hat{\mathbf{G}}$, i.e., $\hat{\mathbf{A}} = f(\hat{\mathbf{G}})$. In [13], the authors studied the performance of three different linear decoders assuming perfect CSI: MRC, ZF and MMSE. The corresponding linear detector matrices considering imperfect CSI are given in (6), (7) and (8) for MRC, ZF and MMSE, respectively.

FIGURE 1. The considered uplink scenario when K > N. Two gateways, which are located in the coverage area of a common BS, are equipped with N antennas and serve K active MTC devices each.
Since (7) and (8) contain matrix inversions, the default ZF and MMSE decoders can be utilized only when K ≤ N , which makes them unsuitable for crowded mMTC scenarios where the number of active devices may be higher than the number of receive antennas. On the other hand, the MRC decoder can be utilized for any values of K and N .
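The restriction to K ≤ N can be illustrated numerically. The sketch below assumes the textbook form of the ZF filter, which inverts the $K \times K$ Gram matrix $\hat{\mathbf{G}}^H \hat{\mathbf{G}}$; this form is an assumption here and is not taken from (7). The helper names `gram` and `det2` and the example matrices are hypothetical. When K > N, the Gram matrix has rank at most N, so its determinant vanishes and the inverse does not exist.

```python
def gram(G):
    """Return G^H G for a matrix G stored as a list of N rows with K complex entries."""
    n_rows, n_cols = len(G), len(G[0])
    return [[sum(G[n][i].conjugate() * G[n][j] for n in range(n_rows))
             for j in range(n_cols)]
            for i in range(n_cols)]


def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


# N = 2 antennas, K = 2 devices: the Gram matrix is invertible.
G_square = [[1 + 1j, 2 + 0j],
            [0.5j, 1 - 1j]]

# N = 1 antenna, K = 2 devices (K > N): the 2 x 2 Gram matrix has rank one.
G_overloaded = [[1 + 1j, 2 + 0j]]

det_square = det2(gram(G_square))
det_overloaded = det2(gram(G_overloaded))
```

MRC, in contrast, only needs the inner products with the estimated channel vectors, so it places no constraint on K and N.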
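Then, the received signal after linear detection employing MRC, or ZF/MMSE when K ≤ N, is split into K streams and given by
$$\mathbf{r} = \hat{\mathbf{A}}^H \mathbf{y}. \tag{9}$$
Let $r_k$ and $x_k$ denote the k-th elements of the $K \times 1$ vectors $\mathbf{r}$ and $\mathbf{x}$, respectively. Then, we have
$$r_k = \sqrt{p_k}\, \hat{\mathbf{a}}_k^H \mathbf{g}_k x_k + \sum_{i=1, i \neq k}^{K} \sqrt{p_i}\, \hat{\mathbf{a}}_k^H \mathbf{g}_i x_i + \hat{\mathbf{a}}_k^H \mathbf{w}, \tag{10}$$
where $\hat{\mathbf{a}}_k$, $\mathbf{g}_k$, $\hat{\mathbf{g}}_k$ and $\tilde{\mathbf{g}}_k$ denote the k-th columns of the matrices $\hat{\mathbf{A}}$, $\mathbf{G}$, $\hat{\mathbf{G}}$ and $\tilde{\mathbf{G}}$, respectively. The first term in (10) corresponds to the desired signal from the k-th device, while the remaining terms correspond to interference from other devices and noise.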

III. SIC AND PIC DECODING
In order to enhance the performance of the system when multiple devices are active simultaneously, NOMA with SIC or PIC decoding can be utilized [11]. In the case of SIC decoding, the K active devices are first ordered according to some criterion. Then, at each decoding step, the receiver attempts to decode the signal transmitted by one of the active devices while treating the other devices as interference and, if the decoding is successful, the interference from this device is subtracted from the received signal. The only difference in the PIC decoding procedure is that the receiver attempts to decode and remove the signal contributions of multiple active devices in a single decoding step.

Let us consider a step of the decoding procedure when the k-th device is being decoded. The devices with indices {1, . . . , k − 1} have been correctly decoded and their corresponding signal contributions have been subtracted from the composite received signal. However, since the receiver has imperfect CSI, their signal contributions were subtracted based on their estimated channel vectors, not their actual realizations. As a consequence, there is still residual interference from them on the composite received signal, which depends on the CSI errors. At this step of the SIC decoding procedure, the filtered signal $r_k$ is given by (11), which contains two summations: the first corresponds to the interference of the devices waiting to be decoded, while the second corresponds to the residual interference from the devices that have been correctly decoded. Note that the interference due to imperfect CSI is cumulative, i.e., it continuously increases during the SIC decoding procedure.

The post-processing SINR at the receiver while decoding the signal from the k-th device, $\gamma_k$, including the effect of imperfect CSI, is given by (12). The signal transmitted by the k-th device is correctly decoded if $\log_2(1 + \gamma_k) \geq R$, where R is the target data rate in bits/s/Hz.
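The origin of the residual interference can be made explicit with a short derivation. This is only a sketch under two assumptions that are not stated in this form in the text: the estimate is modelled as the true channel plus an error, $\hat{\mathbf{g}}_i = \mathbf{g}_i + \tilde{\mathbf{g}}_i$, and the symbols of the decoded devices are reconstructed without errors. The symbol $\mathbf{y}^{(k)}$, denoting the composite signal after the first k − 1 cancellations, is introduced here for illustration only.

```latex
\begin{align*}
\mathbf{y}^{(k)}
  &= \mathbf{y} - \sum_{i=1}^{k-1} \sqrt{p_i}\, \hat{\mathbf{g}}_i x_i \\
  &= \sum_{i=1}^{K} \sqrt{p_i}\, \mathbf{g}_i x_i + \mathbf{w}
     - \sum_{i=1}^{k-1} \sqrt{p_i} \left( \mathbf{g}_i + \tilde{\mathbf{g}}_i \right) x_i \\
  &= \sqrt{p_k}\, \mathbf{g}_k x_k
     + \sum_{i=k+1}^{K} \sqrt{p_i}\, \mathbf{g}_i x_i
     - \sum_{i=1}^{k-1} \sqrt{p_i}\, \tilde{\mathbf{g}}_i x_i
     + \mathbf{w}.
\end{align*}
```

Each cancelled device leaves behind one term proportional to its own estimation error, so the last summation gains one term per successful decoding step, which is the cumulative effect noted above.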
The SIC decoding procedure ends after all the K devices have been correctly decoded or if no devices are correctly decoded at a decoding step. A detailed description of the SIC/PIC decoding procedure is listed in Algorithm 1. In this general algorithm, SIC decoding can be interpreted as a special case of PIC decoding in which only one signal is decoded, reconstructed and subtracted in each iteration.
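The control flow described above can be sketched as follows. This is not a transcription of Algorithm 1; it is a minimal illustration in which the SINR computation is abstracted into a hypothetical callback `compute_sinrs`, and the toy single-gain SINR model used in the example is an assumption made only to exercise the loop. Setting `batch_size=1` yields SIC, while a larger batch yields PIC.

```python
def sinr_threshold(rate):
    """Minimum SINR for which log2(1 + sinr) >= rate (rate in bits/s/Hz)."""
    return 2.0 ** rate - 1.0


def interference_cancellation(ordered_ids, compute_sinrs, rate, batch_size):
    """Generic SIC/PIC loop.

    ordered_ids:   device ids in decoding order (strongest first)
    compute_sinrs: hypothetical callback (batch, pending) -> list of SINRs
    batch_size:    1 for SIC, more than 1 for PIC
    Returns (decoded ids, number of iterations performed).
    """
    pending = list(ordered_ids)
    decoded, iterations = [], 0
    threshold = sinr_threshold(rate)
    while pending:
        batch = pending[:batch_size]
        sinrs = compute_sinrs(batch, pending)
        ok = [d for d, s in zip(batch, sinrs) if s >= threshold]
        iterations += 1
        if not ok:
            break  # no device decoded in this step: the procedure ends
        decoded.extend(ok)
        # Cancel the decoded devices: they no longer interfere.
        pending = [d for d in pending if d not in ok]
    return decoded, iterations


# Toy model (assumption): scalar gains, unit noise power, perfect cancellation.
gains = {0: 20.0, 1: 6.0, 2: 1.5}


def toy_sinrs(batch, pending):
    return [gains[d] / (1.0 + sum(gains[i] for i in pending if i != d))
            for d in batch]


order = sorted(gains, key=gains.get, reverse=True)
sic_decoded, sic_iters = interference_cancellation(order, toy_sinrs, 1.0, 1)
pic_decoded, pic_iters = interference_cancellation(order, toy_sinrs, 1.0, 3)
```

In this toy model the received powers are very different, so PIC gains nothing over SIC; the situation changes when the received powers are similar and a multi-antenna filter separates the streams.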
In the following subsections, we present the two different SIC decoding ordering schemes considered in this paper: the random ordering, and the dynamic ordering based on the instantaneous SNRs.

A. RANDOM SIC DECODING ORDERING
Under this scheme, the signals transmitted by all the active devices in a time slot are first buffered. Then, the receiver selects a signal randomly and attempts to decode it. If the decoding is successful, the signal contribution is subtracted from the composite received signal, the receiver selects randomly the next signal to be decoded, and so on. The SIC decoding procedure ends if one decoding step fails or after all the signals have been correctly decoded.
Note that, when the system adopts a power control such that all devices have the same average received SNR, the performance achieved with the random SIC decoding ordering scheme is exactly the same performance achieved with a fixed SIC decoding ordering. In the latter case, the SIC decoding ordering would be pre-defined during the admission phase of the active devices to the network, and it would be the same during all the time slots required for the data transmission.

B. DYNAMIC-ORDERING SIC DECODING
This scheme has been studied in the literature, e.g. in [18] and [20]. In each time slot and before decoding the signals from the active devices, the receiver computes their instantaneous received SNRs. Then, the devices are sorted in decreasing order of SNR, and the ordering does not change during the SIC decoding procedure. The instantaneous SNR of the k-th user is given by (13). By adopting the SNR as the SIC decoding ordering criterion, the K active devices waiting to be decoded are ordered according to their estimated channel vectors as in (14).

As pointed out by [20], dynamic-ordering SIC decoding does not introduce a large computational burden or a large system delay. The only computational overhead is computing and ordering the instantaneous SNRs of the active devices in each time slot.
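The ordering step can be sketched in a few lines. The sketch assumes that the instantaneous received SNR of device k is $p_k \|\hat{\mathbf{g}}_k\|^2$, i.e. the transmit SNR times the squared norm of the estimated channel vector; this expression is an assumption made for illustration and is not a transcription of (13). The function name and the channel values are hypothetical.

```python
def order_by_instantaneous_snr(est_channels, powers):
    """Sort devices by decreasing instantaneous received SNR.

    est_channels: list of K estimated channel vectors (lists of N complex values)
    powers:       list of K transmit SNRs p_k
    Returns (device indices sorted strongest first, list of SNRs).
    """
    snrs = [p * sum(abs(g) ** 2 for g in vec)
            for vec, p in zip(est_channels, powers)]
    order = sorted(range(len(snrs)), key=lambda k: snrs[k], reverse=True)
    return order, snrs


# Hypothetical estimates for K = 3 devices and N = 2 antennas.
est = [[0.1 + 0.2j, 0.3j],
       [1.0 + 0j, 0.5 - 0.5j],
       [0.4 + 0j, 0.4j]]
p = [2.0, 2.0, 2.0]
order, snrs = order_by_instantaneous_snr(est, p)
```

The cost is K norm computations plus one sort per time slot, in line with the remark that the overhead of dynamic ordering is small.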

IV. ITERATIVE SIC/PIC-AIDED DECODING SCHEMES
Next, we describe different iterative SIC/PIC-aided decoding strategies that can be adopted when the number of devices is greater than the number of receive antenna elements, i.e., K > N .

A. MRC-SIC
First, the K users are ordered according to some criterion, as described in Section III. Then, the receiver attempts to decode the signal transmitted by the strongest device with the MRC reception filter, which is equivalent to using $\hat{\mathbf{a}}_k = \hat{\mathbf{g}}_k$ in (12). If the decoding succeeds, its interference is subtracted from the received signal, the receiver proceeds to the decoding of the second strongest device, and so on. The procedure ends if any decoding attempt fails. The functional block diagram of the MRC-SIC receiver is shown in Fig. 2a.

B. ZF-PIC AND MMSE-PIC
The ZF-SIC and MMSE-SIC receivers allow the decoding and subtraction of the signal contribution of only one user in each iteration. After the signal is correctly decoded and the SIC operation is performed, a new ZF or MMSE filter needs to be computed. As a consequence, such implementations of ZF-SIC and MMSE-SIC require a significant number of relatively complex operations, such as matrix inversions, whose cost grows cubically with the number of receive antennas.
In this paper we propose novel ZF-PIC and MMSE-PIC receivers that require a reduced number of complex signal operations such as matrix inversions and that can still be utilized in the case of K > N. More specifically, under our approach, multiple users can be decoded simultaneously at each decoding step, which significantly reduces the required number of ZF/MMSE operations, and the interference from all the correctly decoded devices is also subtracted at the same decoding step. As pointed out by [8] and [11], SIC works better if the received powers of all the devices are different, but PIC outperforms SIC when the received powers of the devices are similar, which is the case studied in this work.

Let
$$\hat{\mathbf{G}} = [\hat{\mathbf{g}}_1, \hat{\mathbf{g}}_2, \ldots, \hat{\mathbf{g}}_K] \tag{15}$$
denote the $N \times K$ matrix containing the estimated channel vectors ordered according to a certain criterion, such as those described in Section III. Herein, we assume that K > N. Besides, let
$$\hat{\mathbf{G}}_{\mathrm{str}} = [\hat{\mathbf{g}}_1, \hat{\mathbf{g}}_2, \ldots, \hat{\mathbf{g}}_N] \tag{16}$$
denote the $N \times N$ matrix containing the channel vectors of only the N strongest devices. This matrix is updated at every decoding step. The receiver applies (16) in (7) or (8) at every decoding step to compute the new ZF or MMSE decoding matrix, respectively. Next, it attempts to decode the N strongest devices simultaneously. Let $\tilde{N}$ denote the number of users correctly decoded in a given iteration. After each ZF or MMSE operation, one of the following options can happen:
• If $\tilde{N} = N$, the interference from the decoded devices is subtracted from the received signal, the receiver updates the matrix $\hat{\mathbf{G}}_{\mathrm{str}}$ with the channel vectors of the next N devices, and proceeds to the next iteration;
• If $0 < \tilde{N} < N$ devices are correctly decoded, their interference is subtracted from the received signal, and their channel vectors are removed from the matrix $\hat{\mathbf{G}}_{\mathrm{str}}$. Then, if $(K - \tilde{N}) \geq N$, the channel vectors corresponding to the next $\tilde{N}$ devices are concatenated to the matrix in order to obtain a new $N \times N$ matrix. Otherwise, if $(K - \tilde{N}) < N$, the new matrix $\hat{\mathbf{G}}_{\mathrm{str}}$ has dimensions $N \times (K - \tilde{N})$. The receiver then proceeds to the next iteration;
• If $\tilde{N} = 0$ or if there are no remaining devices waiting to be decoded, the procedure ends.
On every decoding step, the SINR of the k-th device is given by (12), where the columns $\hat{\mathbf{a}}_k$ of the linear detector matrix $\hat{\mathbf{A}}$ are calculated in every iteration based on (16).
The functional block diagram of the ZF-PIC and MMSE-PIC receivers is illustrated in Fig. 2b.
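The bookkeeping of the matrix of the N strongest pending devices can be sketched as below. The sketch tracks only device indices, not channel matrices, and replaces the SINR test by a hypothetical callback `decodes_ok`; the function name `pic_schedule` is likewise an assumption. It counts one filter computation, and hence one matrix inversion, per iteration.

```python
def pic_schedule(ordered_ids, n_antennas, decodes_ok):
    """Track which devices are in the decoding window at each PIC iteration.

    ordered_ids: device ids sorted strongest first
    n_antennas:  N, the maximum number of devices handled by one filter
    decodes_ok:  hypothetical callback (device, window) -> bool standing in
                 for the SINR test of the ZF or MMSE filter
    Returns (decoded ids, number of filter computations).
    """
    waiting = list(ordered_ids)
    window = [waiting.pop(0) for _ in range(min(n_antennas, len(waiting)))]
    decoded, inversions = [], 0
    while window:
        inversions += 1  # one ZF/MMSE filter, i.e. one matrix inversion
        ok = [d for d in window if decodes_ok(d, window)]
        if not ok:
            break  # no device decoded: the procedure ends
        decoded.extend(ok)
        # Drop the decoded devices and refill the window up to N devices.
        window = [d for d in window if d not in ok]
        while waiting and len(window) < n_antennas:
            window.append(waiting.pop(0))
    return decoded, inversions


ids = list(range(5))  # K = 5 devices, already ordered

# Best case: every device in the window is decoded in each iteration.
all_ok, inv_all = pic_schedule(ids, 2, lambda d, w: True)

# Worst successful case: only the strongest device in the window is decoded,
# which degenerates to one filter computation per device, as in ZF/MMSE-SIC.
one_ok, inv_one = pic_schedule(ids, 2, lambda d, w: d == w[0])
```

With K = 5 and N = 2, the best case needs three filter computations instead of five, which illustrates where the reduction in matrix inversions comes from.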

C. MRC-SIC/ZF AND MRC-SIC/MMSE
At the first step of the SIC decoding procedure, the K users are ordered according to one of the schemes presented in Section III. Next, MRC-SIC is performed to decode the devices until the number of remaining devices equals the number of receive antenna elements. When this condition is satisfied, a traditional ZF or MMSE receiver attempts to decode the remaining N devices. The functional block diagram of the MRC-SIC/ZF and MRC-SIC/MMSE receivers is shown in Fig. 2c.

V. COMPUTATIONAL COMPLEXITY OF THE SCHEMES
In this section, we compare the computational complexity of the different receivers in the asymptotic case where all the signals transmitted by the K devices are correctly decoded. First, we list the complexity of the signal processing operations that can be performed by the receivers:
• Matrix inversion has complexity $O(N^3)$.
Note that matrix products and inversions are the most expensive operations in terms of computational complexity, thus their number of occurrences should be minimized.
In Tables 2 and 3, we list the average number of matrix inversions and SIC operations required by the different schemes for the cases of K = N and K > N, respectively. Note that, in the asymptotic case for K = N, we assume that all the signals transmitted by the K devices are successfully decoded in the first ZF/MMSE operation, thus no PIC operation needs to be performed. Even though the MRC-SIC receiver does not perform any matrix inversion, its computational complexity approaches that of receivers that utilize matrix inversions owing to the increased number of iterations. Nevertheless, for the case of K > N, the ZF-PIC and MMSE-PIC receivers proposed in this work reduce the required number of matrix inversions when compared to the other receiver schemes. This is because they decode and subtract the signal contributions of multiple users in the same iteration.

VI. NUMERICAL RESULTS
We resort to Monte Carlo simulations to evaluate the performance of the considered decoding schemes. The performance is evaluated in terms of outage probability, average number of matrix inversions and average number of SIC operations. The receivers that perform SIC or PIC decoding adopt the dynamic-ordering SIC decoding.
Owing to the iterative behaviour of the studied non-linear receivers, and assuming that the number of active devices in a given time slot can be considerably high, it is mathematically intractable to derive closed-form expressions for the outage probability, average number of matrix inversions and average number of SIC operations. For this reason, we resort to computer simulations to generate the numerical results presented in this section.
Each Monte Carlo simulation was performed as follows. First, we define the simulation parameters N, K, ρ and the target data rate R. We then generate the matrix of wireless channel vectors for each of several Monte Carlo realizations. Next, we employ one of the receivers studied in this paper for each realization of the channel vectors. After each realization, we store the number of correctly decoded devices and the numbers of SIC operations and matrix inversions performed by the receiver. Finally, we compute the outage probability based on the average number of correctly decoded devices, and also the average number of SIC operations and matrix inversions.
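The procedure above can be sketched for the simplest receiver, MRC-SIC with dynamic ordering. The sketch makes several simplifying assumptions that differ from the full study: perfect CSI, channel inversion power control (so that every device has the effective channel $\sqrt{\rho}\,\mathbf{h}_k$), and an outage probability estimated as one minus the average fraction of decoded devices. All function names are hypothetical.

```python
import random


def cn(rng):
    """Draw one CN(0, 1) sample."""
    s = 0.5 ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def inner(a, b):
    """Return a^H b for two complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def mrc_sic_decoded(channels, rho, rate):
    """Number of devices decoded by MRC-SIC for one channel realization."""
    threshold = 2.0 ** rate - 1.0
    # Dynamic ordering: strongest channel norm first.
    pending = sorted(channels, key=lambda h: inner(h, h).real, reverse=True)
    decoded = 0
    while pending:
        h = pending[0]
        signal = rho * abs(inner(h, h)) ** 2
        interference = sum(rho * abs(inner(h, g)) ** 2 for g in pending[1:])
        noise = inner(h, h).real  # ||a_k||^2 with unit noise power
        if signal / (interference + noise) < threshold:
            break  # a failed attempt ends the SIC procedure
        decoded += 1
        pending.pop(0)  # perfect CSI: the decoded signal is fully removed
    return decoded


def outage_probability(n_antennas, n_devices, rho, rate, trials, seed=0):
    """Monte Carlo estimate of 1 - E[D] / K."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        channels = [[cn(rng) for _ in range(n_antennas)]
                    for _ in range(n_devices)]
        total += mrc_sic_decoded(channels, rho, rate)
    return 1.0 - total / (trials * n_devices)


# Example call: N = K = 8, rho = 10 (linear scale), R = 1 bit/s/Hz.
p_out = outage_probability(8, 8, 10.0, 1.0, trials=200)
```

Extending the sketch to imperfect CSI would require adding the estimation error to the channels used for filtering and cancellation, and the other receivers would additionally need a matrix inversion routine.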
Moreover, let D denote the number of correctly decoded signals at each time slot. Then, the outage probability is the average fraction of signals that are not correctly decoded, i.e., $P_{\text{out}} = 1 - \mathbb{E}[D]/K$.

A. PERFORMANCE FOR K = N
We first evaluate the case where the number of receive antennas is equal to the number of devices connected to the BS. We set N = K = 8 and r = 1 bits/s/Hz. We observe in Fig. 3a that the best outage performance is achieved by the MMSE-PIC receiver. The second best performance is achieved by the MMSE receiver without interference cancellation, which is even better than the performance of the iterative ZF-PIC receiver. We note that the performance of the MRC-SIC receiver reaches an error floor for ρ ≥ 10 dB, i.e., increasing the transmit power of the devices after this point does not yield performance gains. The ZF receiver without interference cancellation only outperforms the MRC-SIC receiver at high transmit SNRs, while the MRC receiver performs poorly in the whole range of transmit SNRs. Note that the receivers with MRC and ZF filters are more robust against imperfect CSI than the ones that utilize the MMSE filter, especially in the high SNR regime.
We compare the average number of matrix inversions in Fig. 3b. The MRC and MRC-SIC receivers do not perform any matrix inversion, and the ZF and MMSE receivers perform only a single matrix inversion to decode the signals from K devices. For the ZF-PIC and MMSE-PIC receivers, the average number of matrix inversions first increases until reaching a peak value, and then decreases and tends to one as the transmit SNR increases. The reason is that, for the lower values of ρ, the ZF-PIC and MMSE-PIC receivers have to perform more iterations when attempting to decode the signals from the K users. As ρ increases, all the K signals are successfully decoded with a single iteration.
In Fig. 3c, we compare the average number of SIC operations for the different receivers. The MRC, ZF and MMSE receivers do not perform any SIC operation. The number of SIC operations performed by the MRC-SIC receiver tends to K as the transmit SNR increases, since one SIC operation is performed for each decoded signal. Finally, the average number of SIC operations performed by the ZF-PIC and MMSE-PIC receivers is greater than one for almost the whole range of ρ, since more than one iteration is required to decode the signals from all the K devices. Nevertheless, as ρ grows large, a single iteration is enough to decode all the K streams, thus only a single SIC operation is required.

B. PERFORMANCE FOR K > N
The performance for the case where the number of connected devices is greater than the number of receive antennas, i.e., K > N, is shown in Figs. 4a-4c. We set N = 8, K = 20 and r = 0.5 bits/s/Hz. In Fig. 4a, we compare the outage performance for the different receivers. Surprisingly, the ZF-PIC receiver performs extremely poorly in this setup, followed by the MRC-SIC receiver. The performance of the MRC-SIC/MMSE receiver matches the performance of the MRC-SIC, and both outperform the MRC-SIC/ZF receiver in the whole considered range of ρ. The performance of the MRC-SIC/ZF only matches the performance of the others for ρ > 20 dB. We also observe that the performance of the MRC-SIC, MRC-SIC/ZF and MRC-SIC/MMSE receivers saturates for ρ > 10 dB, i.e., increasing the transmit SNR does not yield any performance gain. The reason is that increasing the average received SNR of the devices also increases the interference that they cause to each other.
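The saturation can be made explicit with a short worked limit. As a sketch under assumptions that are not stated in this section (unit-variance noise, all devices sharing the same average SNR ρ, perfect CSI, and MRC filtering of device k while the set $\mathcal{I}_k$ of not-yet-decoded devices interferes), the post-filter SINR and its high-SNR limit would read:

```latex
\mathrm{SINR}_k
= \frac{\rho\,\|\mathbf{h}_k\|^2}
       {1 + \rho \sum_{j \in \mathcal{I}_k}
        \dfrac{|\mathbf{h}_k^{H}\mathbf{h}_j|^2}{\|\mathbf{h}_k\|^2}}
\;\xrightarrow[\rho \to \infty]{}\;
\frac{\|\mathbf{h}_k\|^4}{\sum_{j \in \mathcal{I}_k} |\mathbf{h}_k^{H}\mathbf{h}_j|^2}
```

Under these assumptions the limit does not depend on ρ, which is consistent with an outage floor at high transmit SNR.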
Finally, the most interesting result is related to the MMSE-PIC receiver. Even though the MMSE-PIC receiver is the one that presents the highest performance degradation owing to imperfect CSI, it is still the one that achieves the lowest outage probability. When ρ is very low, the received signal strength of the devices is very low, thus most of the signals transmitted by the devices are not correctly decoded. Then, as ρ increases, the outage probability reaches a minimum value around ρ = −2 dB. At this point, its performance is much better than the performance achieved by any of the other receivers in the whole range of ρ. However, if we continue increasing ρ, the interference caused by the active devices waiting to be decoded becomes too high, and as a consequence the performance of the MMSE-PIC receiver deteriorates rapidly, as can be noted from the increase in the outage probability for larger ρ. Nonetheless, the curves of the MMSE-PIC receiver evince the fact that there is an optimal value for the average received SNR ρ: the received signal must be strong enough to make sure that the active devices are correctly decoded (thus guaranteeing minimum outage probability), but not so strong that the interference caused by the devices waiting to be decoded harms the performance of the system. How to determine the optimal value of ρ for power control in terms of the system parameters N, K and r is an issue that we intend to investigate in future works.
In Fig. 4b, we analyze the average number of matrix inversions for all the considered receivers. The MRC and MRC-SIC receivers do not perform any matrix inversion. The MRC-SIC/ZF and MRC-SIC/MMSE receivers perform on average one matrix inversion, because only a single ZF or MMSE operation is necessary at the last step of the SIC decoding procedure to decode the signals from the last N devices. The ZF-PIC receiver also performs approximately one matrix inversion, but for the completely opposite reason: the SIC decoding procedure always fails and is interrupted in the first iteration. The MMSE-PIC receiver performs multiple matrix inversions in the range where the outage probability is low, because multiple iterations are performed to decode the signals from the K devices. However, as ρ grows large and the outage probability tends to one, the average number of matrix inversions also tends to one because the SIC decoding procedure fails and is interrupted in the first iteration.
We finally compare the average number of SIC operations in Fig. 4c. The MRC receiver does not perform SIC operations. The number of SIC operations performed by the ZF-PIC receiver is also close to zero, but this is because the SIC decoding procedure tends to fail and be interrupted in the first iteration. The number of SIC operations performed by the MRC-SIC receiver tends to K as ρ increases, while the number of SIC operations performed by the MRC-SIC/ZF and MRC-SIC/MMSE receivers tends to K − N (after K − N devices are successfully decoded, a ZF or MMSE operation is performed to decode the remaining N users). The MMSE-PIC receiver performs a reduced number of SIC operations because multiple users are simultaneously decoded and their interference is jointly subtracted from the received signal at each iteration.
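The counting behaviour described for the MMSE-PIC receiver can be illustrated with a small SINR-level sketch. It assumes perfect CSI, perfect cancellation, unit-variance noise, a common target rate r, and that a device is decoded whenever log2(1 + SINR) ≥ r. The function name `mmse_pic` and the convention of counting one joint subtraction per iteration that leaves devices pending are illustrative assumptions, not necessarily the exact accounting used for the figures.

```python
import numpy as np


def mmse_pic(H, rho, r):
    """SINR-level sketch of MMSE-PIC.

    Returns (decoded devices, matrix inversions, joint subtractions).
    """
    N, K = H.shape
    active = np.arange(K)  # devices still waiting to be decoded
    inversions = 0
    subtractions = 0
    while active.size > 0:
        Ha = H[:, active]
        # One MMSE filter computation (one matrix inversion) per iteration.
        R_inv = np.linalg.inv(rho * (Ha @ Ha.conj().T) + np.eye(N))
        inversions += 1
        g = rho * np.real(np.einsum("nk,nm,mk->k", Ha.conj(), R_inv, Ha))
        sinr = g / (1.0 - g)
        ok = np.log2(1.0 + sinr) >= r
        if not ok.any():
            break  # no device can be decoded: the procedure is interrupted
        # All devices that meet the rate are decoded in the same iteration.
        active = active[~ok]
        if active.size > 0:
            # Their joint contribution is subtracted before the next iteration.
            subtractions += 1
    return K - active.size, inversions, subtractions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, K = 8, 20
    H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2.0)
    print(mmse_pic(H, rho=10.0 ** (-2 / 10.0), r=0.5))
```

When nothing can be decoded in the first iteration, the sketch returns a single inversion and no subtraction, which mirrors the behaviour described for large ρ.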

C. SIC VERSUS PIC
In this subsection, we present a performance comparison between the receivers that utilize SIC and PIC decoding schemes. More specifically, we compare the ZF-SIC and MMSE-SIC receivers studied in the literature, which decode and subtract the signal contribution of only a single user in a given iteration, and the ZF-PIC and MMSE-PIC receivers proposed in this work, which can decode and subtract the signal contribution of multiple users in a given iteration. As in the previous subsections, we present Monte Carlo simulation results for the case of K = N and for the case of K > N. Besides, all the receivers perform the dynamic-ordering SIC decoding based on the SNRs.
In Figs. 5a and 6a, we compare the performance of the receivers in terms of outage probability. We observe that the iterative receivers that utilize the MMSE filter outperform the ones that utilize the ZF filter. We also observe that the receivers that utilize PIC decoding outperform their counterparts that utilize SIC. This happens because, in the case of SIC decoding, when the decoding attempt of a signal transmitted by one device fails, the signals transmitted by other devices that are waiting to be decoded are also not decoded, which increases the error probability. As in the previous subsection, we note once more that the receivers that utilize ZF filters present a lower performance degradation owing to imperfect CSI than the ones that utilize the MMSE filter. Nevertheless, for the case of K > N, the ZF-SIC, ZF-PIC and MMSE-SIC receivers all perform very poorly, and the MMSE-PIC receiver is the only one able to achieve a satisfactory performance.
In Figs. 5b and 6b, we analyse the average number of matrix inversions performed by the receivers. For K = N (Fig. 5b), we note that the ZF-SIC and MMSE-SIC receivers tend to perform K matrix inversions as ρ increases. In order to correctly decode the signals transmitted by the K devices, a different ZF/MMSE filter is computed in each iteration of the SIC decoding procedure. On the other hand, when PIC is employed and ρ grows large, only a single matrix inversion is necessary. In the high SNR regime, only one ZF/MMSE filter needs to be computed, since all devices are correctly decoded in the first iteration of the PIC procedure. For the case of K > N in Fig. 6b, we observe that the MMSE-SIC receiver performs a very high number of matrix inversions, but it still performs poorly since it is not able to decode the signals transmitted by all the active devices. The MMSE-PIC receiver performs a reduced number of matrix inversions, and still achieves the best performance for ρ ≈ 2 dB. Note that the ZF-SIC and ZF-PIC receivers perform only a single matrix inversion because the decoding procedure fails in the first iteration.
Finally, we compare the average number of SIC operations performed by the receivers in Figs. 5c and 6c. For the case of K = N and when ρ grows large (Fig. 5c), we note that the ZF-SIC and MMSE-SIC receivers tend to perform K − 1 SIC operations, as expected. On the other hand, the ZF-PIC and MMSE-PIC receivers do not need to perform any signal subtraction in the high SNR regime, since all users are correctly decoded in the first iteration of the PIC procedure. However, when we analyze the case of K > N in Fig. 6c, we note that the MMSE-SIC and MMSE-PIC receivers perform several iterations when attempting to decode the signals transmitted by the K active devices, thus many signal subtractions are performed. In this situation, the ZF-SIC and ZF-PIC receivers present a very poor performance: the decoding procedure fails in the first iterations, thus almost no signal subtraction operation is performed.
We conclude that the ZF-PIC and MMSE-PIC receivers present a better performance than their counterparts ZF-SIC and MMSE-SIC, and at the same time reduce the number of required complex signal operations such as matrix inversions and subtractions. However, the only receiver suitable for the case of K > N is the MMSE-PIC.
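For contrast with PIC, the one-user-per-iteration behaviour of an SNR-ordered MMSE-SIC receiver can be sketched as follows. This is a hedged illustration under the same kind of assumptions as before (perfect CSI, perfect cancellation, unit-variance noise, decoding succeeds when log2(1 + SINR) ≥ r). The function name `mmse_sic` is hypothetical, and a subtraction is counted only when at least one device is still pending.

```python
import numpy as np


def mmse_sic(H, rho, r):
    """SINR-level sketch of MMSE-SIC with dynamic ordering based on the SNR.

    Returns (decoded devices, matrix inversions, subtractions).
    """
    N, K = H.shape
    # Dynamic ordering: the device with the largest channel norm goes first.
    order = np.argsort(-np.sum(np.abs(H) ** 2, axis=0))
    decoded = 0
    inversions = 0
    subtractions = 0
    for i, k in enumerate(order):
        Ha = H[:, order[i:]]  # device k plus the devices still pending
        # A new MMSE filter (one matrix inversion) is computed per iteration.
        R_inv = np.linalg.inv(rho * (Ha @ Ha.conj().T) + np.eye(N))
        inversions += 1
        h = H[:, k]
        g = rho * np.real(np.vdot(h, R_inv @ h))  # rho * h^H R^{-1} h
        sinr = g / (1.0 - g)
        if np.log2(1.0 + sinr) < r:
            # A failed attempt interrupts the procedure, so the pending
            # devices are not decoded either.
            break
        decoded += 1
        if i < K - 1:
            subtractions += 1  # remove device k before the next iteration
    return decoded, inversions, subtractions


if __name__ == "__main__":
    # Orthogonal channels at high SNR: every device is decoded, one per iteration.
    print(mmse_sic(np.eye(4, dtype=complex), rho=100.0, r=1.0))
```

With K orthogonal channels at high SNR, the sketch performs K inversions and K − 1 subtractions, in line with the asymptotic behaviour described for the SIC receivers.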

D. FIXED VERSUS DYNAMIC-ORDERING SIC DECODING
In this subsection, we present results to compare the performance of the random and dynamic-ordering SIC decoding schemes. We only evaluate the performance of the MRC-SIC and MMSE-PIC receivers, since they presented the best performance among all the receivers studied in this work.
In Fig. 7a, we consider the case where the number of users is equal to the number of receive antennas, i.e., N = K. When we analyze the performance of the MRC-SIC receiver, for both perfect and imperfect CSI, we note that the dynamic ordering based on the SNR yields a significant performance gain. On the other hand, by analyzing the results of the MMSE-PIC receiver, we note that the performance obtained with the dynamic ordering matches the performance obtained with the random one. Nonetheless, the MMSE-PIC receiver outperforms the MRC-SIC receiver in the whole range of ρ. In this considered scenario, the best choice is then the MMSE-PIC receiver with dynamic ordering, which presents the higher complexity but the best performance.
In Fig. 7b, we consider the case where the number of users is greater than the number of receive antennas, i.e., K > N. In this scenario, we observe that the MRC-SIC receiver with random ordering performs extremely poorly, and that the dynamic ordering based on the SNR significantly improves its performance. On the other hand, for the MMSE-PIC receiver, the ordering based on the SINR yields only a very small performance gain when compared to the ordering based on the SNR. For both the cases of perfect and imperfect CSI, the performance of the MMSE-PIC receiver is very close to (only slightly better than) the performance of the MRC-SIC receiver below ρ = −2 dB. By increasing ρ from this point, the performance of the MMSE-PIC receiver deteriorates rapidly, such that the outage probability approaches one as ρ grows large. In this considered scenario, the best choice is the MMSE-PIC receiver with the dynamic SIC ordering.
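The effect of the decoding order on the MRC-SIC receiver can be explored with the following sketch, which attempts the devices either in decreasing order of channel norm (dynamic ordering based on the SNR) or in a random order. It is only an illustration under assumed conditions: i.i.d. Rayleigh fading, perfect CSI, perfect cancellation, unit-variance noise, and a decoding rule of log2(1 + SINR) ≥ r. The function names `mrc_sic_decoded` and `compare_orderings` are hypothetical.

```python
import numpy as np


def mrc_sic_decoded(H, rho, r, order):
    """Number of devices decoded by MRC-SIC when attempted in the given order."""
    order = np.asarray(order)
    decoded = 0
    for i, k in enumerate(order):
        h = H[:, k]
        gain = np.real(np.vdot(h, h))  # ||h_k||^2
        pending = H[:, order[i + 1:]]  # devices not yet decoded
        # Interference power that leaks through the normalized MRC filter.
        interference = np.sum(np.abs(h.conj() @ pending) ** 2) / gain
        sinr = rho * gain / (1.0 + rho * interference)
        if np.log2(1.0 + sinr) < r:
            break  # a failed attempt interrupts the SIC procedure
        decoded += 1
    return decoded


def compare_orderings(N, K, r, rho_db, n_trials=2000, seed=0):
    """Return (outage with SNR-based ordering, outage with random ordering)."""
    rng = np.random.default_rng(seed)
    rho = 10.0 ** (rho_db / 10.0)
    dynamic_total = 0
    random_total = 0
    for _ in range(n_trials):
        H = (rng.standard_normal((N, K))
             + 1j * rng.standard_normal((N, K))) / np.sqrt(2.0)
        snr_order = np.argsort(-np.sum(np.abs(H) ** 2, axis=0))
        dynamic_total += mrc_sic_decoded(H, rho, r, snr_order)
        random_total += mrc_sic_decoded(H, rho, r, rng.permutation(K))
    total = n_trials * K
    return 1.0 - dynamic_total / total, 1.0 - random_total / total


if __name__ == "__main__":
    print(compare_orderings(N=8, K=8, r=1.0, rho_db=10.0))
```

Both orderings are evaluated on the same channel realizations, so any difference between the two estimates comes from the ordering alone.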

VII. CONCLUSION
In this work, we proposed and analyzed iterative linear receivers that can be adopted in MTC use cases where the number of active devices in a given time slot is higher than the number of receive antennas. Aiming at achieving more realistic results, we took into consideration the effects of imperfect CSI. The proposed schemes are able to perform dynamic-ordering SIC decoding and data detection for arbitrary numbers of active devices and receive antennas, thus they may be adopted in mMTC scenarios where, because of cost, physical size and power limitations, receivers cannot be equipped with a large number of receive antennas. Some of the main takeaways are listed below:
• The receivers that utilize the MMSE filter perform better than their counterparts that utilize the ZF filter. The MMSE receiver without interference cancellation can even outperform the ZF-PIC receiver for K = N.
• The receivers that utilize MRC or ZF filters present a lower performance degradation owing to imperfect CSI than the ones that utilize the MMSE filter.
• Even though the MMSE-PIC receiver is the one that presents the highest performance degradation owing to imperfect CSI, it is the one that presented the best performance in all the considered scenarios.
• The ZF-PIC and MMSE-PIC receivers present a better performance in terms of outage probability when compared to their counterparts ZF-SIC and MMSE-SIC, and at the same time reduce the required number of complex signal processing operations such as matrix inversions.
• The dynamic-ordering SIC decoding significantly enhances the performance of the SIC and PIC receivers.