Novel Multilayer Extreme Learning Machine as a Massive MIMO Receiver for Millimeter Wave Communications

Wireless communication systems working in millimeter-wave (mmWave) frequency bands offer higher bandwidths than traditional radio frequency schemes. This technology allows multibeam steering and data multiplexing with the help of massive multiple-input multiple-output (MIMO) systems. However, supporting large bandwidths at mmWave frequencies is challenging due to the use of large antenna arrays with beamforming, sampling signals with large bandwidths, and baseband signal processing operations at gigabit data rates. Due to the wider bandwidth and higher signal processing requirements of mmWave systems, low-complexity receiver algorithms become important. Previously reported investigations assumed the use of hybrid beamforming structures that reduce power consumption and signal processing tasks. Therefore, the use of artificial neural networks (ANNs) becomes relevant for the processing of mmWave signals as reported in earlier works. In this article, to carry out MIMO combining processing for mmWave communications, we propose a fully complex multilayer extreme learning machine (M-ELM) neural network. We investigate the tuning of the number of neurons in each hidden layer for the proposed method to maximize the system performance and decrease the complexity of the receiver. We compare the results of the introduced M-ELM algorithm with a fully complex extreme learning machine (ELM), fully real ELM, and M-ELM defined in the real plane in terms of spectral efficiency, bit error rate, computational complexity, and processing time. Furthermore, we compare the novel M-ELM strategy with traditional linear MIMO receivers, such as Maximum Ratio and Minimum Mean Square Error, as well as to a multilayer perceptron (MLP) neural network trained offline. The numerical results show that with a good balance between the overall performance and computational cost of the ANN, the fully complex M-ELM MIMO receiver outperforms the other evaluated schemes.


I. INTRODUCTION
Millimeter-wave (mmWave) communication systems have received wide public attention as part of the development of the fifth generation (5G) New Radio (NR) technology, which provides low latency, more bandwidth, and higher data rates [1]. The mmWave massive multiple-input multiple-output (MIMO) system can enhance the achievable spectral efficiency (SE) by providing a large array gain. However, the high-resolution analog-to-digital (ADC) and digital-to-analog (DAC) converters and the fully digital precoding scheme result in high power consumption and unaffordable hardware costs for these systems [2]. Furthermore, due to the baseband signal processing at gigabit data rates, support for large bandwidths at mmWave frequencies is challenging. In addition, due to short wavelengths, mmWave communications suffer from fundamental technical challenges, such as severe path loss, blockage sensitivity, directivity, energy efficiency, and narrow beamwidth [3]. In particular, uplink (UL) transmission in the mmWave MIMO system presents these challenges because of the low transmission power of mobile devices. Another related problem in UL communication is interference between users, which reduces the SE of the transmission. Therefore, it is relevant to obtain a solution that maximizes the SE for mmWave massive MIMO systems with low computational complexity. Typical signal processing methods, such as the maximum ratio (MR) and minimum mean square error (MMSE) algorithms used in MIMO receivers, are not optimized to reduce computational complexity and communications latency. Nevertheless, Machine Learning (ML) and Deep Learning (DL) techniques can effectively address these challenges [4]-[8].
The associate editor coordinating the review of this manuscript and approving it for publication was Wei Liu. VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
ML and DL methods leverage prior observations about channel estimation and precoding designs to solve the MIMO combining task for mmWave systems, significantly reducing the processing overhead. However, as stated in [9], not all DL techniques can outperform known optimal solutions for wireless communications. As reviewed in [10], a DL technique such as a trained multilayer perceptron (MLP) neural network cannot outperform an algorithm like the MMSE receiver, from which the neural network learns the MIMO combining process, since the bit error rate (BER) is the same for both techniques. Thus, according to [9], [10], ML and DL may improve on known algorithms in communications mainly by reducing the computational complexity, as does the extreme learning strategy described in [11]. Additionally, neural networks can effectively reduce the effects of non-linear hardware, interference, and frequency-selective channels [9].
Since channel estimation is relevant in conventional receiver designs, ML and DL algorithms are used to improve channel estimation efficiency and communication quality [12]- [14]. In addition, many studies have investigated the issue of joint channel estimation and signal detection [15]. Indeed, channel estimation is not an isolated process since channel equalization requires precise Channel State Information (CSI) to effectively remove signal distortions [16]. As a result, channel estimation and channel equalization can be performed together, and even symbol detection and channel decoding can be done together in one step. However, symbol detection using DL methods showed lower performance than with regular methods [17]. Furthermore, processing time should be considered, as several methods can achieve good wireless performance at the expense of more computational complexity. Even DL techniques can take a lot of computational resources to learn specific tasks [15], [18].
Another important aspect of the ML and DL methods is that good data sets are required. In this study, the data set consists of massive MIMO mmWave channels. Channel modeling is a hard task because large- and small-scale propagation channel parameters must be predicted from a limited amount of measurement data, especially for mmWave systems [10], [19], [20]. Several studies validate that ELM-based algorithms outperform deep learning neural networks in terms of computational cost for classification or regression issues [21]-[23].
In this paper, we propose a fully complex multilayer extreme learning machine (M-ELM) algorithm in order to learn the MIMO combining process to improve the SE of mmWave communications and minimize the use of computational resources.
In contrast to conventional MIMO receivers such as MR and MMSE, the M-ELM strategy does not require CSI obtained through channel estimation, which reduces processing time and is particularly useful for mmWave communications. This investigation introduces an M-ELM that learns feature representations by employing singular values based on autoencoders (AEs), strictly defined in the complex domain.

A. RELATED WORK
The Extreme Learning Machine (ELM), introduced by Huang in 2004 [24], is an algorithm for optimizing single-hidden-layer feedforward networks (SLFNs) that overcomes the disadvantages of the gradient descent method. In the ELM algorithm, the input weights and biases of the hidden layer of the SLFN are assigned randomly, without iterating to obtain an optimal value, and remain unchanged during the training process [25]. In ELM approaches, the hidden layer does not need to be adjusted, and the method is suited to the problem of generalization, since the global optimum is theoretically attained. Moreover, setting hyperparameters takes a lot of time. In addition, the ELM is widely used in many technological areas, especially in telecommunications signal reception, due to the following: (i) easy implementation, (ii) extremely fast training speed, and (iii) good generalization performance [11], [16], [26]-[28].
The work in [24] is the first to introduce the ELM concept as an SLFN network with a fast learning speed thanks to the random assignment of input weights and biases and the minimum-norm least-squares (LS) solution of the SLFN. The authors conclude that the standard ELM can be used generally in many cases, and this is the case for the work in [29], which investigates the problem of jointly solving equalization and symbol detection in orthogonal frequency division multiplexing (OFDM) systems with quadrature amplitude modulation (QAM). This study reports the performance of the standard ELM, whose training is fast compared to other methods. However, the research is limited to single-input single-output (SISO) systems, as in works [10], [30], [31]. Nevertheless, the standard ELM requires setting a minimum number of neurons in the single hidden layer to achieve good performance. Therefore, for SISO systems, an ELM strategy can achieve good results, but processing time should also be considered [10].
Massive MIMO systems are a fundamental technology used for beamforming in mmWave systems. The work in [27] introduces an ELM receiver for MIMO light-emitting diode (LED) communications with nonlinearities and cross-LED interference. The work in [28] shows the performance of a standard ELM used as a MIMO-OFDM receiver, considering low-resolution ADCs. The investigation shows a MIMO combining process learned online with the help of reference signals. However, this study is limited to single-user scenarios. The work in [26] considers multiuser scenarios where the inter-user interference is the limiting factor. Finally, the studies in [26], [28] are limited to low-band frequencies.
With traditional methods such as MR and MMSE, it is necessary to estimate the MIMO channel and then find the MIMO combiner that performs channel equalization [16], [32]- [34]. On the contrary, with an ELM strategy, there is no need to perform channel estimation, as MIMO combining processing can be performed directly with this neural network as shown in [26] for low-band frequencies and in [11] for mmWave frequencies. However, these studies are limited to the single-layer ELM method. The work in [10] compares the performance of the ELM method with linear MIMO receivers and a DL approach called multilayer perceptron (MLP) considering mmWave band frequencies. However, the study is limited to SISO systems and a single-layer ELM strategy.
Since deep neural networks (DNNs) only work with real numbers, there is a need to separate the complex input for a DNN into real and imaginary parts. This process represents a limitation for DNNs, but not for an ELM strategy. The advantage of the latter is the fast-learning speed and simplicity since this neural network can work with complex input numbers such as the received OFDM symbols in a 5G mobile communication. Since an OFDM communication system requires processing complex numbers, an ELM strategy is applied directly to learn the MIMO processing task without requiring a real domain input. However, the investigation in [35] takes into account the standard ELM in the real domain.
An M-ELM network not only inherits the characteristics of the hyperparameters of a single layer ELM [36] but also achieves better performance [37]. However, previous studies outline the performance of real domain multi-layer ELM networks such as the works in [25], [38]- [40] for tasks that are not related to MIMO processing. In this article, we consider a complex M-ELM neural network to solve the MIMO combining problem. We compare the performance of this complex domain network to the real domain M-ELM, especially in terms of processing time.

B. RESEARCH CONTRIBUTIONS
The contributions of this work are summarized as follows:
1) We have designed an M-ELM network to perform MIMO processing of received signals with reduced computational complexity compared to a single-layer ELM. We have considered a single-cell, multiuser mmWave system, and no perfect CSI at the receiver. Specifically, there is no channel estimation process performed with the M-ELM strategy.
2) We have provided an analysis of the introduced M-ELM method by tuning the hyperparameters of the network, such as the number of hidden layers and the number of neurons per layer, to perform MIMO combining processing in one step, in order to reduce the processing time compared to a single-layer strategy.
3) We have tested the performance of the novel M-ELM receiver in terms of the achievable SE, bit error rate (BER), and average processing time. The performance of the M-ELM strategy was compared to that of the single-layer ELM method for the complex and real domains, as well as the MR and MMSE MIMO receivers.
The remainder of the paper is organized as follows. Section II presents the signal and channel model, as well as the channel estimation and equalization processes. The proposed M-ELM algorithm is introduced in Section III. The simulation results are provided in Section IV. Section V presents an analysis and discussion of the results. Finally, Section VI outlines the conclusions of this investigation.
Notation: Scalars are denoted in lower case, whereas matrices and vectors are represented in bold upper and lower case, respectively. For any general matrix or vector, $\mathbf{x}^T$ represents the transpose and $\mathbf{x}^*$ the conjugate transpose. $\mathbf{I}$ is an identity matrix of proper dimension. $\|\cdot\|_F$ represents the Frobenius norm operator. $\mathbb{E}[\cdot]$, $\mathbb{C}$, and $\mathbb{N}$ denote the expected value, the set of complex numbers, and the set of natural numbers, respectively. Finally, a circularly symmetric complex Gaussian stochastic vector is written as $\mathbf{x} \sim \mathcal{CN}(\mu_x, \sigma_x^2)$ with mean $\mu_x$ and variance $\sigma_x^2$.

II. SYSTEM DESCRIPTION
In this section, we present the multiuser mmWave signal and channel model, as well as the channel estimation and equalization processes considered in this work.

A. SIGNAL MODEL
We considered a single-cell multiuser hybrid beamforming mmWave system based on the 5G NR standard. The simulation of radio links, based on cyclic prefix orthogonal frequency division multiplexing (CP-OFDM) communication, employs s subcarriers to transmit $N_s$ data symbols per link. The radio link consists of a base station (BS) equipped with a massive array of antennas, which communicates with multi-antenna user equipment (UE), as illustrated in Fig. 1 [10]. Fig. 1 shows that random bits (data bitstream) are mapped to data symbols modulated with the quadrature phase shift keying (QPSK) or quadrature amplitude modulation (QAM) schemes. The data symbols must be multiplexed with CP-OFDM to perform a wideband transmission through the mmWave channel. The multiantenna receiver performs symbol demodulation to recover the original data. Therefore, for this study, we simulate OFDM symbols, which are complex numbers configured following the 5G NR specifications [1], [41].

FIGURE 1. Simplified hardware block diagram of the CP-OFDM mmWave system.
OFDM modulation offers high data rate transmission over a multipath fading channel, high SE, and low-complexity implementation due to the Fast Fourier Transform (FFT) algorithm [42]. This transmission technique can also scale the number of subcarriers, so the FFT size scales such that processing complexity does not increase unnecessarily for larger bandwidths. Furthermore, OFDM can work with massive MIMO systems to achieve high antenna diversity and spatial multiplexing [11], [43].
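As an illustration of why the FFT keeps OFDM processing simple, the following minimal sketch modulates one block of QPSK symbols with an IFFT, adds a cyclic prefix, passes the signal through a toy two-tap channel, and recovers the symbols with one complex division per subcarrier. The sizes, the channel taps, and the function names `ofdm_modulate` and `ofdm_demodulate` are illustrative assumptions and do not follow the 5G NR numerology used in the simulations.

```python
import numpy as np

def ofdm_modulate(symbols, cp_len):
    """Map frequency-domain symbols to one time-domain CP-OFDM symbol."""
    time_signal = np.fft.ifft(symbols)
    # The cyclic prefix is a copy of the last cp_len time samples.
    return np.concatenate([time_signal[-cp_len:], time_signal])

def ofdm_demodulate(rx_signal, cp_len):
    """Drop the cyclic prefix and return to the frequency domain."""
    return np.fft.fft(rx_signal[cp_len:])

rng = np.random.default_rng(0)
n_subcarriers, cp_len = 64, 16  # illustrative sizes (assumption)
bits = rng.integers(0, 2, size=(n_subcarriers, 2))
qpsk = ((1 - 2 * bits[:, 0]) + 1j * (1 - 2 * bits[:, 1])) / np.sqrt(2)

# Toy two-tap channel, shorter than the cyclic prefix (assumption).
channel_taps = np.array([1.0, 0.4 - 0.3j])
tx = ofdm_modulate(qpsk, cp_len)
rx = np.convolve(tx, channel_taps)[: tx.size]
rx_freq = ofdm_demodulate(rx, cp_len)

# With the CP removed, the channel acts as one complex gain per subcarrier.
channel_freq = np.fft.fft(channel_taps, n_subcarriers)
recovered = rx_freq / channel_freq
```

Because the channel is shorter than the cyclic prefix, the linear convolution becomes circular after prefix removal, which is what allows the per-subcarrier division in the last line.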
The hardware architecture shown in Fig. 2 represents a fully connected hybrid beamforming BS at the receiver side. At the BS, there are $L_r$ radio-frequency (RF) chains, whereas there are $K$ multiantenna users, each equipped with a single RF chain [34], [44], [45].
As shown in Fig. 2, $K$ users can transmit simultaneously to the BS by applying baseband precoding, $\mathbf{f}_{BB_k}$ for user $k = 1, 2, \ldots, K$, i.e., $\mathbf{F}_{BB_k} = [\mathbf{f}_{BB_1}, \mathbf{f}_{BB_2}, \ldots, \mathbf{f}_{BB_K}]$, to transmit their respective signals, followed by an RF precoder $\mathbf{F}_{RF_k}$ using analog circuitry (phase shifters). The CP-OFDM multi-carrier scheme suppresses intersymbol interference by mapping modulated symbols into $N_s$ subcarriers; in this way, the transmitted signal of user $k$ is precoded as

where the transmitted symbol of the $k$th user in subcarrier $s$ is represented as $x_k[s]$ [34], [44].
Assuming that all users transmit simultaneously in the UL to the BS, the $s$th subcarrier reference signal $\mathbf{y}_{\phi_k}[s] \in \mathbb{C}^{N_r}$ received at the BS and transmitted by the $k$th user is given by

where $\mathbf{W}^*_{RF_k}$ is the analog combining matrix for the $k$th user at the mmWave BS, and $\mathbf{H}_k[s]$ is the channel matrix between the BS, equipped with $L_r N_r$ antennas, and the $k$th user, equipped with $N_t$ antennas. $\phi_k[s]$ denotes the $k$th user transmitted reference symbol mapped at pilot subcarrier $s$. It is worth noting that not all subcarriers are used for pilot transmission. This is a characteristic of the pilot structure defined in the 5G NR standard, which allows multiuser multiplexing [41]. Finally, $\mathbf{v}_k[s]$ represents the circularly symmetric white Gaussian noise vector characterized as i.i.d. $\mathcal{CN}(0, \sigma_k^2 \mathbf{I})$ [34], [45]. On the other hand, the $k$th received data signal at the BS, $\mathbf{y}_{x_k}[s] \in \mathbb{C}^{N_r}$, for a frequency-domain block fading channel is

FIGURE 2. Simplified system blocks of an UL multiuser mmWave system with a fully-connected hybrid array of antennas.

B. CHANNEL MODEL
Since we considered a massive MIMO mmWave system, the high path loss limits spatial selectivity, and the hybrid structure at the mmWave wavelength assumes the use of a tightly packed antenna array, which leads to high antenna correlation [44]. Therefore, the mmWave massive MIMO channel is assumed to be frequency selective and is represented by the clustered channel model. The $d$th delay tap of the $k$th user discrete-time $N_r \times N_t$ channel matrix $\mathbf{H}_{d,k}$, for $d = 1, 2, \ldots, N_c$, assumed as the summation of $N_{cl}$ scattering clusters, each contributing $N_{ray}$ propagation paths [45], is written as

where $\lambda = \sqrt{N_r N_t / (N_{cl} N_{ray})}$ is the scalar normalization factor, and $\alpha_{\eta\iota,k}$ is the complex small-scale fading gain of the $\iota$th ray in the $\eta$th scattering cluster, characterized as i.i.d. $\mathcal{CN}(0, \sigma_\eta^2)$, where $\sigma_\eta^2$ denotes the average power of the $\eta$th cluster. The vectors $\mathbf{a}_r(\theta^r_{\eta\iota,k}, \phi^r_{\eta\iota,k})$ and $\mathbf{a}^*_t(\theta^t_{\eta\iota,k}, \phi^t_{\eta\iota,k})$ denote the array response functions for the receive and transmit antenna arrays of the $k$th user, with respect to the angles of arrival $(\theta^r_{\eta\iota,k}, \phi^r_{\eta\iota,k})$ and departure $(\theta^t_{\eta\iota,k}, \phi^t_{\eta\iota,k})$, respectively, at an azimuth $\theta$ and an elevation angle $\phi$ [46]. Finally, based on [45], [47], the channel frequency response of user $k$ in subcarrier $s$, from $N_c$ delay taps in the discrete time domain, is given by

C. CHANNEL ESTIMATION
To estimate the mmWave channel, we first define an effective channel that is equivalent to the convolution of the hybrid beamformer and the channel response in the frequency domain, written as

By transmitting reference symbols on various known pilot subcarriers, CSI estimation can be performed with different channel estimation methods [48]. A simple estimation technique is the least squares (LS) algorithm. Based on the evidence presented in [49], the LS estimation can result in an imprecise measurement of the CSI when noise and interference are high, given that this technique does not consider the correlation properties of the channel [16].
The LS estimator is given by

With the LS estimation, we can obtain first- and second-order statistics of channel parameters, such as the variance of the effective estimated channel vector $\sigma_h^2$. These parameters of the mmWave channel must be known a priori to perform MMSE estimation [16], [50], [51], which is written as

where $\sigma_k^2$ denotes the $k$th user noise variance at the receiver. Note that the proposed M-ELM strategy does not require channel estimation. However, this process is required by linear MIMO receivers, such as the MMSE and MR receivers, in order to perform channel equalization [52], as described in the next subsection.
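To make the difference between the two estimators concrete, the sketch below applies a textbook per-entry LS estimate (received pilot divided by the known pilot symbol) and then scales it by the scalar LMMSE gain built from $\sigma_h^2$ and the noise variance. This is a simplified form that assumes uncorrelated channel entries with known variance; it is not necessarily the exact expression used in this work, and the names `ls_estimate` and `mmse_from_ls` are hypothetical.

```python
import numpy as np

def ls_estimate(y_pilot, pilot):
    """LS estimate of a channel entry from its received pilot sample."""
    return y_pilot / pilot

def mmse_from_ls(h_ls, var_h, var_noise, pilot_power=1.0):
    """Scalar LMMSE shrinkage of the LS estimate (uncorrelated entries assumed)."""
    gain = var_h / (var_h + var_noise / pilot_power)
    return gain * h_ls

rng = np.random.default_rng(1)
n = 20000                                  # number of toy channel realizations
var_h, var_noise = 1.0, 1.0                # 0 dB SNR, where LS is least reliable
pilot = (1 + 1j) / np.sqrt(2)              # unit-power QPSK pilot (assumption)

h = np.sqrt(var_h / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
v = np.sqrt(var_noise / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
y = h * pilot + v

h_ls = ls_estimate(y, pilot)
h_mmse = mmse_from_ls(h_ls, var_h, var_noise, abs(pilot) ** 2)

mse_ls = np.mean(np.abs(h_ls - h) ** 2)
mse_mmse = np.mean(np.abs(h_mmse - h) ** 2)
```

In this toy setting the LS error equals the noise variance, while the shrinkage step trades a small bias for a lower overall error, which is why the MMSE estimator needs the channel statistics beforehand.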

D. CHANNEL EQUALIZATION
In this section, we describe two common linear MIMO receivers that are frequently used to perform the channel equalization task at the receiver [32], [33].
MR processing computes the combining vector $\mathbf{w}^{MR}_{BB_k}[s]$ with the estimated effective channel at the $s$th subcarrier of the received OFDM signal [41], $\mathbf{h}_{eff_k}[s]$, as follows

Another widely studied MIMO receiver is the MMSE combiner [16], [26], [32]-[34], [41], [44], [45], [53]. For MMSE processing, we can group the CSI of the $K$ users in the cell of interest in a single matrix as follows

According to [41], the MMSE combining vector, $\mathbf{w}^{MMSE}_{BB_k}[s]$, can be written as

Therefore, the MIMO receiver problem reduces to finding a combining vector that recovers the transmitted data symbols, written as

where $\mathbf{w}^*_{BB_k}[s]$ is the $k$th user baseband (BB) combiner that maximizes the data symbol power while forcing interference signals and nonlinear distortions to zero, whereas $\mathbf{h}_{eff_j}$ and $x_j[s]$ represent the effective channel and the transmitted symbol of the $j$th interfering user, respectively.
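The two linear receivers can be compared with a short sketch. It uses the common textbook forms, namely the MR vector equal to the user's own effective channel and the MMSE vector obtained from a regularized inversion over all users, with perfect CSI and unit-power symbols. These forms, the omission of the estimation error covariance, and the helper names `mr_combiner`, `mmse_combiner`, and `sinr` are assumptions made for illustration rather than the exact expressions of [41].

```python
import numpy as np

def mr_combiner(h_eff, k):
    """MR: the combining vector of user k is its own effective channel."""
    return h_eff[:, k]

def mmse_combiner(h_eff, k, noise_var):
    """MMSE: regularized inversion that also suppresses the other users."""
    n_rx = h_eff.shape[0]
    gram = h_eff @ h_eff.conj().T + noise_var * np.eye(n_rx)
    return np.linalg.solve(gram, h_eff[:, k])

def sinr(w, h_eff, k, noise_var):
    """Output SINR of user k for combiner w and unit-power symbols."""
    gains = np.abs(w.conj() @ h_eff) ** 2      # |w^* h_j|^2 for every user j
    signal = gains[k]
    interference = np.sum(gains) - signal
    return signal / (interference + noise_var * np.linalg.norm(w) ** 2)

rng = np.random.default_rng(3)
n_rx, n_users, noise_var = 16, 4, 0.1          # toy dimensions (assumption)
H = (rng.standard_normal((n_rx, n_users))
     + 1j * rng.standard_normal((n_rx, n_users))) / np.sqrt(2)

w_mr = mr_combiner(H, 0)
w_mmse = mmse_combiner(H, 0, noise_var)
sinr_mr = sinr(w_mr, H, 0, noise_var)
sinr_mmse = sinr(w_mmse, H, 0, noise_var)
```

Both combiners need the effective channels of the users as input, which is the channel estimation step that the ELM-based receivers avoid.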

E. SPECTRAL EFFICIENCY
In this work, we considered the UL capacity bound to obtain the SE expression. Therefore, as in [52], $\mathbf{C}_k[s]$ is defined as the corresponding estimation error covariance matrix of the $k$th user effective channel. This way, the $k$th user SE per subcarrier in the UL is given by

where $\mathrm{SINR}_k[s]$ represents the signal-to-interference-plus-noise ratio (SINR) of the $k$th user, which is written in (13). However, SE cannot be calculated in the same way with an ELM method, since there is no estimated channel with this strategy [10], [26]. Therefore, the SINR can also be computed as

where the peak-to-average power ratio (PAPR) is given per modulation scheme, i.e., the PAPR value corresponds to 0 dB, 2.6 dB, 3.7 dB, and 4.2 dB for quadrature phase shift keying (QPSK) modulation, 16-quadrature amplitude modulation (16-QAM), 64-QAM, and 256-QAM, respectively [26]. The error vector magnitude (EVM) of the $s$th subcarrier of user $k$, $\mathrm{EVM}_k[s]$, is calculated with the combined symbols in the receiver based on the Euclidean distance between the combined symbols and their ideal constellation points.
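The EVM measurement described above can be sketched as follows: each combined symbol is compared with its nearest constellation point, and the RMS error is normalized by the average constellation power. The normalization convention and the function names are assumptions; the PAPR-dependent mapping from EVM to SINR is not reproduced here, and `spectral_efficiency` is the plain $\log_2(1 + \mathrm{SINR})$ form without any pilot-overhead factor.

```python
import numpy as np

def nearest_points(symbols, constellation):
    """Hard decision: closest constellation point for every combined symbol."""
    distances = np.abs(symbols[:, None] - constellation[None, :])
    return constellation[np.argmin(distances, axis=1)]

def evm_rms(symbols, constellation):
    """RMS EVM relative to the average constellation power (assumed convention)."""
    reference = nearest_points(symbols, constellation)
    error_power = np.mean(np.abs(symbols - reference) ** 2)
    return np.sqrt(error_power / np.mean(np.abs(constellation) ** 2))

def spectral_efficiency(sinr_linear):
    """SE in bit/s/Hz for a linear-scale SINR, without overhead factors."""
    return np.log2(1.0 + sinr_linear)

# Toy check: QPSK points shifted by a known real offset of 0.1.
constellation = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
combined = np.tile(constellation, 4) + 0.1
evm = evm_rms(combined, constellation)
```

Since the measurement only needs the combined symbols and the constellation, it can be applied to an ELM output where no channel estimate exists.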

III. EXTREME LEARNING MACHINE STRATEGIES
In this section, we introduce the design of a single-layer ELM receiver and the proposed M-ELM (both methods are strictly defined in the complex domain). It is important to mention that the former was initially presented in [26] for multiuser massive MIMO systems, as well as in [10], [11] for mmWave channels simulated at 28 GHz. In this study, the M-ELM receiver is introduced for mmWave massive MIMO systems. The goal of the ELM strategy is to learn the MIMO combining process from the training data (in this case, the pilot symbols) that gives the best prediction when the unknown data symbols share the same modulation as the pilots [11], [26], [54]. By doing so, only the desired signal is maximized, while interference signals and non-linear distortions are attenuated. A significant advantage of the ELM neural network is that it can perform MIMO combining online because of its training phase, which consists of the random generation of the hidden neurons, so that the output weights can be found through a regularized least squares process [55]. This means that the ELM network does not require offline training, unlike ANNs based on the stochastic gradient descent algorithm [26] and DNN strategies [56].
The $k$th ELM receiver, where the pilot signal matrix $\mathbf{Y}_{\phi_k}$ is used at the input of the neural network, which consists of $L$ hidden nodes between the input and output layers, is represented by $\boldsymbol{\phi}_k = \mathbf{O}_{\phi_k} \boldsymbol{\beta}_k$, where $\boldsymbol{\beta}_k \in \mathbb{C}^{L}$ is the output weight vector between the hidden layer of $L$ nodes and the single output node, and $\mathbf{O}_{\phi_k} \in \mathbb{C}^{N_s \times L}$ represents the hidden layer output matrix, given by

where $a(\cdot)$ is the complex activation function of the hidden layer, which includes circular, inverse circular, hyperbolic, and inverse hyperbolic functions, as described in [57]. The $k$th input weights, $\boldsymbol{\omega}_{kn} = [\omega_{kn1}, \ldots, \omega_{knN_r}]^T \in \mathbb{C}^{N_r}$, and the biases $b_{kn}$ of the hidden node $n$, for $n = 1, \ldots, L$, are randomly initialized and fixed without tuning for the ELM training step. Specifically, the output weight vector when $N_s > L$ is written as

After the ELM receiver is trained, we can use the data signal $\mathbf{Y}_k = \{\mathbf{y}_k[1], \ldots, \mathbf{y}_k[N_s]\}^T$ at the input of the neural network, as illustrated in Fig. 3. Therefore, a new hidden layer output matrix for the data, $\mathbf{O}_k \in \mathbb{C}^{N_s \times L}$, is processed as

Finally, the last step is to perform MIMO processing with the trained ELM output weight vector as follows:

where $\hat{\mathbf{x}}_k = [\hat{x}_k[1], \ldots, \hat{x}_k[N_s]]^T$ denotes the detected data symbols in the output layer of the ELM network [26], [58]. Note that $\boldsymbol{\omega}_{kn}$ and $b_{kn}$ in (15) are fixed after training with ELM and reused in (17). The steps for performing MIMO processing with a complex ELM algorithm are summarized below.

with the received data signal matrix $\mathbf{Y}_k$ in (16) using the same input weight and bias as in Step two.
6) Perform MIMO processing with the trained ELM output using (17).
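The training and combining procedure described above can be sketched in a few lines. The hidden layer is drawn once at random and kept fixed, the output weights are fitted to the pilots, and the same hidden layer is reused on the data. This is a minimal sketch under stated assumptions: it uses a plain least-squares fit via `numpy.linalg.lstsq` instead of a regularized one, a toy two-user flat channel instead of the mmWave model, and hypothetical function names.

```python
import numpy as np

def init_hidden(n_inputs, n_hidden, rng, scale=0.005):
    """Random complex input weights and biases, drawn once and then fixed."""
    def draw(shape):
        return rng.uniform(-scale, scale, shape) + 1j * rng.uniform(-scale, scale, shape)
    return draw((n_inputs, n_hidden)), draw(n_hidden)

def hidden_output(Y, weights, biases):
    """Hidden layer output matrix (N_s x L) with a complex tanh activation."""
    return np.tanh(Y @ weights + biases)

def train_elm(Y_pilot, pilots, weights, biases):
    """Output weights: least-squares solution of O_phi @ beta = pilots."""
    O_phi = hidden_output(Y_pilot, weights, biases)
    beta, *_ = np.linalg.lstsq(O_phi, pilots, rcond=None)
    return beta

def elm_combine(Y_data, weights, biases, beta):
    """Reuse the fixed hidden layer and the trained beta on received data."""
    return hidden_output(Y_data, weights, biases) @ beta

rng = np.random.default_rng(2)
n_rx, n_hidden, n_pilots, n_data = 8, 8, 64, 200   # toy sizes (assumption)

def qpsk(n):
    return (rng.choice([-1, 1], n) + 1j * rng.choice([-1, 1], n)) / np.sqrt(2)

def cgauss(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H = cgauss(n_rx, 2)   # column 0: desired user, column 1: interfering user

def received(x_desired):
    """Toy received signal: two users over a flat channel plus noise."""
    X = np.stack([x_desired, qpsk(x_desired.size)], axis=1)
    return X @ H.T + 0.1 * cgauss(x_desired.size, n_rx)

pilots, x_data = qpsk(n_pilots), qpsk(n_data)
W, b = init_hidden(n_rx, n_hidden, rng)
beta = train_elm(received(pilots), pilots, W, b)
x_hat = elm_combine(received(x_data), W, b, beta)
mse = np.mean(np.abs(x_hat - x_data) ** 2)
```

Note that no channel matrix is passed to `train_elm` or `elm_combine`; the combiner is learned from the received pilots alone, which mirrors the absence of a channel estimation step in the ELM receiver.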

B. MIMO COMBINING BASED ON A MULTILAYER EXTREME LEARNING MACHINE
The characteristics of the data set determine the generalizability of any machine learning algorithm [55]. Therefore, the design of the features that a prominent data structure can represent is relevant. However, this task requires subject matter expertise to identify appropriate characteristics [59]. AEs can perform feature engineering and may be used to train an ANN with several hidden layers [60]. In this sense, an AE represents an unsupervised ANN where the outputs and inputs are the same for the AE by reproducing the input signal as much as possible.
Fully complex AEs are adopted as the basic building block of M-ELMs, whose training architecture is structurally separated into unsupervised hierarchical feature representation and supervised feature classification [25]. In the former stage, the ELM-AE is used to obtain multilayer sparse features of the input data; in the latter stage, the standard ELM is applied for final decision making. Since the supervised learning performed by the standard ELM was described in the previous subsection, the training procedure for the unsupervised building blocks (AEs) in the M-ELM architecture is highlighted here.
A single ELM-AE can be modeled by an input layer, a hidden layer, and an output layer, as shown in Fig. 4 [60]. In this subsection, the k index for the ELM explanation is removed for simplicity.
An ELM-AE is characterized by $j$ input layer neurons, $n$ hidden layer neurons, $j$ output layer neurons, and the activation function of hidden neurons $a(\cdot)$ [40]. For $N_s$ different samples $\mathbf{y}_{\phi}[i] \in \mathbb{C}^{N_s} \times \mathbb{C}^{j}$, $i = 1, \cdots, N_s$, the results of the hidden layer of ELM-AE and the relationship between the outputs of the hidden layer can be written as

whereas the results of the hidden ELM-AE layer and the results of the output layer can be written as

where $\mathbf{w}_i$ denotes the input weight vector joining the input layer with the hidden layer $i$, and $b_j$ represents the bias weight for the hidden layer $j$. The standard ELM is modified as follows to perform unsupervised learning: the input data is used as output data. An ELM-AE can represent the input features in three different forms: (1) compressed, which represents features from a higher-dimensional input data space in a lower-dimensional feature space, (2) sparse, which is the opposite of compressed, and (3) equal dimension, where features maintain their dimensionality. The ELM-AE output weights make it possible to switch from feature space to input data. For all ELM-AE representations [25], the output weight takes the form

with $\mathbf{O} = [\mathbf{o}_1, \cdots, \mathbf{o}_{N_s}]$ being the hidden layer outputs from ELM-AE and $\mathbf{Y}_{\phi}$ the input and output data from ELM-AE.
Since $\mathbf{O}$ becomes the projected feature space of $\mathbf{Y}_{\phi}$ subject to the activation function, the output weight of ELM-AE can appropriately extract the features of the input data through singular values. The ELM-AE is stacked layer by layer according to a hierarchical structure. Before the supervised least mean square (LMS) optimization, each weight of the M-ELM hidden layer is initialized utilizing ELM-AE, which performs unsupervised layer learning by excluding random feature mapping. Mathematically, as in [60], the output of every hidden layer is written as follows

where $\mathbf{O}_i$ represents the $i$th hidden layer output matrix (the input layer occurs for $i - 1 = 0$; hence, the inputs of M-ELM are given by $\mathbf{O}_{i-1}$). Once the feature of the previous hidden layer is calculated, the weights and hyperparameters (the activation function and the number of hidden neurons) of the current hidden layer are fixed. The AE employs the encoded results to address the original inputs by reducing the reconstruction errors. The output of the connections between the last hidden layer and the output node $\boldsymbol{\phi}[i] \in \mathbb{C}^{N_s} \times \mathbb{C}^{m}$, $i = 1, \cdots, N_s$, can be analytically determined by solving a linear system, as for the standard ELM [55]. The M-ELM model is shown in Fig. 5.
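The layer-wise construction can be illustrated with a rough sketch in which each ELM-AE is trained to reconstruct its own input, and the transpose of its output weights becomes the weight matrix of the corresponding M-ELM hidden layer, following the usual ELM-AE stacking rule $\mathbf{O}_i = a(\mathbf{O}_{i-1} \boldsymbol{\beta}_i^T)$. Several points are assumptions rather than details taken from this work: the plain least-squares fits, the rescaling of each layer so that the complex tanh stays away from its poles, the toy single-user channel, and all function names.

```python
import numpy as np

def crand(rng, shape, scale=0.005):
    """Complex values with real and imaginary parts uniform in [-scale, scale]."""
    return rng.uniform(-scale, scale, shape) + 1j * rng.uniform(-scale, scale, shape)

def elm_ae_layer(X, n_hidden, rng, max_preact=0.5):
    """Train one ELM-AE on X and return the weights of the next M-ELM layer."""
    O = np.tanh(X @ crand(rng, (X.shape[1], n_hidden)) + crand(rng, n_hidden))
    beta, *_ = np.linalg.lstsq(O, X, rcond=None)   # O @ beta ~ X (input as target)
    W = beta.T                                      # O_i = a(O_{i-1} @ beta_i^T)
    # Assumption, not from the source: rescale so tanh avoids its poles.
    return W * (max_preact / np.max(np.abs(X @ W)))

def train_melm(Y_pilot, pilots, layer_sizes, rng):
    """Stack ELM-AE layers, then fit the output weights by least squares."""
    features, layers = Y_pilot, []
    for n_hidden in layer_sizes:
        W = elm_ae_layer(features, n_hidden, rng)
        layers.append(W)
        features = np.tanh(features @ W)
    beta_out, *_ = np.linalg.lstsq(features, pilots, rcond=None)
    return layers, beta_out

def melm_combine(Y, layers, beta_out):
    """Forward pass through the fixed hidden layers and the output weights."""
    features = Y
    for W in layers:
        features = np.tanh(features @ W)
    return features @ beta_out

rng = np.random.default_rng(4)
n_rx, n_pilots = 16, 64                             # toy sizes (assumption)

def cgauss(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

h = cgauss(n_rx)                                    # toy single-user flat channel
pilots = (rng.choice([-1, 1], n_pilots) + 1j * rng.choice([-1, 1], n_pilots)) / np.sqrt(2)
Y_pilot = np.outer(pilots, h) + 0.1 * cgauss(n_pilots, n_rx)

layers, beta_out = train_melm(Y_pilot, pilots, (8, 8), rng)
train_mse = np.mean(np.abs(melm_combine(Y_pilot, layers, beta_out) - pilots) ** 2)
```

Here both hidden layers are narrower than the input, which corresponds to the compressed representation; only the final least-squares step uses the pilot symbols as targets.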

IV. RESULTS
In this section, we present the parameters of the propagation environment and the ELM strategies considered in this study, as well as the simulation results to demonstrate the overall performance of the new M-ELM receiver introduced in Section III-B.

A. PARAMETERS OF THE EVALUATED SCENARIO
The simulation parameters are given in Table 1, based on the NR specifications developed by the Third Generation Partnership Project (3GPP), Release 15 [1], and parameters of the phased array antenna set for 5G radios in the 28 GHz band, as described in [41], [61], [62]. We simulated the channels between the K users and the BS with the quasi-deterministic radio channel generator (QuaDRiGa), which is software coded in Matlab and developed by the Fraunhofer Institute for Telecommunications [63]. Specifically, in the QuaDRiGa software, we used the mmMAGIC urban microcellular (UMi) cluster channel model [64], a frequency-selective mmWave channel like the general model described in Section II-B.
In this work, we selected 16-QAM and 64-QAM as pilot and data modulation schemes for simulation. In addition, we did not have to do much data processing before the training module with the M-ELM neural network, as this strategy can handle complex data entries. Finally, we used 10 dBm of equivalent isotropically radiated power (EIRP), i.e., the product of the transmitter power and the antenna array gain. Fig. 6 shows the fully connected hybrid simulated array composed of a single panel of 16 × 16 dual-polarized antennas. Each polarized array is connected to four RF chains. This hybrid structure consists of 512 antennas, where 256 antennas have vertical polarization, whereas the other 256 antennas have horizontal polarization [61], [62].

B. PARAMETERS OF THE ELM STRATEGIES
For ELM strategies, pilots must follow the same modulation scheme as data symbols to ensure a successful learning stage and thus maximize wireless performance [26], [35], [65]. The OFDM signals received with S subcarriers per symbol introduced in (1) are the complex numbers that we used as input to train the proposed M-ELM network. That is, with this strategy, we did not have to do extensive data processing prior to the training module. However, we used thousands of channel samples to simulate the stochastic behavior of a mobile scenario.
In this study, to explore the advantages and disadvantages of our proposal, we present various versions of the ELM neural network such as ELM, fully real ELM, M-ELM, and fully real M-ELM (M-ELM is strictly defined in the complex domain). In this sense, we present hyperparameters for the different ELM strategies to maximize their generalizability. We adjusted the hyperparameters of the proposed model by exhaustive search, such as the number of neurons per hidden layer, the number of hidden layers, and the activation function. We have also validated the proposed algorithm with respect to the achievable spectral efficiency parameter, as described in Section II-E.
For the approaches of the ELM and M-ELM neural networks (complex domain), we set the activation function to tanh, since it achieves a smaller BER than other functions [26], [65]. The weights and biases between the input and hidden layers were generated following a uniform distribution in the interval [-0.005, 0.005], which corresponds to the activation function's region of convergence (ROC) [26]. For ELM and M-ELM in the real domain, we adopt the sigmoid activation function because it shows superiority in terms of computational cost and prediction accuracy in the context of ELM [66]. The hidden neuron parameters were randomly drawn from a uniform distribution defined from -1 to 1 [35]. The ELM hyperparameters that improve the BER are summarized as follows.

1) ELM:
According to [26], we set the number of neurons in the single hidden layer as the number of antennas dedicated per user in the BS, therefore L = 64.

2) Real ELM:
The number of hidden neurons and the regularization parameter, which is useful to improve the stability of the ELM, are 120 and 228, respectively. This configuration was chosen by examining the performance of the system in terms of these hyperparameters for various levels of SNR. Note that this ELM must work with strictly real information (a single constellation symbol is decomposed into its in-phase and quadrature components). Consequently, the input layer of a real ELM strategy is composed of twice the number of antennas at the BS, while it has two output neurons (one for each component of the constellation symbol). More details about the real ELM as an equalizer can be found in [35], where this strategy is introduced for diminishing the laser phase noise in coherent optical OFDM systems (SISO systems).

3) M-ELM:
We set 16 neurons in each of the two hidden layers ($n_1 = n_2 = 16$). This choice follows from the performance results presented in Section IV-C and reduces the computational cost of the proposed M-ELM receiver.

4) Real M-ELM:
Based on an exhaustive optimization procedure, the number of neurons in the first and second hidden layers corresponds to 80 and 40, respectively. The three regularization parameters (one between each layer of the ELM) are set to 0 to keep the equalization simple. Since this network is a modified version of the real ELM [35], it also possesses 128 input neurons (a BS equipped with $N_r = 64$ receiving antennas) and two output neurons defined in the real domain.
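To make the complex-domain configuration above concrete, the following minimal sketch trains a single-hidden-layer complex ELM with tanh activation and hidden parameters drawn uniformly from [-0.005, 0.005]. It assumes the standard ELM formulation, in which the hidden parameters stay fixed and only the output weights are solved by least squares. Drawing the real and imaginary parts independently, the synthetic pilot data, and the names `random_complex`, `train_complex_elm`, and `predict_complex_elm` are illustrative assumptions rather than the exact procedure of this work.

```python
import numpy as np


def random_complex(rng, scale, shape):
    """Complex array whose real and imaginary parts are uniform in [-scale, scale].

    Drawing both parts independently is an assumption of this sketch.
    """
    return rng.uniform(-scale, scale, shape) + 1j * rng.uniform(-scale, scale, shape)


def train_complex_elm(X, T, n_hidden, rng, scale=0.005):
    """Standard ELM training: fixed random hidden layer, least-squares output weights.

    X: (n_samples, n_inputs) complex inputs; T: (n_samples, n_outputs) complex targets.
    """
    W = random_complex(rng, scale, (X.shape[1], n_hidden))  # input-to-hidden weights
    b = random_complex(rng, scale, (n_hidden,))             # hidden biases
    H = np.tanh(X @ W + b)                                  # complex hidden outputs
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)            # output weights
    return W, b, beta


def predict_complex_elm(X, W, b, beta):
    """Forward pass with the fixed hidden layer and the learned output weights."""
    return np.tanh(X @ W + b) @ beta


# Synthetic stand-in data (hypothetical): 64 receive antennas, 256 pilot samples.
rng = np.random.default_rng(0)
n_r, n_pilots = 64, 256
X = (rng.standard_normal((n_pilots, n_r))
     + 1j * rng.standard_normal((n_pilots, n_r))) / np.sqrt(2)
levels = [-3, -1, 1, 3]  # 16-QAM amplitude levels per component
T = (rng.choice(levels, size=(n_pilots, 1))
     + 1j * rng.choice(levels, size=(n_pilots, 1))) / np.sqrt(10)

W, b, beta = train_complex_elm(X, T, n_hidden=n_r, rng=rng)
T_hat = predict_complex_elm(X, W, b, beta)
train_mse = float(np.mean(np.abs(T_hat - T) ** 2))
```

Because the synthetic targets are unrelated to the inputs, this only exercises the data flow and the array shapes; it says nothing about the BER reported in this section.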

C. NUMERICAL RESULTS
In Fig. 7, we examined the performance of the proposed complex M-ELM receiver while varying the number of neurons in the first and second hidden layers. In this way, we identified the number of neurons in each hidden M-ELM layer that results in a lower BER. As shown in Fig. 7, for the proposed M-ELM strategy, the achievable BER is smaller when the number of neurons in the first and second layers coincides. These results reveal that it is not necessary to set a high number of neurons in each of the hidden layers of the M-ELM receiver, since only a few neurons are needed to decrease the BER and enhance the SE. In general, the compressed representation (from a higher-dimensional input data space to a lower-dimensional feature space) is superior to the sparse representation in terms of the BER metric. Furthermore, in Fig. 7, we can see a horizontal line along the first layer of the M-ELM axis. This result reveals that with 64 neurons ($L = N_r$) in the first hidden layer, we can achieve a low BER. Therefore, we can achieve the same results with one hidden layer as with two hidden layers. However, with fewer neurons and more layers, the size of the matrices in each layer is smaller, so the processing task is easier.
Furthermore, in Table 2 we present the numerical results for BER achieved with different sets of neurons in the first and second layers of the proposed M-ELM strategy. These results correspond to $\log_{10}(\mathrm{BER})$ at SNRs of 4 dB, 8 dB, and 12 dB, as illustrated in Fig. 7.
Based on the data presented in Table 2, we can see that, on average, the different combinations of neurons in the hidden layers of the M-ELM receiver yield nearly the same results. Furthermore, we present the SE and BER results in Fig. 8, with pairs of 16, 20, and 24 neurons configured in the first and second layers of the complex M-ELM strategy.
As shown in Fig. 8, there is no need to define a large number of neurons in each of the hidden layers of the proposed M-ELM receiver. With few neurons, we can achieve SE and BER results similar to those of a high number of neurons. Consequently, we set 16 neurons in each of the hidden layers of the M-ELM strategy to reduce the complexity of signal processing.
Next, to test the performance of the different MIMO combining techniques presented in this study, in Fig. 9, we show the average SE achieved by the $k$th user on the network with a 16-QAM modulation scheme. In Fig. 9, for the complex ELM and M-ELM strategies, the average SE is almost the same. For most of the SNR range, these ANNs outperform the real ELM and MMSE methods. For the MR receiver, the average SE is low due to the impact of non-linear distortions and the presence of multiuser interference. To complement the results obtained in Fig. 9, we present the average SE for a 64-QAM modulation scheme in Fig. 10.
As expected, in Fig. 10 the SE results with 64-QAM are higher than those with 16-QAM for the techniques presented in this study. However, the SE results achieved for 64-QAM with MMSE are lower than those achieved with 16-QAM.
In Fig. 11, we present the 16-QAM BER results achieved for the different receivers to account for transmission errors. The results in Fig. 11 reveal that both the complex ELM and the real and complex M-ELM strategies have the lowest BER. An average BER of $10^{-6}$ can be reached at an SNR of 10 dB with these ANN methods. The average BER achieved with real ELM is slightly higher than that achieved with the complex ELM and M-ELM (real and complex) techniques. However, both MMSE and MR receivers show high BER curves. These results show that ANN strategies can reduce the effects of multiuser interference with more precision than linear MIMO receivers (MMSE and MR). In Fig. 12 we present the average BER results achieved with a 64-QAM modulation scheme.
As expected, with a 64-QAM modulation scheme, the results in Fig. 12 present higher BER curves than the 16-QAM results in Fig. 11. We can see that the bit error rate with the real ELM receiver increases, whereas the results of the ELM, M-ELM, and real M-ELM receivers are nearly the same.
Finally, in Table 3, we present the processing time, in terms of the sample mean and variance range over 2000 simulations, to perform the MIMO combination task with the proposed ANNs, as well as with the MR and MMSE receivers. For MR and MMSE processing only, we have also included the processing time to perform channel estimation, as described in Section II-C, since this process is not performed with the proposed M-ELM strategy.
The results obtained in Table 3 depend on the hardware on which the simulations were performed. Taking this into account, the technical characteristics of the computer used to perform the simulations are as follows:
• Software: Matlab R2019b
As in [26], we complement the results presented in Table 3 with the number of floating-point operations in analytical form. The results in Table 3 were obtained following the methodology used in [26]. Taking into account the parameters of Table 1 and based on the data presented in Table 4, when $N_s = 792$, $K = 4$, and $N_r = 64$, the number of floating-point operations required for MMSE is 246774528, whereas for ELM processing, the floating-point operations are 40390528, that is, the ELM algorithm requires 16.367% of the operations required for MMSE. Furthermore, the number of floating-point operations for the M-ELM receiver is 5102528, which represents 2.067% of the operations required for MMSE processing or 12.633% of the operations required for the single-layer ELM neural network.
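The percentages quoted above follow directly from the three operation counts. As a quick check, the short sketch below recomputes them; the dictionary `flops` and the helper `percent_of` are names introduced here only for illustration, and the counts are the ones stated in the text.

```python
# Floating-point operation counts quoted in the text for N_s = 792, K = 4, N_r = 64.
flops = {
    "MMSE": 246_774_528,
    "ELM": 40_390_528,
    "M-ELM": 5_102_528,
}


def percent_of(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


elm_vs_mmse = percent_of(flops["ELM"], flops["MMSE"])     # about 16.37 %
melm_vs_mmse = percent_of(flops["M-ELM"], flops["MMSE"])  # about 2.07 %
melm_vs_elm = percent_of(flops["M-ELM"], flops["ELM"])    # about 12.63 %

for name, value in [("ELM / MMSE", elm_vs_mmse),
                    ("M-ELM / MMSE", melm_vs_mmse),
                    ("M-ELM / ELM", melm_vs_elm)]:
    print(f"{name}: {value:.3f} %")
```

Under these counts, the M-ELM receiver needs roughly one eighth of the operations of the single-layer ELM and about one fiftieth of those of the MMSE receiver.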

D. COMPARISON WITH A DNN APPROACH
A multilayer perceptron (MLP) is implemented and tested to compare its performance with the ELM methods in terms of BER, time complexity, and implementation. The hyperparameters of this DNN were selected using a random search approach. In this sense, we first defined the maximum and minimum values for the number of layers, the number of neurons in each layer, the mini-batch size for model training, and the number of training epochs. The available activation functions are tanh and ReLU. Hyperparameters were randomly tested according to a uniform distribution. Finally, the hyperparameters with the best BER were selected. The final structure of the MLP is the following: an input layer with $2N_s$ neurons, corresponding to the real and imaginary components of the $N_s$ subcarriers in the mmWave signal. These are followed by 10 fully connected layers with 120, 100, 120, 100, 120, 100, 120, 100, 120, and 100 neurons, respectively. Each fully connected layer is followed by a batch normalization layer that helps to speed up the DNN training process and reduce sensitivity to network initialization. The activation function for each hidden layer is tanh. Finally, we set the output, a regression layer, with $2N_s$ neurons corresponding to the real and imaginary components of the $N_s$ subcarriers. We used the stochastic gradient descent algorithm with a momentum of 0.9 and updated the network parameters with a learning rate of 0.001 and a mini-batch size of 600 samples. Furthermore, we reduced the learning rate by a factor of 0.99 after each set of 10 epochs.
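The layer layout and the step-wise learning-rate decay described above can be written down compactly. The sketch below is only an illustration: it assumes $N_s = 792$ (the value quoted in the floating-point operation comparison), zero-based epoch indexing, and a decay applied once every 10 completed epochs; the names `LAYER_SIZES` and `learning_rate` are hypothetical.

```python
N_S = 792  # subcarriers; assumed equal to the value used in the FLOP comparison

# Ten fully connected hidden layers alternating 120 and 100 neurons.
HIDDEN_WIDTHS = [120, 100] * 5

# Real and imaginary parts are stacked, so input and output have 2 * N_S neurons.
LAYER_SIZES = [2 * N_S] + HIDDEN_WIDTHS + [2 * N_S]


def learning_rate(epoch, base_lr=0.001, drop_factor=0.99, drop_period=10):
    """Step decay: multiply the rate by `drop_factor` after every `drop_period` epochs.

    `epoch` is zero-based (an assumption of this sketch).
    """
    return base_lr * drop_factor ** (epoch // drop_period)


print(LAYER_SIZES)
print([learning_rate(e) for e in (0, 9, 10, 25)])
```

With a factor of 0.99 per 10 epochs, the learning rate decays slowly: after 100 epochs it is still about 90% of its initial value.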
As we can see in Figs. 9-10, the SE results achieved with the MLP strategy for 16-QAM and 64-QAM modulations are significantly lower than the results obtained with the other techniques evaluated in this work. Consequently, Figs. 11-12 present higher BER results for the MLP strategy with 16-QAM and 64-QAM. Although the MLP strategy produced efficient results for mmWave communications with single-antenna systems, as presented in [10], the MLP neural network cannot perform MIMO processing as effectively as the M-ELM strategy and the linear MIMO receivers.
In terms of implementation, the ELM methods present relevant advantages with respect to a DNN technique. First, a DNN requires much higher processing capabilities than ELM methods. In other words, the DNN requires adjusting the input to a vector of real numbers and a larger number of operations to predict the same output. Additionally, an ELM quickly adjusts (online training) to changes in the system, whereas a DNN model requires offline training to adapt to new conditions in the system, such as a different location of the mobile user terminal.

V. ANALYSIS AND DISCUSSION
There are optimized signal processing techniques that can achieve better performance compared to MR and MMSE receivers. However, these techniques require more computational resources. This is a limitation because the signal processing load is already high for mmWave communications.
SE results with the ELM strategies are equal to or higher than those obtained with the MMSE receiver and higher than the SE results achieved with the MR combiner. Moreover, we can see that the SE results obtained with M-ELM can exceed 8 bits/s/Hz with a 64-QAM scheme, even though this modulation scheme can only transmit 6 bits/symbol. This result is due to the number of antennas in a communication system. Since mmWave systems use large arrays of antennas, SE increases with the number of antennas compared to SISO systems [52]. However, with linear MIMO receivers (MMSE, MR), the growth in spectral efficiency depends on how the multiple antenna channels are estimated, since an error in the estimation process reduces the SINR and therefore reduces the SE, as described in (12).
We can also see that the SE results obtained with M-ELM can follow the same rule of SE for linear MIMO receivers, even though there is no direct channel estimation with this technique. However, the ELM strategies presented in this study show lower BER results than the linear MIMO receivers, namely, the MR and MMSE methods. This is because ANNs attenuate the multiuser interference more precisely than linear MIMO combining methods; moreover, the computational complexity is reduced.
As presented in Section IV-C, in Figs. 9 and 10, the proposed complex M-ELM receiver can achieve the same SE and BER results as the ones achieved with the complex ELM method. However, the former is a more efficient receiver, as this strategy presents less computational complexity than the latter. The reason why the multilayer strategy presents less computational complexity than the single-layer ELM is that we can split the process into smaller matrices. The complex ELM is made up of a single hidden layer with $L = N_r$ neurons, that is, 64, the same number of BS antennas dedicated per user in the cell. This means that the internal processing of this method requires a large matrix operation. However, with the multilayer strategy, we can establish two hidden layers with as few as 16 neurons in each layer. In this way, internal processing requires the operation of smaller matrices.
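The matrix-size argument can be illustrated by counting the entries of the hidden-layer weight matrices for the two configurations. The sketch below is a rough illustration only: it ignores biases, the output layer, and the training step, so it does not reproduce the analytical floating-point operation counts of Table 4. The helper names `hidden_weight_shapes` and `count_entries` are hypothetical.

```python
def hidden_weight_shapes(n_inputs, hidden_sizes):
    """Shapes of the weight matrices feeding each hidden layer, in order."""
    shapes = []
    fan_in = n_inputs
    for width in hidden_sizes:
        shapes.append((fan_in, width))
        fan_in = width
    return shapes


def count_entries(shapes):
    """Total number of matrix entries over all given shapes."""
    return sum(rows * cols for rows, cols in shapes)


N_R = 64  # BS antennas dedicated per user

elm_shapes = hidden_weight_shapes(N_R, [64])       # single hidden layer, L = N_r
melm_shapes = hidden_weight_shapes(N_R, [16, 16])  # two hidden layers of 16 neurons

elm_entries = count_entries(elm_shapes)    # 64 * 64
melm_entries = count_entries(melm_shapes)  # 64 * 16 + 16 * 16

print(elm_shapes, elm_entries)
print(melm_shapes, melm_entries)
print(melm_entries / elm_entries)
```

In this simplified count, the two 16-neuron layers hold 1280 weights against 4096 for the single 64-neuron layer, and the last hidden layer has 16 outputs instead of 64, so the matrix involved in computing the output weights is also narrower.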
Splitting the signal processing task into smaller matrix operations is the primary function of a multilayer strategy. Therefore, the computational complexity is reduced with the proposed complex M-ELM receiver. This is the main contribution of our investigation because the mmWave signal processing load is high, and effective methods are necessary to reduce complexity.

VI. CONCLUSION
There is a trade-off between the performance of the BER/SE metrics and the execution time of the MIMO combining processing with the proposed M-ELM. With only 16 neurons in each of the two configured hidden layers of the proposed ANN, we can achieve the same SE and BER results as the standard ELM receiver. Furthermore, the proposed M-ELM strategy requires less processing time than the standard ELM receiver to perform the same MIMO processing. Moreover, the lower processing time required by the proposed M-ELM is validated by the analytical number of floating-point operations required by the proposed scheme. Therefore, the M-ELM method requires a shorter processing time compared to other state-of-the-art schemes and even a shorter processing time than traditional linear MIMO receivers.