Deep Learning-Based Wireless Channel Estimation for MIMO Uncoded Space-Time Labelling Diversity

Uncoded space-time labeling diversity (USTLD) is a space-time block coded (STBC) system with labeling diversity applied to it to increase wireless link reliability without compromising the spectral efficiency. USTLD achieves higher link reliability relative to the traditional Alamouti STBC system. This work aims to design a bandwidth-efficient and blind wireless channel estimator for the USTLD system. Traditional channel estimation techniques like the least-squares (LS) and the minimum mean squared error (MMSE) methods are generally inefficient in using the channel bandwidth. The LS and MMSE channel estimation schemes require the prior knowledge of transmitted pilot symbols and/or channel statistics, together with the receiver noise variance, for channel estimation. A neural network machine learning (NN-ML) channel estimator with transmit power-sharing is proposed to facilitate blind channel estimation for the USTLD system and to minimize the required channel estimation bandwidth utilization. We mathematically model the equivalent noise power and derive the optimal transmit power fraction that minimizes the channel estimation bandwidth utilization. The blind NN-ML channel estimator with transmit power-sharing is shown to utilize 20% of the LS and MMSE wireless channel estimators' bandwidth to achieve the same bit error rate (BER) performance for the USTLD system in the case of 16-QAM and 16-PSK modulation.


I. INTRODUCTION
Uncoded space-time labeling diversity (USTLD) is a technique recently developed in [1] to increase the link reliability of space-time block coded (STBC) systems in a multiple-input multiple-output (MIMO) environment. It uses two distinct symbol constellation mapper designs to map bitstreams to symbols. The first STBC timeslot sends information symbols from a Gray-coded symbol constellation mapper. The second timeslot sends the same information symbols picked from the second constellation mapper, designed using the labeling technique defined in [1]. This scheme outperforms the traditional Alamouti STBC system [2] in terms of bit error rate (BER) performance, as it has a coding gain over the Alamouti system. In [3], the authors develop a genetic algorithm-based mapper labeling design technique for non-symmetric constellations, since the USTLD mapper design in [1] is limited to symmetric constellations. In [4], the authors develop a generic analytical framework to evaluate the BER performance of USTLD in Rician and Rayleigh fading wireless channels for a three transmit antenna MIMO system. In [5], the authors show that applying media-based modulation with radio frequency (RF) mirrors enhances the wireless link reliability of USTLD STBC schemes. The authors in [6] apply signal space diversity (SSD) to USTLD STBC in order to improve the error rate performance of USTLD. They show that the SSD USTLD scheme outperforms USTLD in terms of BER. Trellis code-aided high-rate space-time labeling diversity (TC-STLD) is proposed in [7] to deliver superior spectral efficiency relative to USTLD whilst maintaining the BER performance. In [8], the authors investigate USTLD in a three transmit antenna MIMO configuration, as the other USTLD research has only been carried out in a two transmit antenna MIMO configuration. They develop the second and third labeling mappers using a heuristic method and observe that the three transmit antenna USTLD scheme has superior BER performance relative to the two transmit antenna USTLD scheme presented in [1]. Rectangular quadrature amplitude modulation (QAM) for USTLD is investigated in Nakagami-m fading channels in [9], where a heuristic algorithm to design the optimal labeling mapper for the rectangular QAM USTLD scheme is proposed. In [10], the authors develop a high-density M-QAM labeling mapper using a heuristic algorithm for a three transmit antenna USTLD STBC scheme. They design the second and third labeling mappers using the heuristic algorithm for 256-QAM and 1024-QAM constellations, since most research has developed mapper designs for lower modulation orders.
The works discussed so far for the USTLD scheme have assumed a perfect wireless channel estimate, which motivates the investigation of USTLD under an imperfect channel estimate. Recently, deep learning has been proposed to address challenges associated with wireless channel estimators.
In [11], the authors propose deep learning for predicting the time-frequency response of a fast-fading wireless MIMO channel. They show that the proposed deep learning algorithm has a competitive mean squared error performance relative to the traditional MMSE channel estimator. In [12], the authors show that their proposed deep learning channel estimator outperforms the traditional compressed sensing-based algorithms for massive MIMO wireless channel estimation. The authors in [13] propose a deep learning channel estimation algorithm for doubly selective wireless fading channels. Deep learning is applied in [14] to estimate the uplink wireless channels for massive MIMO systems at the base station, with some antennas using high-resolution analog-to-digital converters (ADCs) and others using low-resolution ADCs. The proposed deep learning algorithm uses the high-resolution ADCs to predict the channels of the antennas using low-resolution ADCs. In [15], a deep learning-based channel estimation technique is proposed for wireless energy transfer. Based on the energy received by the energy receiver, the energy transmitter channel state information (CSI) is learned using the proposed deep learning autoencoder. The authors in [16] propose a learned denoising-based approximate message passing (LDAMP) channel estimator for beamspace millimeter-wave massive MIMO channels with limited RF chains. The deep learning-based LDAMP algorithm outperforms the compressed sensing-based algorithms. In [17], a deep learning-based channel estimator is proposed for a time-varying Rayleigh fading channel. Its mean squared error performance is shown to be better than that of the traditional channel estimation algorithms. A deep learning algorithm is proposed in [18] to handle wireless orthogonal frequency division multiplexing (OFDM) channels end-to-end. It implicitly estimates the CSI and directly decodes the transmitted symbols. It is more robust than the conventional channel estimation techniques when fewer training pilot symbols are used. In [19], a deep learning-based channel estimation and equalization scheme (DL-CE) for filter bank multicarrier (FBMC) modulation is proposed. It is shown in [19] that this DL-CE scheme achieves state-of-the-art performance in channel estimation and equalization. In [20], the authors propose a deep learning-based downlink channel estimator for the fast time-varying and non-stationary wireless fading channels present in high-speed mobile scenarios. The proposed deep learning channel estimator is shown to perform better than the traditional channel estimators whilst offering lower computational complexity.
Research on bandwidth-efficient channel estimation has been performed in the literature largely for MIMO-OFDM systems. In the works presented in [21]-[24], the authors develop bandwidth-efficient channel estimators for the MIMO-OFDM environment. In [25], the authors propose a bandwidth-efficient channel estimator for a single carrier MIMO system with frequency domain equalization. The channel estimator iteratively uses a series of fast Fourier transform (FFT) and inverse FFT operations to reconstruct the CSI fully. A bandwidth-efficient blind channel estimator is proposed in [26] for a full-duplex (FD) point-to-point wireless communication system. The blind channel estimator simultaneously estimates the channel parameters of the FD system without requiring time division duplex (TDD).
In summary, it is evident from the literature that LS [27] and MMSE [28] require prior knowledge of the transmitted pilot symbols and/or the wireless channel statistics to perform channel estimation. The other general observation is that the channel estimator's mean squared error (MSE) drops as the number of transmitted pilot symbols is increased. It will be challenging to perform channel estimation using the traditional LS and MMSE channel estimators in environments where the transmitted pilot symbols and wireless channel statistics are unknown. With the high cost of licensed wireless channel bandwidth, service providers are pressured to utilize bandwidth efficiently. Hence, large numbers of pilot symbols sent over a wireless channel for channel estimation may not be desirable. Therefore, we propose a blind NN-ML channel estimator with transmit power-sharing that minimizes the channel bandwidth usage whilst delivering a competitive MSE and BER performance compared to the traditional LS and MMSE schemes. We choose NN-ML because it does not require prior knowledge of the transmitted pilot symbols, wireless channel statistics, and receiver noise variance to perform channel estimation.
The idea of transmit power-sharing is taken from [29], where an optimal power fraction is derived that facilitates the optimal sharing of transmit power between the information symbols and a single reference symbol to improve BER performance. In our case, we apply this power-sharing technique between information symbols and multiple reference/pilot symbols to improve the MSE and BER performance of the NN-ML channel estimator relative to the traditional LS and MMSE based channel estimators. This translates to lower usage of channel bandwidth in order to deliver the same BER performance.
The main contributions of this paper are as follows:
• We propose a novel deep learning-based bandwidth-efficient blind channel estimator for the USTLD MIMO system by employing optimal transmit power-sharing between information symbols and pilot symbols. To our knowledge, no literature has developed a bandwidth-efficient channel estimator using transmit power-sharing between pilot symbols and information symbols. Over and above that, the literature in [21]-[26] develops bandwidth-efficient MIMO channel estimators for system models very different from ours. The differences in environmental context or system model affect the method of channel estimation bandwidth optimization. We therefore cannot, for example, directly use a MIMO-OFDM optimized channel estimator in our system model.
• We mathematically derive a multiple pilot/reference symbol equivalent noise power upper bound for USTLD MIMO, unlike in [29], where the equivalent noise power is only for a single reference symbol. No work in the literature has derived the equivalent noise power for the USTLD MIMO system.
• We apply differential calculus to determine the optimal power fraction that minimizes the equivalent noise power. The minimized equivalent noise power is shown to minimize the MSE and BER of the USTLD MIMO system. This minimization of the MSE and BER implies a minimization of channel estimation bandwidth utilization to achieve the same BER performance as the traditional channel estimation methods.
The remainder of the paper is organized as follows: In Section II, we present the system model for the proposed blind NN-ML channel estimator with transmit power-sharing for USTLD MIMO and the background theory of LS and MMSE channel estimation. In Section III, we introduce the theory of the proposed blind NN-ML channel estimator with transmit power-sharing. Section III also presents the derivation of the equivalent noise power upper bound and the optimal transmit power fraction. Section IV discusses the MSE and BER simulation results, and Section V concludes the paper.
Notation: Bold lowercase ($\mathbf{a}$) and uppercase ($\mathbf{A}$) letters denote vectors and matrices, respectively. $(\cdot)^H$ and $\|\cdot\|_F$ are the Hermitian transpose and the Frobenius norm of a vector or matrix, respectively. $\mathrm{tr}(\cdot)$ is the trace function, which sums the main diagonal of a matrix. The symbol $\forall x$ means for all values of $x$. The operator $E(\cdot)$ is the statistical expectation or mean of a random variable. The functions $\Re(\cdot)$ and $\Im(\cdot)$ return the real and imaginary components of a complex number, respectively.

A. SYSTEM MODEL
A $2 \times N_r$ USTLD system is used to evaluate the BER performance of the channel estimation algorithms, where $N_r$ is the number of receive antennas and 2 is the number of transmit antennas. The USTLD system is a modification of the conventional $2 \times N_r$ Alamouti system [2]. The fundamental idea is to transmit a mapped symbol pair in the second time slot instead of the complex conjugates. The USTLD system generates the 2 × 2 STBC codeword matrix based on two labeling mappers, as in [1]. For example, the two mappers for 16-QAM signal constellations are the Gray-coded labeling map and the optimized labeling map of [1]. The labeling maps and their design criterion are detailed in [1].
A bitstream consisting of $2\log_2 W$ random bits, where $W$ is the W-QAM/W-PSK modulation order, is mapped by the two mappers to form the 2 × 2 STBC codeword matrix. The transmission of the 2 × 2 STBC symbols happens over a quasi-static fading wireless channel with a constant channel gain over one message frame, which includes $M = 200$ W-QAM/W-PSK information symbols and $N$ channel estimation pilot symbols transmitted per frame per transmit antenna. The pilot symbols are generated using the Zadoff-Chu sequence [30], since it can generate orthogonal complex sequences of constant amplitude and varying phase. This is important to avoid creating a singular square matrix $\mathbf{X}_p\mathbf{X}_p^H$, where $\mathbf{X}_p$ is the transmitted pilot symbol matrix, since the LS and MMSE channel estimation methods rely on matrix inversion. The Zadoff-Chu sequence pilot symbols are generated using Equation (1), where $n \in [0 : N-1]$, $x(n) \in \mathbb{C}$ is the complex pilot symbol at position $n$ of the $N$-dimensional pilot symbol vector, $N$ is the number of pilot symbols transmitted per pilot symbol vector, $j$ is the imaginary unit, $q \in \mathbb{N}$, and $Q$ is relatively prime to $N$, that is, it obeys $\gcd(Q, N) = 1$ (2), where $\gcd(\cdot)$ is the greatest common divisor function.
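The pilot generation step can be illustrated with a short Python sketch. It assumes the common textbook Zadoff-Chu definition $x(n) = \exp(-j\pi Q n (n + c_f + 2q)/N)$ with $c_f = N \bmod 2$, which may differ in detail from Equation (1); the function name `zadoff_chu` and the parity term `c_f` are hypothetical names introduced only for this illustration.

```python
import cmath
from math import gcd, pi


def zadoff_chu(N: int, Q: int = 1, q: int = 0) -> list[complex]:
    """Return a length-N Zadoff-Chu sequence with root Q.

    Assumed textbook form (not necessarily the paper's Equation (1)):
        x(n) = exp(-j*pi*Q*n*(n + c_f + 2q) / N),  c_f = N mod 2.
    """
    if N < 1:
        raise ValueError("N must be a positive integer")
    if gcd(Q, N) != 1:
        # Mirrors the constraint gcd(Q, N) = 1 stated in the text.
        raise ValueError("root Q must be relatively prime to N")
    c_f = N % 2  # parity term of the assumed definition
    return [cmath.exp(-1j * pi * Q * n * (n + c_f + 2 * q) / N) for n in range(N)]


# Example: eight pilot symbols with root Q = 3.
pilots = zadoff_chu(N=8, Q=3)
amplitudes = [abs(p) for p in pilots]  # every entry has unit amplitude
```

Under this definition the first element is always 1 (unit amplitude, zero phase), and every other element differs only in phase, which is the constant-amplitude property the text relies on.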
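The wireless channel is Rayleigh frequency-flat fading. The received pilot symbols and information/message symbols at the receiver are mathematically modeled as per Equations (3) and (4):
$$\mathbf{Y}_p = \mathbf{H}_t\mathbf{X}_p + \mathbf{N}_p \qquad (3)$$
$$\mathbf{Y}_d = \mathbf{H}_{t+1}\mathbf{X}_d + \mathbf{N}_d \qquad (4)$$
where $\mathbf{H}_{t+1} = \mathbf{H}_t = \mathbf{H} \in \mathbb{C}^{N_r \times N_t}$ is the constant wireless channel matrix over one transmission frame, because the wireless channel is quasi-static fading. The channel matrix $\mathbf{H}$ has complex channel gains, which are independent and identically distributed (i.i.d.) according to $\mathcal{CN}(0,1)$. $\mathbf{X}_p \in \mathbb{C}^{N_t \times N}$ is the transmitted pilot symbol matrix, and $\mathbf{X}_d \in \mathbb{C}^{N_t \times 2}$ is the transmitted information symbol matrix over $N_t = 2$ transmit antennas and two timeslots. The information symbol matrix $\mathbf{X}_d$ contains the transmitted W-QAM/W-PSK symbols. $\mathbf{Y}_p \in \mathbb{C}^{N_r \times N}$ is the received/observed pilot symbol matrix over $N_r$ receive antennas, and $\mathbf{Y}_d \in \mathbb{C}^{N_r \times 2}$ is the received information symbol matrix over $N_r$ receive antennas and two timeslots for USTLD.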
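The additive white Gaussian noise (AWGN) matrix $\mathbf{N}_p \in \mathbb{C}^{N_r \times N}$ is observed at the wireless receiver over the received $N_r \times N$ pilot symbols. The AWGN matrix $\mathbf{N}_d \in \mathbb{C}^{N_r \times 2}$ is observed at the wireless receiver when receiving the information symbols over two timeslots. The reference/pilot noise matrix $\mathbf{N}_p$ and the information noise matrix $\mathbf{N}_d$ have i.i.d. entries that follow the complex Gaussian distribution as follows:
$$n^{d}_{z,x} \sim \mathcal{CN}(0, \sigma_d^2), \qquad n^{p}_{w,y} \sim \mathcal{CN}(0, \sigma_p^2) \qquad (5)$$
where $n^{d}_{z,x}$ is the entry in the $z$th row and $x$th column of the information noise matrix, $\sigma_d^2$ is the average noise power of the information receiver white noise, and $n^{p}_{w,y}$ is the entry in the $w$th row and $y$th column of the reference/pilot noise matrix, with an average noise power of $\sigma_p^2$.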
Fig. 1 shows the USTLD system with the blind NN-ML channel estimator with transmit power-sharing.
As shown in Fig. 1, the proposed system takes a fraction of the transmit power from the information symbol transmission at the wireless transmitter and donates this transmit power fraction to the reference/pilot symbol transmission [29]. Since $M$ information symbols are transmitted per frame per transmit antenna, we donate $\alpha M$ transmit power to the $N$ pilot symbols. This implies that each pilot symbol gets $\alpha M / N$ extra transmit power, and each information symbol loses $\alpha$ transmit power. Mathematically, this is expressed in Equations (6.1) and (6.2), where $\alpha$ is the transmit power fraction, $M$ is the number of information symbols transmitted per frame per transmit antenna, $N$ is the number of pilot symbols sent per frame per transmit antenna, and $\gamma$ is the average received signal-to-noise ratio (SNR) per receive antenna. The total power for the transmission of the $M + N$ information and pilot symbols must be constant. Equations (6.1) and (6.2) obey this conservation of transmit power constraint. The optimal power fraction that ensures the optimal BER and MSE performance is computed using a function $\alpha_{opt} = f(N_t, M, N, \mathbf{X}_p)$ that needs to be derived.
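The power bookkeeping described above can be checked numerically. The following Python sketch assumes that the per-symbol SNR scales linearly with the per-symbol transmit power, so that each pilot symbol sees $(1 + \alpha M/N)\gamma$ and each information symbol sees $(1 - \alpha)\gamma$; the function names are hypothetical, and the expressions are an interpretation of the text rather than a transcription of Equations (6.1) and (6.2).

```python
def shared_snrs(alpha: float, M: int, N: int, snr: float) -> tuple[float, float]:
    """Return (pilot_snr, info_snr) on a linear scale after power-sharing.

    Assumption: each pilot gains alpha*M/N and each information symbol
    loses alpha of the nominal per-symbol power, as described in the text.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    if M <= 0 or N <= 0:
        raise ValueError("M and N must be positive")
    pilot_snr = (1.0 + alpha * M / N) * snr
    info_snr = (1.0 - alpha) * snr
    return pilot_snr, info_snr


def total_power(pilot_snr: float, info_snr: float, M: int, N: int) -> float:
    """Total frame power (in SNR units) over N pilots and M information symbols."""
    return N * pilot_snr + M * info_snr


# Example with the frame size used in the paper (M = 200) and N = 2 pilots.
pilot_snr, info_snr = shared_snrs(alpha=0.16, M=200, N=2, snr=1.0)
frame_power = total_power(pilot_snr, info_snr, M=200, N=2)
```

Whatever value of $\alpha$ is chosen, `frame_power` stays equal to $(M + N)\gamma$ up to floating-point rounding, which is the conservation constraint stated in the text.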
On the wireless receiver side, the NN-ML channel estimator is fed the received pilot symbol matrix $\mathbf{Y}_p$. The NN-ML channel estimator then predicts the wireless channel and feeds the prediction into the maximum likelihood (ML) detector. This channel estimation is done once per received frame. The ML detector then uses the channel estimate $\hat{\mathbf{H}}$ to detect the transmitted symbols based on the received information symbol matrix $\mathbf{Y}_d$.

B. BACKGROUND OF TRADITIONAL CHANNEL ESTIMATION METHODS
The LS [27] channel estimation method is less complex than the MMSE [28] method and the approximate linear minimum mean square error (ALMMSE) [31] method, but it generally performs worst among these channel estimation methods. The LS method works by generating a closed-form channel estimation formula, which estimates a wireless channel that minimizes the square of the Euclidean distance between the observed/received pilot symbol matrix and the product of the estimated wireless channel and the transmitted pilot symbol matrix.
The formula for estimating the wireless channel based on the observed/received pilot vectors and the known transmitted pilot symbol matrix is [27]
$$\hat{\mathbf{H}} = \mathbf{Y}_p\mathbf{X}_p^H(\mathbf{X}_p\mathbf{X}_p^H)^{-1} \qquad (7)$$
where $\hat{\mathbf{H}}$ is the LS estimated wireless channel matrix, $\mathbf{Y}_p$ is the observed/received pilot symbol matrix, and $\mathbf{X}_p$ is the transmitted pilot symbol matrix. From Equation (7), it is clear that $\mathbf{X}_p\mathbf{X}_p^H$ must be invertible, and hence non-singular, which motivates the selection of orthogonal pilot symbol vectors as entries in the transmitted pilot symbol matrix. Equation (7) also shows that the LS channel estimator requires full knowledge of the transmitted pilot symbol matrix. The MMSE channel estimation method works by estimating the wireless channel using Equation (8) [28]
$$\hat{\mathbf{H}}_m = (\sigma_n^2\mathbf{R}_H^{-1} + \mathbf{X}_m^H\mathbf{X}_m)^{-1}\mathbf{X}_m^H\mathbf{Y}_m \qquad (8)$$
where $\sigma_n^2$ is the receiver noise variance, $\mathbf{R}_H = E(\mathbf{H}_m\mathbf{H}_m^H)$ is the wireless channel autocorrelation matrix, $\mathbf{X}_m = \mathbf{X}_p^T$ is the MMSE pilot symbol matrix, and $\mathbf{Y}_m = \mathbf{X}_m\mathbf{H}_m + \mathbf{N}_m$ is the observed/received MMSE pilot symbol matrix, where $\mathbf{H}_m = \mathbf{H}^T$. As can be seen from Equation (8), the MMSE channel estimator requires full knowledge of the pilot symbol matrix, the wireless channel autocorrelation statistics, and the noise variance at the receiver side. These are assumed to be known without any estimation errors.
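A minimal Python sketch of the LS estimate in Equation (7) is given below for the two transmit antenna case, where the matrix to be inverted is 2 × 2. The helper names (`matmul`, `hermitian`, `inv2`, `ls_estimate`) are hypothetical, and the orthogonal pilot matrix in the example is a simple illustrative choice rather than the Zadoff-Chu pilots used in the paper.

```python
Matrix = list[list[complex]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product of nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def hermitian(A: Matrix) -> Matrix:
    """Conjugate transpose."""
    return [[x.conjugate() for x in col] for col in zip(*A)]


def inv2(A: Matrix) -> Matrix:
    """Inverse of a 2 x 2 matrix; raises if it is (numerically) singular."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("pilot Gram matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def ls_estimate(Y_p: Matrix, X_p: Matrix) -> Matrix:
    """LS channel estimate Y_p X_p^H (X_p X_p^H)^{-1} for N_t = 2."""
    X_h = hermitian(X_p)
    gram_inv = inv2(matmul(X_p, X_h))
    return matmul(matmul(Y_p, X_h), gram_inv)


# Noise-free example: 2 receive antennas, 2 transmit antennas, N = 2 pilots.
H = [[1 + 2j, -0.5j], [0.3 + 0j, 2 - 1j]]
X_p = [[1 + 0j, 1 + 0j], [1 + 0j, -1 + 0j]]  # illustrative orthogonal pilots
Y_p = matmul(H, X_p)
H_hat = ls_estimate(Y_p, X_p)  # recovers H when there is no noise
```

With a single pilot per antenna ($N = 1$), the Gram matrix $\mathbf{X}_p\mathbf{X}_p^H$ has rank one, so `inv2` raises an error; this is the singularity that orthogonal pilot vectors are chosen to avoid.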

III. PROPOSED CHANNEL ESTIMATION METHOD FOR MIMO USTLD
The channel estimation method proposed here facilitates blind channel estimation when the transmitted pilot symbols, wireless channel second-order statistics, and the noise variance are unknown at the receiver side. The NN-ML channel estimator with transmit power-sharing is a blind machine learning channel estimator. It also reduces the bandwidth required to achieve a good MSE and BER performance relative to the traditional channel estimation methods.
Recently, researchers within the communications research space have taken a keen interest in applying machine learning to solve communications-related research problems. The problems are primarily related to the design of wireless symbol detectors using machine learning. The computational complexity of current expert wireless receivers is high for higher-order W-QAM/W-PSK modulation. Machine learning comes with the benefit of training a mathematical function to predict an output based on a noisy input. Once trained, there is no need for the machine learning algorithm, in the case of a wireless receiver symbol detector, to search iteratively for an estimated transmitted symbol in the large search space of higher modulation orders. The function will estimate the transmitted symbol in a much shorter time, with similar BER performance, compared to an ML detector, which takes longer to converge to a solution. This is critical for real-time communication environments, as link latency needs to be minimal to achieve a good quality of service (QoS).
In the case of channel estimation, we evaluate whether we can train a machine learning algorithm to predict the wireless channel based on a noisy received pilot symbol matrix with similar or better MSE and BER performance relative to the expert method of using LS or MMSE to carry out channel estimation. The advantage of using machine learning over LS or MMSE is that we can blindly estimate the wireless channel without knowing the pilot symbol matrix, channel autocorrelation matrix, and receiver noise variance.
The work in this section is organized as follows: Section III-A concentrates on hyperparameter tuning of the neural network model, Section III-B describes the training phase of the neural network model, and Section III-C presents the derivation of the optimal transmit power fraction that minimizes the channel estimation bandwidth utilization.

A. PROPOSED NN-ML CHANNEL ESTIMATION HYPERPARAMETER TUNING
For a supervised machine learning algorithm to be useful, it first needs to be trained using appropriate data. Two machine learning algorithms are trained: the first is the NN-ML-Channel-Estimation without power-sharing, where $\alpha = 0$, and the second is the NN-ML-Channel-Estimation with power-sharing, where $\alpha = \alpha_{opt}$. We generated 10 000 samples of the received pilot symbols based on Equation (3) for $\alpha = 0$ and $\alpha = \alpha_{opt}$. We also used the Zadoff-Chu sequences in Equation (1) to generate the transmitted pilot symbol matrix over a range of $N$ pilot symbols. The 10 000 samples were generated for each SNR value in the range 0dB to 16dB, in steps of 2dB. Since this section is responsible for hyperparameter tuning, we only tuned the hyperparameters using the 10dB SNR samples in order to reduce the tuning time. Table 1 shows the machine learning architecture and the training hyperparameters found using a genetic algorithm for the cases when $\alpha = 0$ and $\alpha = \alpha_{opt}$. The hyperparameters found through genetic algorithm [32] optimization are the pseudorandom number generator seed value, the learning rate and the training batch size. The objective function that the genetic algorithm optimized was the validation MSE at 10dB SNR.
The NN-ML-Channel-Estimation architecture in Table 1 was invariant to changes in the number of pilot symbols $N$. However, the architecture was found to be sensitive to the MIMO receive antenna configuration $N_r$. This is because changing the value of $N_r$ explicitly alters the MIMO channel matrix dimensions, which also alters the number of required neurons at the output layer of the architecture. Therefore, the architecture in Table 1 applies to the receive antenna configuration considered in this work. The input data dimension is based on the number of transmitted pilot symbols ($N$) per symbol vector and the number of receive antennas $N_r$. At the bottom of Table 1 are the tuneable hyperparameter values, which we derived via a genetic algorithm. The genetic algorithm pseudo-code is as follows:
Algorithm 1: Genetic Algorithm
Result: Gene sequence with the pseudo-random seed value, batch size and learning rate for the population member that produces the lowest validation MSE at 10dB SNR.
Step 4: End
The initialization step initializes the salient variables defined in Table 2, amongst other variables in Algorithm 1. These variables include:
• a list that stores the fitness value for each population member $\Omega_k$;
• $\rho[1:P]$, a list that stores the selection probabilities for each $\Omega_k$;
• $\upsilon[1:2]$, a list that stores the two parents that are selected for mating;
• $c$, the child produced from the mating of the two selected parents.
The selection probabilities are computed as in [32], where $\rho[k]$ is the selection probability of the $k$th population member. The selection probability is the likelihood of a population member being randomly selected to mate and produce a child for future generations. Only two parents are randomly selected to produce one child.
When the two parents are selected randomly, a genetic crossover is performed to produce a child. The crossover is done by inheriting the pseudo-random seed value from the first parent and then inheriting the batch size and the learning rate from the second parent. Random genetic mutation may occur with a probability of 0.35 for local mutation and 0.15 for global mutation. Local mutation involves altering a single inherited hyperparameter value of the child. Global mutation involves altering all three inherited hyperparameter values of the child. The genetic mutation involves randomly assigning values to the hyperparameters subject to the constraints stated in Step 1 of Algorithm 1.
The child produced from the selected parents' mating is then added as a new population member for the next generation.
Selecting mating parents and producing children is repeated until the new population size is $P$. Only then do we destroy the old population and move on to the next generation of population members.
The process of searching for a global minimum validation MSE is repeated until the number of iterations is equal to the maximum number of generations. Only then do we search for the population member $\Omega_k$, together with its hyperparameters, that produces the lowest validation MSE over all generations evaluated. The optimal population member's hyperparameters are then used to fully train the neural network in Table 1 from 16dB to 0dB SNR.
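The selection, crossover, and mutation steps described above can be sketched in Python as follows. This is an illustrative reading of the text rather than the authors' Algorithm 1: the search ranges, the inverse-fitness selection weighting, the treatment of local and global mutation as mutually exclusive events, and the toy objective standing in for the validation MSE at 10dB are all assumptions.

```python
import random

SEED_RANGE = (0, 10_000)         # assumed search range for the seed
BATCH_SIZES = [16, 32, 64, 128]  # assumed candidate batch sizes
LR_RANGE = (1e-4, 1e-2)          # assumed learning-rate range


def random_member(rng: random.Random) -> dict:
    """Draw one population member (a gene sequence of three hyperparameters)."""
    return {"seed": rng.randint(*SEED_RANGE),
            "batch_size": rng.choice(BATCH_SIZES),
            "learning_rate": rng.uniform(*LR_RANGE)}


def selection_probabilities(fitness: list[float]) -> list[float]:
    """Lower validation MSE gives a higher selection probability (assumed rule)."""
    inverse = [1.0 / (f + 1e-12) for f in fitness]
    total = sum(inverse)
    return [v / total for v in inverse]


def crossover(parent1: dict, parent2: dict) -> dict:
    """Seed from the first parent; batch size and learning rate from the second."""
    return {"seed": parent1["seed"],
            "batch_size": parent2["batch_size"],
            "learning_rate": parent2["learning_rate"]}


def mutate(child: dict, rng: random.Random,
           p_local: float = 0.35, p_global: float = 0.15) -> dict:
    """Global mutation redraws all genes; local mutation redraws one gene."""
    u = rng.random()
    fresh = random_member(rng)
    if u < p_global:
        return fresh
    if u < p_global + p_local:
        mutated = dict(child)
        key = rng.choice(list(mutated))
        mutated[key] = fresh[key]
        return mutated
    return child


def evolve(validation_mse, pop_size: int = 10, generations: int = 5,
           rng_seed: int = 0) -> tuple[dict, float]:
    """Return the member with the lowest validation MSE over all generations."""
    rng = random.Random(rng_seed)
    population = [random_member(rng) for _ in range(pop_size)]
    best, best_mse = None, float("inf")
    for _ in range(generations):
        fitness = [validation_mse(member) for member in population]
        for member, value in zip(population, fitness):
            if value < best_mse:
                best, best_mse = member, value
        probs = selection_probabilities(fitness)
        new_population = []
        while len(new_population) < pop_size:
            p1, p2 = rng.choices(population, weights=probs, k=2)
            new_population.append(mutate(crossover(p1, p2), rng))
        population = new_population  # the old population is discarded
    return best, best_mse


def toy_objective(member: dict) -> float:
    """Hypothetical stand-in for the validation MSE of a trained network."""
    return (member["learning_rate"] - 1e-3) ** 2 + 1.0 / member["batch_size"]


best_member, best_value = evolve(toy_objective, pop_size=8, generations=4, rng_seed=1)
```

In practice `toy_objective` would be replaced by a routine that trains the network with the candidate hyperparameters on the 10dB samples and returns the validation MSE, which makes each fitness evaluation far more expensive than in this toy example.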

B. TRAINING THE NN-ML CHANNEL ESTIMATOR
There are two sets of data used in the training and testing of the machine learning algorithm in Table 1. The first dataset, Å, is the training dataset, which is made up of 80% of the 10 000 samples generated using the received pilot symbol matrix data from Equation (3). The second dataset, ℳ, is the test dataset, which is the remaining unseen 20% of the 10 000 samples. The datasets are collected for SNR values from 0dB to 16dB and are generated for both scenarios, $\alpha = 0$ and $\alpha = \alpha_{opt}$. We trained the machine learning algorithm over this wide range of SNR values. The algorithm was trained from 16dB down to 0dB, as it was noticed that training it the other way, from 0dB to 16dB, yielded a poor validation mean squared error per SNR training cycle.
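The data handling described above can be sketched in a few lines of Python. The shuffling before the split and the helper name `split_dataset` are assumptions made for illustration; the text only specifies the 80%/20% proportions and the 16dB-to-0dB training order.

```python
import random


def split_dataset(samples: list, train_fraction: float = 0.8, seed: int = 0):
    """Shuffle the samples and split them into training and test subsets."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)  # assumption: samples are shuffled before splitting
    cut = round(train_fraction * len(samples))
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test


# SNR training order described in the text: 16dB down to 0dB in 2dB steps.
snr_schedule_db = list(range(16, -1, -2))

# Placeholder sample identifiers standing in for the 10 000 pilot matrices.
train_set, test_set = split_dataset(list(range(10_000)))
```

A training loop would then iterate over `snr_schedule_db`, continuing from the weights obtained at the previous, higher SNR value.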
For the training to happen, the received complex pilot symbol matrix from Equation (3) is converted into a 2-dimensional real-valued data structure with a single channel. A matrix representation that converts a complex matrix into a 2-dimensional real-valued data structure is used as per [33]. The resulting 2-dimensional data structure is stored in a single-element array to create a single channel. Since the received pilot symbol matrix is in $\mathbb{C}^{N_r \times N}$, the real-valued matrix is in $\mathbb{R}^{2N_r \times 2N}$, as shown in Equation (9) [33]
$$\mathbf{Y}_R = \begin{pmatrix} \Re(\mathbf{Y}_p) & -\Im(\mathbf{Y}_p) \\ \Im(\mathbf{Y}_p) & \Re(\mathbf{Y}_p) \end{pmatrix} \in \mathbb{R}^{2N_r \times 2N} \qquad (9)$$
where $\Re(\mathbf{Y}_p)$ and $\Im(\mathbf{Y}_p)$ are the element-wise real and imaginary parts of the received pilot symbol matrix $\mathbf{Y}_p$. The set of 10 000 samples is made up of the 2-dimensional data $\mathbf{Y}_R$ with a single channel. To train the machine learning algorithm using supervised learning, we need the output label data that corresponds to this input training set Å. The output training data is the actual wireless channel matrix $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$, as per the method used in [11]. We then convert this wireless channel matrix into a real-valued vector $\mathbf{h} \in \mathbb{R}^{1 \times 2N_rN_t}$. In our case, $N_r = 4$ and $N_t = 2$, which means $\mathbf{h} \in \mathbb{R}^{1 \times 16}$, making the output layer of the neural network a 16-neuron layer as per Table 1. The real-valued entries of the vector $\mathbf{h}$ are determined by taking the real and imaginary values of the complex entries of the channel matrix $\mathbf{H}$.
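The conversion from complex matrices to real-valued network inputs and labels can be sketched in Python as follows. The block layout follows Equation (9); the ordering of the real and imaginary parts in the label vector (all real parts first, then all imaginary parts) and both function names are assumptions for illustration.

```python
def complex_to_real(Y: list[list[complex]]) -> list[list[float]]:
    """Map a complex R x C matrix to the real 2R x 2C block matrix
    [[Re(Y), -Im(Y)], [Im(Y), Re(Y)]]."""
    re = [[z.real for z in row] for row in Y]
    im = [[z.imag for z in row] for row in Y]
    top = [r + [-v for v in i] for r, i in zip(re, im)]
    bottom = [i + r for r, i in zip(re, im)]
    return top + bottom


def channel_to_label(H: list[list[complex]]) -> list[float]:
    """Flatten a complex channel matrix into a real label vector.

    Assumed ordering: all real parts (row-major), then all imaginary parts.
    """
    flat = [h for row in H for h in row]
    return [z.real for z in flat] + [z.imag for z in flat]


# Example with N_r = 4 receive antennas, N = 2 pilots and N_t = 2 transmit antennas.
Y_p = [[complex(r + 1, c - 1) for c in range(2)] for r in range(4)]
H = [[complex(0.1 * r, -0.2 * c) for c in range(2)] for r in range(4)]
network_input = complex_to_real(Y_p)  # 8 x 4 real matrix
label = channel_to_label(H)           # 16 real values
```

For $N_r = 4$ and $N_t = 2$, the label has 16 entries, which matches the 16-neuron output layer mentioned in the text.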
During training, the set Å is fed into the machine learning algorithm function $f(Å, \boldsymbol{\theta})$ with parameters $\boldsymbol{\theta}$. This function's output is compared to the output label data in the vector $\mathbf{h}$, which corresponds to the actual wireless channel, using the MSE loss function in Equation (10), where ‖Å‖ is the batch size of the training set. This loss function is used in back-propagation by the Adam optimizer [34] to determine the parameters $\boldsymbol{\theta}$, or synaptic weights, of the neural network function $f(Å, \boldsymbol{\theta})$. The training process looks for the weights that produce the lowest validation MSE, which is determined by evaluating the MSE loss function in Equation (10) after each training epoch using the unseen test dataset ℳ.
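As a small illustration of the loss evaluation and model selection described above, the following Python sketch computes a batch MSE and picks the epoch with the lowest validation loss. The normalization (squared Euclidean error averaged over the batch) is an assumption and may differ from Equation (10) by a constant factor; the function names are hypothetical.

```python
def batch_mse(predictions: list[list[float]], labels: list[list[float]]) -> float:
    """Squared Euclidean error between predictions and labels, averaged over the batch."""
    if not predictions or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equally long")
    total = 0.0
    for predicted, target in zip(predictions, labels):
        total += sum((p - t) ** 2 for p, t in zip(predicted, target))
    return total / len(predictions)


def best_epoch(validation_losses: list[float]) -> int:
    """Index of the epoch whose weights gave the lowest validation MSE."""
    return min(range(len(validation_losses)), key=validation_losses.__getitem__)


# Toy example: two predictions compared with their labels.
loss = batch_mse([[1.0, 2.0], [0.0, 0.0]], [[1.0, 0.0], [0.0, 0.0]])
chosen = best_epoch([0.5, 0.2, 0.3])
```

In a real training run, the predictions would come from the neural network evaluated on the unseen dataset after each epoch, and the weights from epoch `chosen` would be retained.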

C. OPTIMAL TRANSMIT POWER SHARING
Based on Section II's system model, we derive the optimal power fraction function $\alpha_{opt} = f(N_t, M, N, \mathbf{X}_p)$. This subsection is dedicated to deriving the optimal power fraction function and finding the optimal number of pilot symbols to be transmitted for optimal MSE and BER performance.
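Inspired by a generalized differential scheme for spatial modulation systems [29], the following derivation is performed to determine the optimal power fraction that minimizes the MSE and BER of the NN-ML wireless channel estimator. The channel matrix $\mathbf{H}$ is the common link between Equations (3) and (4). Manipulating Equation (3), we get Equation (11):
$$\mathbf{H} = (\mathbf{Y}_p - \mathbf{N}_p)\mathbf{X}_p^H(\mathbf{X}_p\mathbf{X}_p^H)^{-1} \qquad (11)$$
Substituting Equation (11) into Equation (4) gives Equation (12):
$$\mathbf{Y}_d = \mathbf{Y}_p\mathbf{X}_p^H(\mathbf{X}_p\mathbf{X}_p^H)^{-1}\mathbf{X}_d - \mathbf{N}_p\mathbf{X}_p^H(\mathbf{X}_p\mathbf{X}_p^H)^{-1}\mathbf{X}_d + \mathbf{N}_d \qquad (12)$$
We can see from Equation (12) that the first coefficient of $\mathbf{X}_d$ is $\mathbf{Y}_p\mathbf{X}_p^H(\mathbf{X}_p\mathbf{X}_p^H)^{-1}$, which is identical to Equation (7), the least-squares wireless channel estimate. We can then replace the first coefficient of $\mathbf{X}_d$ with the generic placeholder for the wireless channel estimate, which we will call $\hat{\mathbf{H}}$. This changes Equation (12) to $\mathbf{Y}_d = \hat{\mathbf{H}}\mathbf{X}_d - \mathbf{N}_p\mathbf{X}_p^H(\mathbf{X}_p\mathbf{X}_p^H)^{-1}\mathbf{X}_d + \mathbf{N}_d$.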
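To find the optimal power fraction that minimizes the channel estimate MSE and BER, we need to derive the equivalent noise power, following a method similar to that used to derive the equivalent noise power for a generalized differential scheme for spatial modulation systems [29]. From Equation (12), it is clear that, for MIMO pilot-assisted wireless channel estimation methods, the equivalent noise power depends on the noise term $-\mathbf{N}_p\mathbf{X}_p^H(\mathbf{X}_p\mathbf{X}_p^H)^{-1}\mathbf{X}_d + \mathbf{N}_d$, where $\mathbf{N}_p$ and $\mathbf{N}_d$ are the pilot and information noise matrices. If $\hat{\mathbf{H}}$ is the estimated wireless channel at an instant, then the average signal mean squared error, $\min_{\hat{\mathbf{H}}} E(\|\mathbf{Y}_d - \hat{\mathbf{H}}\mathbf{X}_d\|_F^2)$, needs to be minimized for good channel estimation accuracy, where the operator $E(\cdot)$ is the statistical expectation given that $\mathbf{X}_d$ is known at the receiver. In order to practically evaluate the accuracy or MSE of the wireless channel estimate, we need to transmit a fixed, known information symbol matrix $\mathbf{X}_d$ based on the USTLD method, observe the received symbol matrix $\mathbf{Y}_d$, and evaluate $E(\|\mathbf{Y}_d - \hat{\mathbf{H}}\mathbf{X}_d\|_F^2)$ after having estimated the wireless channel and obtained $\hat{\mathbf{H}}$. The term $E(\|\mathbf{Y}_d - \hat{\mathbf{H}}\mathbf{X}_d\|_F^2)$ is expanded using the Frobenius norm property in Equation (13.1) and the Cauchy-Bunyakovsky-Schwarz inequality in Equation (13.2) [35]:
$$\|\mathbf{A}\mathbf{B}\|_F \le \|\mathbf{A}\|_F\|\mathbf{B}\|_F \qquad (13.1)$$
$$|\mathrm{tr}(\mathbf{A}^H\mathbf{B})| \le \|\mathbf{A}\|_F\|\mathbf{B}\|_F \qquad (13.2)$$
Using, in addition, the Frobenius norm triangle inequality in [35], together with $E(\|\mathbf{X}_d\|_F^2) = 2N_t$ and $E(\|\mathbf{N}_d\|_F^2) = 2N_r\sigma_d^2$, the expression simplifies to Equation (14), where $N_t$ is the number of transmit antennas and $N_r$ is the number of receive antennas in the wireless MIMO configuration. From Equation (14), the quantity to be minimized is the noise term $E(\|-\mathbf{N}_p\mathbf{X}_p^H(\mathbf{X}_p\mathbf{X}_p^H)^{-1}\mathbf{X}_d + \mathbf{N}_d\|_F^2)$, which is expanded in Equation (15). When the expression $E(\|\mathbf{N}_d\|_F^2) + E(\|\mathbf{N}_p\mathbf{X}_p^H(\mathbf{X}_p\mathbf{X}_p^H)^{-1}\mathbf{X}_d\|_F^2)$ increases in value, the expression $E(2\|\mathbf{N}_d\|_F\|\mathbf{N}_p\mathbf{X}_p^H(\mathbf{X}_p\mathbf{X}_p^H)^{-1}\mathbf{X}_d\|_F)$ also increases in value, and the converse is true, which implies that the two expressions are in phase. This means that the expression $E(2\|\mathbf{N}_d\|_F\|\mathbf{N}_p\mathbf{X}_p^H(\mathbf{X}_p\mathbf{X}_p^H)^{-1}\mathbf{X}_d\|_F)$ only shifts the graph of $E(\|\mathbf{N}_d\|_F^2) + E(\|\mathbf{N}_p\mathbf{X}_p^H(\mathbf{X}_p\mathbf{X}_p^H)^{-1}\mathbf{X}_d\|_F^2)$ vertically downwards on the Cartesian plane. It does not affect the value of $\alpha$ at which the minimum or stationary point occurs. Thus, to minimize the term $E(\|-\mathbf{N}_p\mathbf{X}_p^H(\mathbf{X}_p\mathbf{X}_p^H)^{-1}\mathbf{X}_d + \mathbf{N}_d\|_F^2)$, we only need to concentrate on minimizing the term $E(\|\mathbf{N}_d\|_F^2) + E(\|\mathbf{N}_p\mathbf{X}_p^H(\mathbf{X}_p\mathbf{X}_p^H)^{-1}\mathbf{X}_d\|_F^2)$.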
The total equivalent noise power, henceforth the equivalent noise power, is the total noise power contribution from the two noise matrix expressions in the noise term of Equation (12), and it is defined in Equation (16). Equation (15) contains the equivalent noise power; thus, to minimize the channel estimation MSE, we find the minimum equivalent noise power, as stated in Equation (17). To obtain the optimal power fraction that minimizes the equivalent noise power, we find the stationary point of the equivalent noise power with respect to the transmit power fraction.
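Setting the derivative of the equivalent noise power with respect to the transmit power fraction to zero and solving leads to the optimal transmit power fraction shown in Equation (21). In that expression, M is the number of information symbols transmitted per transmit antenna, N is the number of pilot symbols sent per transmit antenna, $N_t$ is the number of transmit antennas in the MIMO configuration, and the Frobenius norm is that of the constant matrix formed from the transmitted pilot symbol matrix, namely its conjugate transpose multiplied by the inverse of the product of the pilot symbol matrix and its conjugate transpose.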
The next objective is to find the optimal number of pilot symbols that must be transmitted over the wireless channel and used for channel estimation. Since the lowest equivalent noise power translates to the minimum channel estimation MSE, we need to select the number of pilot symbols that produces the lowest possible equivalent noise power. The optimal power fraction that minimizes the equivalent noise power is a function of the number of pilot symbols N. Thus, the critical parameter to select for optimal channel estimation performance is N, since the optimal power fraction can be obtained from Fig. 3 once the optimal N is known. We can find the optimal number of pilot symbols N from Fig. 2.

As can be seen in Fig. 2, realistic values for the number of pilot symbols lie only in the range 2 to 200, since the quasi-static channel fading is constant for slightly more than 200 symbols at a time, hence the limit of 200. We cannot use N = 1 pilot symbol because the Zadoff-Chu sequence always starts with an element with an amplitude of 1 and a phase of 0°. For N = 1, the square matrix formed by the product of the pilot symbol matrix and its conjugate transpose is therefore singular and not invertible. We can thus only work with values of N in the range 2 to 200, which is the search space for the optimal number of pilot symbols. The SNR is set to 0 dB because it plays an insignificant role in determining the optimal number of pilot symbols: the SNR is just a scaling factor, and 0 dB corresponds to a scaling factor of 1 on the linear scale.

Fig. 2 shows that the lowest equivalent noise power occurs at N = 2 pilot symbols. As shown in Section III-C's derivation, the lowest equivalent noise power corresponds to the lowest MSE. This translates to an optimal power fraction of approximately 0.16, based on extrapolation from Fig. 3 at N = 2. Fig. 3 is produced from Equation (21), and Fig. 2 is produced from Equation (20) with the transmit power fraction set to its optimal value. From Fig. 3, we can see that the optimal power fraction stays within the feasible range of [0, 1) across the different numbers of pilot symbols. The other observation is that the power fraction cannot increase linearly and approach unity, because the transmit power of the information symbols would become negligible, degrading the BER performance of the transmitted information symbols. Thus, whilst increasing the transmit power of the pilot symbols may yield an excellent channel estimation MSE, lowering the transmit power of the information symbols close to 0 yields inferior BER performance, which defeats the aim of improving the channel estimation accuracy. A balance must therefore be struck to optimize both the BER and MSE performance.
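The invertibility argument above can be checked numerically. The following is a minimal sketch, assuming a least-squares estimate of the standard form (received pilots multiplied by the conjugate transpose of the pilot matrix and the inverse of the pilot Gram matrix). The pilot matrices `X1` and `X2` are hypothetical illustrations, not the Zadoff-Chu pilots of Equation (1), and the function name `ls_channel_estimate` is introduced here only for this example.

```python
import numpy as np

def ls_channel_estimate(Y, X):
    """Least-squares estimate H_hat = Y X^H (X X^H)^{-1} for the model Y = H X + noise."""
    gram = X @ X.conj().T  # N_t x N_t Gram matrix of the pilot matrix
    return Y @ X.conj().T @ np.linalg.inv(gram)

# N = 1: a single pilot per transmit antenna (both starting at amplitude 1, phase 0).
# The 2 x 2 Gram matrix then has rank 1, so it is singular.
X1 = np.array([[1.0 + 0j], [1.0 + 0j]])
rank_n1 = np.linalg.matrix_rank(X1 @ X1.conj().T)

# N = 2: hypothetical unit-modulus pilots with linearly independent rows.
X2 = np.array([[1, 1], [1, -1]], dtype=complex)
rank_n2 = np.linalg.matrix_rank(X2 @ X2.conj().T)

# Noise-free check on a random 2 x 4 MIMO channel (4 receive, 2 transmit antennas).
rng = np.random.default_rng(0)
H = (rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))) / np.sqrt(2)
H_hat = ls_channel_estimate(H @ X2, X2)

print(rank_n1, rank_n2, np.allclose(H_hat, H))
```

With one pilot symbol the Gram matrix is rank deficient and cannot be inverted, whereas two linearly independent pilot columns already make the noise-free least-squares estimate exact for two transmit antennas.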

IV. SIMULATION RESULTS
The Monte-Carlo wireless simulation environment is set up as a 2 × 4 multiple-input multiple-output (MIMO) wireless channel with Rayleigh quasi-static fading, in which the channel gain remains constant for 200 + N symbol durations and changes every 200 + N symbol durations. The transmit and receive antennas are spaced sufficiently far apart that the wireless channels are decorrelated. The number of information symbols transmitted per frame is M = 200, and the number of pilot symbols transmitted per frame is N = 2 or N = 10. The information and pilot symbols share the transmit power according to the optimal power fraction in Equation (21) for N = 2. The pilot symbols are generated using the Zadoff-Chu sequence in Equation (1) for N = 2 or N = 10. The information symbol modulation schemes used in the simulation are 16-QAM and 16-PSK. The average power constraint for the 16-QAM and 16-PSK symbols is set to 1.
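The simulation setup described above can be sketched as follows. This is a simplified illustration only: it draws one quasi-static 2 × 4 Rayleigh channel per frame and sends unit-average-power 16-QAM or 16-PSK symbols, but it omits the USTLD labeling mappers, the pilot symbols and the transmit power-sharing. The function names and the SNR convention (noise variance equal to the inverse of the linear SNR) are assumptions made for this example.

```python
import numpy as np

def qam16():
    """16-QAM points scaled to unit average power (the unscaled mean power is 10)."""
    levels = np.array([-3.0, -1.0, 1.0, 3.0])
    points = (levels[:, None] + 1j * levels[None, :]).ravel()
    return points / np.sqrt(10.0)

def psk16():
    """16-PSK points on the unit circle (unit power by construction)."""
    return np.exp(2j * np.pi * np.arange(16) / 16)

def simulate_frame(constellation, rng, n_tx=2, n_rx=4, n_info=200, snr_db=10.0):
    """One quasi-static frame: the channel is drawn once and held for the whole frame."""
    h = (rng.standard_normal((n_rx, n_tx))
         + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
    s = rng.choice(constellation, size=(n_tx, n_info))
    noise_var = 10.0 ** (-snr_db / 10.0)  # assumed SNR convention
    w = np.sqrt(noise_var / 2) * (rng.standard_normal((n_rx, n_info))
                                  + 1j * rng.standard_normal((n_rx, n_info)))
    return h, s, h @ s + w

rng = np.random.default_rng(1)
h, s, y = simulate_frame(qam16(), rng)
print(np.mean(np.abs(qam16()) ** 2), np.mean(np.abs(psk16()) ** 2), y.shape)
```

A Monte-Carlo run would call `simulate_frame` repeatedly, drawing a new channel for every frame, and average the resulting MSE or BER over the frames at each SNR point.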
The NN-ML-based algorithm model architecture in Table 1 is loaded into the simulation environment, and the saved optimal synaptic weights are loaded into the machine learning model. The channel estimation algorithms used are the LS, MMSE, NN-ML without power-sharing, and NN-ML with power-sharing. The Monte-Carlo simulation determines the MSE and BER performance of the channel estimation algorithms over an SNR range of 0 dB to 14 dB.

From Fig. 4, we can see that the NN-ML channel estimator without transmit power-sharing underperforms the traditional LS and MMSE channel estimation methods at high SNR. The reason is that the hyperparameters in Table 1 are searched for in a multivariable landscape with the objective of finding the global minimum validation MSE at 10 dB SNR. Because the landscape is multivariable, it may have multiple local stationary points and a single global stationary point. The genetic algorithm (GA) used in Algorithm 1 searches for the global stationary point in this landscape with no guarantee of finding it. Our GA runs for only 100 evolutionary generations with a population size of 10, which limits the number of permutations of hyperparameter values tested on the neural network and thus restricts the search space for the global stationary point. The hyperparameters found in Table 1 for the case without power-sharing (a power fraction of zero) are clearly suboptimal, given the poor MSE performance at high SNR. The high SNR range is more sensitive to the channel estimator's systematic errors than the low SNR range.
We also see from Fig. 4 that the MSE performance of the NN-ML channel estimation method improves when transmit power-sharing is used: the added transmit power fraction for the pilot symbols improves the MSE performance. It is also interesting to note that, over the defined SNR range, the NN-ML channel estimator with transmit power-sharing outperforms the traditional LS and MMSE methods. At an MSE of $6 \times 10^{-2}$, the NN-ML channel estimator with power-sharing has a gain of approximately 12 dB over the traditional LS and MMSE channel estimation methods. This implies that we can save transmit power with the NN-ML method with power-sharing. We can also perform blind channel estimation without knowledge of the transmitted pilot symbols, the channel autocorrelation statistics, or the receiver noise variance required by the traditional channel estimation methods.

As shown in Fig. 5, the NN-ML channel estimator with transmit power-sharing has the best BER performance, as expected from the MSE accuracy shown in Fig. 4. This shows that minimizing the equivalent noise power does, in fact, minimize the channel estimation MSE, as shown in Fig. 4, and the signal MSE, as shown in Fig. 5. The signal MSE is linked to the maximum likelihood detector performance. Hence, the NN-ML channel estimator with transmit power-sharing has the best BER performance, since its equivalent noise power is minimized relative to the other channel estimation algorithms.
There is a loss in diversity at high SNR for the NN-ML channel estimator without transmit power-sharing, as the hyperparameters selected for the case with a power fraction of zero are suboptimal. The MSE performance in Fig. 4 shows that, at high SNR, the NN-ML channel estimator without transmit power-sharing performs poorly relative to the other channel estimators. This poor MSE performance has an impact on the BER performance, as shown in Fig. 5. For the NN-ML channel estimator with transmit power-sharing, there is no loss in diversity because the hyperparameters selected for the case with the optimal power fraction are near-optimal. The approach of selecting hyperparameters using a GA does not guarantee that the stationary points found are globally optimal.
We also observe a BER performance gain of nearly 2 dB for the NN-ML channel estimator with transmit power-sharing over the traditional LS and MMSE channel estimation methods. This implies that the NN-ML channel estimator with transmit power-sharing achieves good channel estimation accuracy and link reliability relative to the traditional channel estimation algorithms whilst using a minimal number of pilot symbols to estimate the wireless channel. As shown in Fig. 6, the NN-ML channel estimator with transmit power-sharing at N = 2 has the same BER performance as the traditional LS and MMSE channel estimation methods at N = 10. This means that the LS and MMSE channel estimation methods need 8 extra pilot symbols to deliver a BER performance similar to that of the NN-ML channel estimator with transmit power-sharing. That is a waste of expensive wireless channel bandwidth, which should be used to transmit information symbols.
The other observation from Fig. 6 is that, at higher N values, the NN-ML channel estimator without transmit power-sharing has a BER performance that approaches that of the traditional LS and MMSE methods. This is advantageous because totally blind channel estimation can be achieved by this NN-ML method, as it only needs the received pilot symbol matrix to perform channel estimation. It does not need to know the wireless channel second-order statistics, the noise variance, or the transmitted pilot symbol matrix, unlike the traditional channel estimation methods, which require this prior knowledge. In Fig. 7, we see that the performance gains achieved in Fig. 6 for 16-QAM modulation also apply to 16-PSK modulation. This implies that 16-QAM or 16-PSK USTLD modulation in conjunction with the NN-ML channel estimator with transmit power-sharing can achieve a BER performance comparable to that of the traditional LS and MMSE channel estimators, but at 20% of the bandwidth they require.
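The 20% figure follows from the pilot counts compared in Fig. 6 and Fig. 7 (N = 2 against N = 10). The short sketch below reproduces that arithmetic and also reports the pilot overhead per frame, assuming a frame of M + N symbols with M = 200 as in the simulation setup; the helper name `pilot_overhead` is introduced here only for illustration.

```python
M = 200  # information symbols per frame, as in the simulation setup

def pilot_overhead(n_pilots, n_info=M):
    """Fraction of a frame of n_info + n_pilots symbols that is spent on pilots."""
    return n_pilots / (n_info + n_pilots)

# Ratio of channel estimation bandwidth: NN-ML with power-sharing vs LS/MMSE.
pilot_ratio = 2 / 10

print(f"pilot ratio: {pilot_ratio:.0%}")
print(f"overhead at N = 2:  {pilot_overhead(2):.4f}")
print(f"overhead at N = 10: {pilot_overhead(10):.4f}")
```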

V. CONCLUSION
The power-sharing method improves the NN-ML channel estimation MSE accuracy relative to the NN-ML method without transmit power-sharing. The NN-ML channel estimator with transmit power-sharing outperforms the traditional LS and MMSE channel estimation methods in MSE throughout the simulated SNR range. At an MSE of $6 \times 10^{-2}$, it has a gain of approximately 12 dB over the traditional LS and MMSE channel estimation methods. The NN-ML algorithm does not require knowledge of the channel autocorrelation statistics or the noise variance to estimate the wireless channel. This implies that the NN-ML algorithm with power-sharing can be used for wireless channel estimation where the transmitted pilot symbols, channel second-order statistics and receiver noise variance are unknown. Another inference from the results is that, since the optimal number of pilot symbols is only 2, we can achieve high channel estimation MSE and BER accuracy whilst saving expensive channel bandwidth, because the traditional channel estimation algorithms need a higher number of pilot symbols to achieve similar MSE and BER performance. From the simulation results, the proposed NN-ML channel estimator with transmit power-sharing requires only 20% of the bandwidth utilized by LS and MMSE to achieve the same BER performance for 16-QAM and 16-PSK USTLD modulation.

Fig. 2. Equivalent noise power at 0 dB SNR versus the number of pilot symbols N.

Fig. 3. Power fraction versus the number of pilot symbols N.

Fig. 4. MSE performance of the 2 × 4 MIMO NN-ML channel estimation versus the traditional channel estimation methods.

Fig. 5. BER performance of the 2 × 4 MIMO NN-ML channel estimation versus the traditional channel estimation methods at the same number of pilot symbols for 16-QAM modulation.

Fig. 6. BER performance of the 2 × 4 MIMO NN-ML channel estimation versus the traditional channel estimation methods at different numbers of pilot symbols for 16-QAM modulation.

Fig. 7. BER performance of the 2 × 4 MIMO NN-ML channel estimation versus the traditional channel estimation methods at different numbers of pilot symbols for 16-PSK modulation.

Table 2: Definition of Genetic Algorithm Parameters
χ: Dataset

Algorithm 1 works by generating random population members with randomized seed values, batch sizes and learning rates as an initial genetic algorithm population of possible hyperparameters. The hyperparameters of the population members, Ω, are randomly assigned values subject to the constraints stated in Step 1 of Algorithm 1. The algorithm then iterates through each population member per generation. It uses each population member's hyperparameters to train the neural network architecture in Table 1, represented as Φ(•) in the algorithm. The neural network architecture is trained with a training dataset comprising 80% of the imported dataset stored in χ. The other 20% is used as a test dataset to produce the validation MSE at 10 dB SNR. The dataset stored in χ is produced from Equation (3), at 10 dB SNR, for a power fraction of zero and for the optimal power fraction. The training dataset is collected at a single SNR value of 10 dB because, at this stage, we are merely pre-training the neural network to select the hyperparameters that produce the lowest validation MSE at 10 dB SNR. We are not yet fully training the neural network for channel estimation, only tuning or selecting optimal hyperparameters.
The fitness formula involves the term $10^{(1000\,\omega[k]-45)}$, where ω[k] is the k-th population member's validation MSE at 10 dB SNR, and it assigns a fitness value to the k-th population member. The formula for computing the fitness values is found empirically so as to amplify the small differences between validation MSE values at 10 dB SNR. If we do not nonlinearly amplify the small differences between MSE values, the selection probabilities will be almost identical for all population members, since their fitness values will be very close to each other. For example, if 5 population members A, B, C, D, and F have validation MSE values of 0.04797, 0.0478, 0.048, 0.04632, and 0.04921, respectively, then their selection probabilities will be almost identical unless the fitness function nonlinearly amplifies the differences between the validation MSE values. With our empirical formula, the fitness values for population members A, B, C, D, and F are 1071.51, 1584.89, 1000, 47863.01, and 61.66, respectively. Population member F has the worst validation MSE and the lowest fitness value of 61.66. Population member D has the best validation MSE and the highest fitness value of 47863.01. The difference between the validation MSE values of population members F and D is only 0.0029, yet their fitness values differ hugely, which avoids near-equal selection probabilities when the validation MSE values are very close to each other. After the fitness values are computed, the selection probability of each population member is calculated as its fitness value divided by the sum of the fitness values of all population members.

The average signal mean squared error equals the expected squared Frobenius norm of the noise term in Equation (12). Therefore, we can minimize the average signal mean squared error by minimizing this equivalent expression. The minimization of the average signal mean squared error implies minimizing the channel estimate MSE. Using this fact, we can link the channel estimation MSE minimization to the minimization of the equivalent noise power term. Based on Equation (12), consider the matrix formed from the transmitted pilot symbol matrix as its conjugate transpose multiplied by the inverse of the product of the pilot symbol matrix and its conjugate transpose. This matrix has constant complex entries determined by the fixed transmitted pilot symbol matrix. Therefore, the equivalent noise power becomes Equation (18), since the three matrices in the product are independent. Because the expected squared Frobenius norm of a constant matrix equals its squared Frobenius norm, substituting the expected squared Frobenius norms of the three matrices gives the equivalent noise power upper bound in Equation (19).
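The fitness amplification and selection step of Algorithm 1 can be illustrated with the five example population members quoted in the text. The sketch below assumes a fitness map of the form 10^(51 − 1000·ω[k]), which is equivalent to 10^6 divided by 10^(1000·ω[k] − 45); this exact form is an assumption, chosen because it reproduces the quoted fitness values. The function names are hypothetical, and the inverse-MSE baseline is included only to show what happens without nonlinear amplification.

```python
import numpy as np

def fitness(val_mse):
    """Assumed empirical fitness map: 10 ** (51 - 1000 * validation MSE)."""
    return 10.0 ** (51.0 - 1000.0 * np.asarray(val_mse, dtype=float))

def selection_probabilities(fit):
    """Selection probability of each member: its fitness over the total fitness."""
    fit = np.asarray(fit, dtype=float)
    return fit / fit.sum()

# Validation MSE values at 10 dB SNR for the example members A, B, C, D and F.
members = ["A", "B", "C", "D", "F"]
val_mse = np.array([0.04797, 0.0478, 0.048, 0.04632, 0.04921])

fit = fitness(val_mse)
probs = selection_probabilities(fit)

# Baseline without nonlinear amplification: fitness taken as the inverse MSE.
naive_probs = selection_probabilities(1.0 / val_mse)

for name, f, p, q in zip(members, fit, probs, naive_probs):
    print(f"{name}: fitness={f:10.2f}  amplified p={p:.4f}  inverse-MSE p={q:.4f}")
```

Under the inverse-MSE baseline all five probabilities stay close to 0.2, whereas the amplified fitness concentrates most of the selection probability on member D, the member with the lowest validation MSE.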