Online Regularization of Complex-Valued Neural Networks for Structure Optimization in Wireless-Communication Channel Prediction

This article proposes online-learning complex-valued neural networks (CVNNs) to predict future channel states in fast-fading multipath mobile communications. A CVNN is suitable for dealing with a fading communication channel as a single complex-valued entity. This framework makes it possible to realize accurate channel prediction by utilizing its high generalization ability in the complex domain. However, actual communication environments are marked by rapid and irregular changes, which cause fluctuation of communication channel states. Hence, an empirically selected stationary network gives only limited prediction accuracy. In this article, we introduce regularization in updates of the CVNN weights to develop online dynamics that can self-optimize the effective network size in response to such channel-state changes. The resulting scheme realizes online adaptive, highly accurate and robust channel prediction with dynamical adjustment of the network size. We characterize its online adaptability in a series of simulations, and our practical wireless-propagation experiments demonstrate that the proposed channel prediction scheme provides 2.5 dB and 5.5 dB improvement of bit error rate (BER) at $10^{-3}$ and $5\times 10^{-4}$, and achieves $10^{-5}$ BER with $E_{b}/N_{0}=23$ to $24$ dB.


Introduction
Performance of mobile communications always suffers from signal degradation due to path loss, shadowing, interference and channel state changes caused by movement of users (Cho et al., 2010). In principle, fading, the most serious disturbance, can be mitigated by pre-equalization such as zero-forcing (Ho et al., 2017) or minimum-mean-square-error (MMSE) equalization (Eraslan et al., 2013). Transmission power control is another countermeasure against the fading phenomenon (Ren et al., 2018). These methods rely on accurate estimation of channel state information at the communication ends. However, in practical mobile communications, the channel state, or simply the channel, changes rapidly and irregularly due to the movement of mobile users and their surroundings, resulting in a time-varying multipath environment. The time fluctuation outdates the estimated channel and degrades the communication quality significantly. Channel prediction is an effective way to overcome this problem by forecasting channel changes in time based on preceding information. Accurate channel prediction is required to improve the communication quality and to enable further adaptive transmission in next-generation communications (Duel-Hallen, 2007; Bui et al., 2017).
There exist several works on channel prediction in mobile communications based on, for example, linear (Maehara et al., 2003; Bui et al., 2013) and autoregressive (AR) model extrapolation (Eyceoz et al., 1998; Arredondo et al., 2002; Duel-Hallen et al., 2006; Sharma & Chandra, 2007). Although the low computational complexity of these methods is suitable for real-time operation in mobile communications, such simple linear or AR-based methods provide limited performance in predicting rapid channel changes (Ding & Hirose, 2014a).
Neural-network-based channel prediction methods have also been studied very actively due to the recent successful development of artificial neural networks in various engineering fields. The generalization ability of neural networks provides flexible representation of complicated channel-state changes and high prediction capability. For instance, echo-state-network (ESN) based (Zhao et al., 2017), extreme-learning-machine (ELM) based (Sui et al., 2018), and real-valued recurrent-neural-network (RNN) based (Liu et al., 2006; Potter et al., 2010) prediction methods have been reported, and their prediction performance has been evaluated in some simulated communication situations. To realize high-precision prediction in practical mobile communications, the authors also proposed a method (Ding & Hirose, 2014a) based on a multiple-layer complex-valued neural network (ML-CVNN) by focusing on the rotary motion of the channel state in the complex plane. This method gave superb channel prediction performance in several practical communication scenarios.
Generally, in neural-network-based applications, network size is critical to the application performance because it affects the generalization characteristics and calculation cost (Hirose, 2012; Ramachandram & Taylor, 2017). For example, a network that is too small cannot represent the complexity of the targets and shows poor convergence. On the other hand, a network that is too large requires expensive calculation and, most importantly, causes overfitting. Despite its importance, the structure of the network is typically defined manually based on a rule of thumb. One may start with an arbitrary structure and evaluate its learning performance on a large amount of training data while increasing or decreasing the number of neurons and network connections until the best structure is found. This is also the present state of the art for neural-network-based methods in channel prediction. For example, in our previous prediction method, we empirically set the structure of the CVNN (the number of input terminals and neurons in the hidden layer) based on its prediction accuracy in a series of simulations with several communication situations. Although the structure shows high prediction performance on some simulated and experimentally observed fading channels (Ding & Hirose, 2014a), this manual pre-tuning of the network parameters is time consuming and inefficient. Moreover, mobile communications in the real world are forced to work in more diverse communication environments, and experience more rapid and varied fluctuations than those in simulations. As a result, an a priori tuned structure is no longer optimal for other practical communication environments; instead, the most suitable neural-network structure changes dynamically with the environment. This motivates us to develop dynamics that realize online adaptive and dynamically optimized neural-network structures for channel prediction.
In this paper, to realize a dynamically optimized network structure that best suits the fading channel at each moment, we propose a new ML-CVNN-based channel prediction scheme by introducing regularization. We work with a large-size network platform and then let it automatically find, or self-adjust to, a suitable structure within the platform that uses only a limited portion of the network in order to achieve good generalization. The self-adjustment is performed by imposing a sparse constraint (Elad, 2010) on the connection weight updates. The sparse constraint drives redundant connection weights to zero, and equivalently constructs a smaller network using only the remaining non-zero connections (Ding & Hirose, 2014b). In order to follow the time fluctuation in the channel state and keep the network structure optimized, we develop an online training-and-prediction framework. We update the network by using a set of the most recent channel states immediately before the prediction with a small number of learning iterations. We keep the updated network structure temporarily for the next training-and-prediction time frame. In this way, we change the distribution of non-zero connections in the structure from time to time so that the network keeps the most suitable size for the prediction situation.
In each training phase, we use a backpropagation of teacher signal (BPTS) (Hirose & Eckmiller, 1996), rather than the standard error-backpropagation. The BPTS-based update method is simpler with a lower computation cost, which is preferred for mobile communications. We demonstrate that the new channel prediction method with the online adaptive CVNN structure presents highly accurate predictions under fluctuating communication environments, not only in a series of simulations but also using actually observed fading channels in experiments. We precisely observe and discuss the effects of its dynamically changing structure on the bit-error rate performance.
The major contributions of our study can be summarized as follows.
1. Proposal of a scheme to update the network structure online to follow the dynamically changing environment at each moment;
2. Design of a new channel prediction method based on an ML-CVNN with the proposed online dynamic network structure and the BPTS for adaptive prediction;
3. Verification of the fact that the proposed fast-fading prediction has a performance superior to other approaches on simulated and experimentally observed channel states.
This paper is organized as follows. Section 2 briefly introduces the channel model theory and path separation in the frequency domain. After reviewing the conventional CVNN-based channel prediction in Section 3, we propose a novel prediction method based on an ML-CVNN with the dynamically changing structure in Section 4. Then, Sections 5 and 6 present its performance in simulations and experiments, respectively. Finally, Section 7 provides the conclusion.

Channel Model and Multipath Separation in Frequency Domain
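Channel states of communications are distorted mainly by multipath interference caused by scattering in the communication environment. In addition, movement of mobile users and/or scatterers causes rapid and irregular channel changes in time. Fig. 1 shows an example of fading channel states in actual mobile communications. The curves demonstrate the irregularity and nonlinearity of channel changes in the complex domain, and illustrate the difficulty of channel prediction caused by the irregular, rotation-like changes. Generally, a signal $y(t)$ received at a communication end at time $t$ is modeled with the time-varying channel $c(t)$ as

$$y(t) = c(t)\,s(t) + n(t) \qquad (1)$$

where $s(t)$ and $n(t)$ are the transmitted signal and additive white Gaussian noise (AWGN), respectively. According to the Jakes model (Jakes, 1994), the fading channel $c(t)$ as a function of time $t$ is modeled as a summation of $M$ individual complex signal paths $c_m(t)$ at a receiver and expressed as

$$c(t) = \sum_{m=1}^{M} c_m(t) = \sum_{m=1}^{M} a_m e^{j(2\pi f_m t + \phi_m)} \qquad (2)$$

where $a_m$, $f_m$, and $\phi_m$ are the amplitude, Doppler frequency, and phase shift of each single path $m$, and $M$ is the total number of paths. The Doppler frequency due to movement of a mobile user is given by

$$f_m = \frac{v}{c}\, f_c \cos\psi_m \qquad (3)$$

where $v$ and $c$ are the speed of the mobile user and the speed of light, respectively, $f_c$ is the carrier frequency of the communication, and $\psi_m$ is the incident radio-wave angle of path $m$ with respect to the motion of the mobile user. Based on this model, the observed channel $c(t)$ can be decomposed into the path components $c_m(t)$ in the frequency domain, because path components with different incident angles appear as separate peaks in the Doppler spectrum. The parameters of each path are estimated from these peaks and the corresponding phase spectrum by using the chirp z-transform (CZT) with a Hann window, which provides a low calculation cost and a smooth frequency-domain interpolation useful for accurate estimation in the region close to zero frequency (Tan & Hirose, 2009). By sliding the Hann window in the past and by repeating the parameter estimation process, we can obtain separated path components at different time points. We focus on the fact that the separated channel states $c_m(t)$ have a rotary locus in the complex plane and then predict their change in time by using CVNNs to obtain the future channel.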
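The multipath model described above can be illustrated with a minimal Python sketch. It assumes the Doppler relation $f_m = (v/c) f_c \cos\psi_m$ and synthesizes a channel as a sum of complex exponentials. The three example paths (amplitudes, incident angles, phase shifts) and the 1 ms sample spacing are hypothetical values chosen for illustration; only the 12 m/s speed and the 2 GHz carrier come from the simulation setup described later in the text.

```python
import cmath
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def doppler_frequency(v, f_c, psi):
    """Doppler shift (v / c) * f_c * cos(psi) of one path, in Hz."""
    return v / C_LIGHT * f_c * math.cos(psi)


def channel(t, paths):
    """Sum of path components a_m * exp(j(2*pi*f_m*t + phi_m)) at time t."""
    return sum(a * cmath.exp(1j * (2 * math.pi * f * t + phi))
               for a, f, phi in paths)


# Hypothetical paths: (amplitude, incident angle in rad, phase shift in rad).
V, F_C = 12.0, 2.0e9
raw_paths = [(1.0, 0.0, 0.0), (0.5, math.pi / 3, 1.0), (0.3, 2 * math.pi / 3, -0.5)]
paths = [(a, doppler_frequency(V, F_C, psi), phi) for a, psi, phi in raw_paths]

# Channel samples at an assumed 1 ms spacing; each path traces a circle in
# the complex plane, and their sum produces the irregular rotation-like locus.
samples = [channel(n * 1e-3, paths) for n in range(5)]
```

Because each path rotates at its own Doppler frequency, separating the paths first and predicting each rotation individually is a simpler task than predicting the summed channel directly.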

Conventional CVNN-Based Channel Prediction with a Pre-Defined Network Structure
The changes in the separated channel components $c_m(t)$ can be predicted by ML-CVNNs (Ding & Hirose, 2014a). A CVNN is a framework suitable for treating signal rotation and scaling adaptively in the complex plane by use of its high generalization ability (Hirose, 2012; Hirose & Yoshida, 2012). It has been receiving increasing attention in various applications that intrinsically require dealing with complex values (Hara & Hirose, 2004; Kawata & Hirose, 2005; Valle, 2014; Arima & Hirose, 2017). With a basic ML-CVNN consisting of a layer of $I_{ML}$ input terminals, a hidden-neuron layer with $K_{ML}$ neurons and an output neuron, we can predict the complex-valued $c_m(t)$ from a set of past channel components, $c_m(t-1), ..., c_m(t-I_{ML})$, for paths $m = 1, ..., M$. The input terminals distribute the input signals, $c_m(t-1), ..., c_m(t-I_{ML})$, to the hidden-layer neurons as their inputs $z_0$. In the same way, the outputs of the hidden-layer neurons $z_1$ are passed to the output-layer neuron as its input. The neurons in the hidden layer are fully connected with the input terminals and the output-layer neuron. The output of the output-layer neuron $z_2$ is the prediction result $\tilde{c}_m(t)$. The connection weight $w_{lki}$ to the $i$th input at the $k$th neuron in layer $l$ is expressed by its amplitude $|w_{lki}|$ and phase $\theta_{lki}$. The internal state $u_{lk}$ of the $k$th neuron in the $l$th layer is obtained as the summation of its inputs $z_{(l-1)}$ weighted by $w_{lk} = [w_{lki}]$, i.e.,

$$u_{lk} = \sum_i w_{lki}\, z_{(l-1)i} = \sum_i |w_{lki}|\,|z_{(l-1)i}|\, e^{j(\theta_{lki} + \theta_{(l-1)i})} \qquad (4)$$

where $z_{(l-1)i} = |z_{(l-1)i}| e^{j\theta_{(l-1)i}}$. The output $z_{lk}$ is given by applying an amplitude-phase-type activation function $f_{ap}$ to $u_{lk}$ as

$$z_{lk} = f_{ap}(u_{lk}) = \tanh(|u_{lk}|)\, e^{j \arg(u_{lk})} \qquad (5)$$

In our previous work, the connection weights $W_l = [w_{lk}] = [w_{lki}]$ in the ML-CVNN are updated as follows. The ML-CVNN regards the past known channel component $\hat{c}_m(t)$ as an output teacher signal, and the preceding channel components associated with the same path, $\hat{c}_m(t-1), ..., \hat{c}_m(t-I_{ML})$, as the input teacher signals. The weights are updated based on the steepest descent method so that they minimize the difference

$$E_l = \frac{1}{2} |z_l - \hat{z}_l|^2 = \frac{1}{2} \sum_k |z_{lk} - \hat{z}_{lk}|^2 \qquad (6)$$

where $z_l$ and $\hat{z}_l$ denote the temporary output signals and the teacher signals, respectively, in layer $l$. The teacher signals in the hidden layer $\hat{z}_1$ are obtained through the backpropagation of the teacher signal (BPTS) of the output layer $\hat{z}_2$ (Hirose, 1994; Hirose & Eckmiller, 1996; Hirose, 2012). The weight updates are performed for each estimated channel component by sliding the teacher signal and the input set in the time domain. We stop the update after a small number of iterations $R_{ML}$ in the update process for $\hat{c}_m(t)$ and keep the updated weights as the initial values in the following weight update for $\hat{c}_m(t+1)$. With this procedure, we reduce the learning cost needed to follow the weak regularity of the separated channel components $c_m(t)$ and achieve channel prediction with high accuracy.
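The forward computation of the ML-CVNN described in this section can be sketched as follows. This is a minimal illustration, assuming the amplitude-phase-type activation takes the form $\tanh(|u|)\,e^{j\arg u}$; the function names (`f_ap`, `layer_forward`, `predict`) and the nested-list weight layout are choices made for this sketch only and are not taken from the original implementation.

```python
import cmath
import math


def f_ap(u):
    """Amplitude-phase activation: squash the amplitude with tanh, keep the phase."""
    return math.tanh(abs(u)) * cmath.exp(1j * cmath.phase(u))


def layer_forward(weights, inputs):
    """Outputs of one layer; weights[k][i] is the complex weight from input i to neuron k."""
    return [f_ap(sum(w * z for w, z in zip(row, inputs))) for row in weights]


def predict(w_hidden, w_out, past):
    """Predict c(t) from past = [c(t-1), ..., c(t-I)] with one hidden layer.

    w_hidden has shape K x I (hidden neurons x input terminals) and
    w_out has length K (one output neuron fed by all hidden neurons).
    """
    z1 = layer_forward(w_hidden, past)
    return layer_forward([w_out], z1)[0]
```

Multiplying by a complex weight rotates and scales the input, so a single neuron can already represent a constant rotation per time step, which is the dominant behavior of a separated path component.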

Proposal of Online Self-Optimizing CVNN
There are a number of preceding studies on obtaining optimized structures of neural networks in general (Ishikawa, 1996; Tzyy-Chyang Lu et al., 2013; Ramachandram & Taylor, 2017). The so-called destructive neural networks start learning with a large structure, and then prune redundant connection weights and neurons to obtain an optimum network (Karnin, 1990; Reed, 1993), whereas the constructive neural networks grow from a small network to larger ones (Elman, 1993; Barakat et al., 2011).
In this paper, we propose a new channel prediction method based on a dynamic network that prunes and grows connections depending on the fluctuating communication situations by introducing regularization in the complex domain.
Fig. 2 shows the construction of the CVNN.We want a CVNN that changes its connections according to the prediction situations, and dynamically keeps a suitable network structure in a series of predictions without manual tuning.
To realize such a network, we introduce a sparsity constraint on the weight updates in order to restrict the network connections to a suitably small size.
The $L_0$-norm is an exact sparsity measure, and our problem can be redefined as minimizing the error function of the weights (6) with the $L_0$-norm constraint on the connection weights. However, this problem has been shown to be NP-hard in general. Fortunately, under some conditions, the $L_1$-norm can serve as a sparsity measure substituting for the $L_0$-norm (Donoho & Elad, 2003; Gribonval & Nielsen, 2003). The $L_1$-norm of the weights is a practical sparsity measure since it is convex, so that we can perform optimization more easily (Candès et al., 2006; Donoho & Tanner, 2008; Elad, 2010). By introducing the sparse constraint as a penalty function, the objective function we use to update the weights in layer $l$ is expressed as

$$E_l = \frac{1}{2} |z_l - \hat{z}_l|^2 + \alpha \|W_l\|_1 \qquad (7)$$

where $\|W_l\|_1 = \sum_{k,i} |w_{lki}|$ and $\alpha$ is a coefficient expressing the degree of the penalty. Minimizing the penalty term restricts the number of non-zero weights in the network toward a minimum. This is effectively equivalent to pruning. In other words, the penalty function introduces sparsity to the weight updates so that the remaining connection weights form an effective structure for representing the output signal.
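As a small illustration of the penalized objective, the sketch below evaluates a squared-error term plus an $L_1$ penalty on the weight amplitudes for one layer. The function name and the nested-list weight layout are assumptions made for this example, and the factor of one half on the error term follows the usual steepest-descent convention.

```python
def l1_penalized_error(z, z_hat, weights, alpha):
    """Squared output error of one layer plus alpha times the L1 norm of its weights.

    z and z_hat are lists of complex outputs and teacher signals;
    weights[k][i] is the complex weight from input i to neuron k.
    """
    error = 0.5 * sum(abs(a - b) ** 2 for a, b in zip(z, z_hat))
    penalty = alpha * sum(abs(w) for row in weights for w in row)
    return error + penalty
```

For complex weights the $L_1$ norm depends only on the amplitudes $|w_{lki}|$, so the penalty shrinks amplitudes without constraining the phases.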
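We use the steepest descent method in the complex domain to update the weights here (Hirose, 2012). Thus, the weight amplitude $|w_{lki}|$ and the phase $\theta_{lki}$ are renewed as

$$|w_{lki}|(r+1) = |w_{lki}|(r) - \kappa_1 \Big\{ (1-|z_{lk}|^2)\big(|z_{lk}| - |\hat{z}_{lk}|\cos(\theta_{lk}-\hat{\theta}_{lk})\big)\,|z_{(l-1)i}|\cos\theta^{rot}_{lki} - |z_{lk}||\hat{z}_{lk}|\sin(\theta_{lk}-\hat{\theta}_{lk})\,\frac{|z_{(l-1)i}|}{|u_{lk}|}\sin\theta^{rot}_{lki} + \alpha \Big\} \qquad (8)$$

$$\theta_{lki}(r+1) = \theta_{lki}(r) - \kappa_2 \Big\{ (1-|z_{lk}|^2)\big(|z_{lk}| - |\hat{z}_{lk}|\cos(\theta_{lk}-\hat{\theta}_{lk})\big)\,|z_{(l-1)i}|\sin\theta^{rot}_{lki} + |z_{lk}||\hat{z}_{lk}|\sin(\theta_{lk}-\hat{\theta}_{lk})\,\frac{|z_{(l-1)i}|}{|u_{lk}|}\cos\theta^{rot}_{lki} \Big\} \qquad (9)$$

where $\theta_{lk}$ and $\hat{\theta}_{lk}$ are the phases of $z_{lk}$ and $\hat{z}_{lk}$, $\theta^{rot}_{lki} \equiv \theta_{lk} - \theta_{(l-1)i} - \theta_{lki}$, $r$ is the index of learning iteration, and $\kappa_1$ and $\kappa_2$ are learning constants. Compared with the previous ML-CVNN-based method, this update rule has an additional term $+\alpha$ in the amplitude $|w_{lki}|$ update because of the penalty term. For simplicity and lower computational cost, we keep using the BPTS in this work to obtain the teacher signal $\hat{z}_1$ in the hidden layer from the teacher signal $\hat{z}_2$ in the output layer as (Hirose, 1994)

$$\hat{z}_1 = \big(f_{ap}(\hat{z}_2^{*} W_2)\big)^{*} \qquad (10)$$

where $(\cdot)^{*}$ represents the complex conjugate or Hermitian conjugate.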
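The amplitude and phase updates with the penalty term can be sketched for a single neuron as below. This is a hedged illustration rather than the original implementation: it assumes the activation $\tanh(|u|)\,e^{j\arg u}$, derives the gradient terms from the squared error of one neuron, and adds $\alpha$ to the amplitude gradient. Clipping amplitudes at zero is an assumption of this sketch, as are all function names. The sketch also assumes a non-zero internal state $u$, since the angular term divides by $|u|$.

```python
import cmath
import math


def f_ap(u):
    """Amplitude-phase activation tanh(|u|) * exp(j * arg(u))."""
    return math.tanh(abs(u)) * cmath.exp(1j * cmath.phase(u))


def neuron_output(amps, phases, inputs):
    """Return the internal state u and the output z of one neuron."""
    u = sum(a * cmath.exp(1j * p) * x for a, p, x in zip(amps, phases, inputs))
    return u, f_ap(u)


def update_neuron(amps, phases, inputs, teacher, kappa1, kappa2, alpha):
    """One steepest-descent step on the weight amplitudes and phases of a neuron."""
    u, z = neuron_output(amps, phases, inputs)
    d_theta = cmath.phase(z) - cmath.phase(teacher)
    # Error gradient along the output amplitude, passed through tanh.
    radial = (1 - abs(z) ** 2) * (abs(z) - abs(teacher) * math.cos(d_theta))
    # Error gradient along the output phase, scaled by 1 / |u|.
    angular = abs(z) * abs(teacher) * math.sin(d_theta) / abs(u)
    new_amps, new_phases = [], []
    for a, p, x in zip(amps, phases, inputs):
        rot = cmath.phase(z) - cmath.phase(x) - p
        grad_a = abs(x) * (radial * math.cos(rot) - angular * math.sin(rot)) + alpha
        grad_p = abs(x) * (radial * math.sin(rot) + angular * math.cos(rot))
        new_amps.append(max(a - kappa1 * grad_a, 0.0))  # zero floor: sketch assumption
        new_phases.append(p - kappa2 * grad_p)
    return new_amps, new_phases


def bpts(teacher_out, w_out):
    """Hidden-layer teacher signals obtained from the output-layer teacher signal."""
    return [f_ap(teacher_out.conjugate() * w).conjugate() for w in w_out]
```

When the output already matches the teacher, both error terms vanish and each amplitude simply shrinks by $\kappa_1\alpha$ per iteration. Connections that do not contribute to reducing the error therefore decay toward zero, which is the pruning effect of the penalty.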
To predict the channel, we update the connection weights by time-sliding the input and output teacher signals by using the channel components $\hat{c}_m(t)$ estimated sequentially from the Doppler spectrum, as we did in the previous work (Ding & Hirose, 2014a). That is, a set of weights updated by using the complex-valued estimation $\hat{c}_m(t)$ as the output teacher signal (corresponding to the prediction $\tilde{c}_m(t)$ in Fig. 2) and $\hat{c}_m(t-1), ..., \hat{c}_m(t-I_{ML})$ as the input teacher signals is used to predict the future channel. The combination of the penalty term and the prediction scheme in the time domain is expected to keep the structure at a suitable size for the channel prediction depending on the fluctuating communication environment.
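The time-sliding training-and-prediction procedure can be sketched as a short loop that trains for a few iterations on the newest teacher pair, keeps the resulting state, and then predicts one step ahead. To keep the example self-contained, a single complex coefficient trained by a least-mean-squares step stands in for the CVNN; this stand-in, the function names, and the step size are assumptions of the sketch and not the method of the paper.

```python
import cmath


def sliding_pairs(estimates, n_inputs):
    """Yield (inputs, teacher) with inputs = [c(t-1), ..., c(t-I)] and teacher = c(t)."""
    for t in range(n_inputs, len(estimates)):
        yield [estimates[t - i] for i in range(1, n_inputs + 1)], estimates[t]


def online_predict(estimates, n_inputs, n_iters, state, train_step, predict):
    """Train briefly on each new pair, carry the state over, and predict the next sample."""
    predictions = []
    for inputs, teacher in sliding_pairs(estimates, n_inputs):
        for _ in range(n_iters):  # small, fixed iteration budget per time step
            state = train_step(state, inputs, teacher)
        next_inputs = [teacher] + inputs[:-1]  # window slid forward by one step
        predictions.append(predict(state, next_inputs))
    return predictions, state


# Stand-in model (NOT a CVNN): one complex gain g with prediction g * c(t-1).
def lms_step(g, inputs, teacher, mu=0.5):
    x = inputs[0]
    return g + mu * (teacher - g * x) * x.conjugate()


def lms_predict(g, inputs):
    return g * inputs[0]


# A single rotating path component; the gain converges to the per-step rotation.
sequence = [cmath.exp(0.3j * t) for t in range(200)]
predictions, gain = online_predict(sequence, 2, 3, 0j, lms_step, lms_predict)
```

Carrying the state from one time step to the next is what lets a small iteration budget suffice, since each update starts from weights that already fit the recent channel.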

Numerical Experiments
In the following two sections, we evaluate the performance of the channel prediction based on the ML-CVNN with the penalty in simulations and experiments. We assume orthogonal frequency-division multiplexing (OFDM) with quadrature phase shift keying (QPSK) modulation, and time division duplex (TDD) as the communication scheme in this paper. Table 1 lists the system parameters.
In this section, we characterize the influence of the degree of penalty α on the neural network size and prediction accuracy for simulated fading channels.
The geometrical setup is shown in Fig. 3. We consider communications between a base station (BS) and a mobile user (MU) moving away from the BS at 12 m/s with a certain moving angle. There are two scatterers making two paths in addition to the line-of-sight path. The carrier frequency is 2 GHz here. We predict channel changes in a TDD frame based on its preceding channel states. The past path characteristics are estimated by using the CZT with the Hann window. A window with 8-TDD-frame length is applied to the past channel states for estimating the path parameters, $a_m(t)$, $f_m(t)$, $\phi_m(t)$, based on peaks in the Doppler spectra and the corresponding phase spectra. Then, the past path characteristics $c_m(t)$ are composed by using the parameters, and assigned as the estimated characteristics at the center of the window. We shift the window center at a TDD-frame interval for estimating multipath characteristics at every TDD frame. The details of the time frames are explained in our previous work (Ding & Hirose, 2014a).
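The path-parameter estimation from a windowed Doppler spectrum can be illustrated with the sketch below. It evaluates the Hann-windowed spectrum directly on a fine frequency grid, which yields the same spectral samples that a CZT on the unit circle would compute more efficiently, and picks the strongest peak. The sample spacing, grid, and test signal are hypothetical, and the phase is referenced to the first sample of the window rather than to the window center used in the text.

```python
import cmath
import math


def hann(n_samples):
    """Symmetric Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def estimate_path(samples, dt, f_grid):
    """Return (amplitude, Doppler frequency, phase) of the strongest spectral peak."""
    window = hann(len(samples))
    gain = sum(window)  # normalizes the peak height back to the path amplitude
    best_f, best_x = None, 0j
    for f in f_grid:
        x = sum(w * s * cmath.exp(-2j * math.pi * f * n * dt)
                for n, (w, s) in enumerate(zip(window, samples)))
        if best_f is None or abs(x) > abs(best_x):
            best_f, best_x = f, x
    return abs(best_x) / gain, best_f, cmath.phase(best_x)


# Hypothetical single path: 40 Hz Doppler, unit amplitude, 0.5 rad phase shift.
DT = 1e-3
signal = [cmath.exp(1j * (2 * math.pi * 40.0 * n * DT + 0.5)) for n in range(64)]
grid = [-100.0 + 0.5 * k for k in range(401)]  # 0.5 Hz steps, finer than 1 / (64 * DT)
amplitude, frequency, phase = estimate_path(signal, DT, grid)
```

With several paths, the same spectrum would show one peak per path; repeating the search after removing or masking each detected peak is one possible way to separate them.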
To evaluate the performance under various channel changes, we change the scatterer distance ∆x shown in Fig. 3 from ∆x = 0.5 to 20 m in 0.5 m steps, and perform 100 independent predictions at each scatterer distance along with the movement of the MU. We start with the neural network with the parameters listed in Table 2. The penalty prunes and grows the network connections, 30 × 30 in the hidden layer and 30 × 1 in the output layer, online as the communication situation changes. Connection weights whose amplitudes fail the non-zero criterion are counted as zero weights in order to fairly compare the penalty effect on the entire network. The network sizes of the ML-CVNN with various penalty coefficients ($\alpha = 0, 10^{-5}, 10^{-4}, 5\times 10^{-4}, 10^{-3}, 2\times 10^{-2}$) have been evaluated, and the mean connection numbers for 100 trials in each condition have been normalized by the maximum possible number of connections to show the non-zero connection ratio. The corresponding prediction accuracy is calculated by accumulating predicted phase errors within the prediction frame. We find in Fig. 4(a) that the number of non-zero weights constituting the effective network decreases as the penalty coefficient $\alpha$ increases, as we expect, whereas a network without the penalty ($\alpha = 0$) keeps almost all of the connections active for all communication conditions. In Fig. 4(b), the smaller networks achieved by the penalty show better prediction stability compared to the conventional CVNN-based method ($\alpha = 0$). The results also show that the proposed prediction method reaches its best performance with a penalty coefficient around $\alpha = 5\times 10^{-4}$ to $10^{-3}$, and that $\alpha$ larger than this value introduces instability to the channel prediction again. These results show that the proposed prediction method with an appropriate $\alpha$ can prune redundant connections in its network automatically to achieve higher prediction accuracy even in prediction conditions difficult for the conventional method.
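The non-zero connection ratio used as the network-size measure can be computed as in the sketch below. The amplitude threshold that separates zero from non-zero weights is passed in as a parameter, and the value used in the usage lines is hypothetical, since the criterion itself is not reproduced in this text.

```python
def nonzero_ratio(weight_layers, threshold):
    """Fraction of connections whose weight amplitude exceeds the threshold.

    weight_layers is a list of layers, each a list of rows of complex weights.
    """
    amplitudes = [abs(w) for layer in weight_layers for row in layer for w in row]
    active = sum(1 for a in amplitudes if a > threshold)
    return active / len(amplitudes)


# Toy network: a 2 x 2 hidden layer and a 1 x 2 output layer.
toy_layers = [[[0.5 + 0j, 0j], [1e-9, 0.2j]], [[0.0, 0.3]]]
ratio = nonzero_ratio(toy_layers, threshold=1e-6)  # hypothetical threshold
```

Averaging this ratio over repeated trials at each scatterer distance gives the kind of network-size curve plotted against ∆x.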

Experiments in Actual Communication Environment
In this section, we further demonstrate the adaptability of the proposed method in prediction with actually observed fading channels. We experimentally observed fading channels in the communication situation shown in Fig. 5. An MU as a transmitter and a BS as a receiver are located in the experimental site with some objects, such as buildings and trees, constituting a typical mobile communication environment in an urban area. The MU moves in the direction of the arrows shown in Figs. 5(a) and (b) with a velocity around 12.5 m/s and transmits a 1.297 GHz unmodulated wave from a monopole antenna, whereas the BS receives the wave by using another monopole antenna. The received channel signal was mixed with a 1.287 GHz local oscillator wave after an amplifier, and then extracted as a signal at an intermediate frequency of 10 MHz. After pass-  changes cause difficulty in the channel prediction and degrade the performance.
In Fig. 6(c), we can observe that the proposed method with the penalty function ($\alpha = 5\times 10^{-4}$) increases its effective structure size at and/or after the large channel changes, while the overall size remains relatively compact throughout the process. On the other hand, the conventional method without the penalty ($\alpha = 0$) does not change its network size significantly in any part of the update procedure, and no correlation with the channel changes is observed.
For further discussion, we focus on a prediction period containing three fast with the regularization provides higher channel prediction performance due to its dynamically changing structure.

Conclusion
In this paper, we proposed an online adaptive channel prediction method based on ML-CVNNs with self-optimizing dynamic structures. The penalty function based on the $L_1$-norm of the CVNN weights realizes the adaptive CVNN structure without a large increase in the calculation cost of the weight updates, and achieves highly accurate and robust channel prediction of rapidly changing fading. Simulations and experiments demonstrated that the proposed CVNN automatically changes its effective number of connections depending on the channel variation so that it keeps an appropriate network size to achieve accurate channel prediction. The results presented in the experiments for actually observed channels showed that the proposed method can provide accurate prediction even in situations difficult for conventional methods, including the time-domain linear, the AR-model-based, and the former CVNN-based predictions.

Figure 1: An example of time-varying fading channel states in the complex domain measured in an actual mobile communication.
In the Doppler-frequency expression, $f_c$ is the carrier frequency of the communication, and $\psi_m$ is the incident radio-wave angle of path $m$ with respect to the motion of the mobile user. The observed channel $c(t)$ in an actual communication can be decomposed into multiple path components $c_m(t)$ in the frequency domain based on this model. Different path components with different incident angles appear as separate peaks in a Doppler frequency spectrum. Hence, the parameters of each path component can be estimated by finding peak amplitudes and Doppler frequencies for $a_m$ and $f_m$ in the Doppler spectrum and the corresponding phase shifts for $\phi_m$ in the phase spectrum. The chirp z-transform (CZT) with a Hann window provides a low calculation cost and a smooth frequency-domain interpolation useful for accurate estimation of the parameters in the region close to zero frequency.
In the error function, $z_l$ and $\hat{z}_l$ denote the temporary output signals and the teacher signals, respectively, in layer $l$. The teacher signals in the hidden layer $\hat{z}_1$ are the signals obtained through the backpropagation of the teacher signal (BPTS) of the output layer $\hat{z}_2$.

Figure 2: Construction of the complex-valued neural network, in which the solid arrows show non-zero-amplitude connections while dashed arrows represent zero-amplitude ones.

Figure 3: Geometrical setup used in the simulation. There are two scatterers separated by ∆x m, a base station (BS), and a mobile user (MU) in an open communication space. The line of sight between the BS and the MU is considered. The MU moves in the direction of the arrow (−30° from the x axis) with a velocity of 12 m/s.

Fig. 4(a) shows the mean of the network size at each scatterer distance condition. A connection weight is counted as non-zero if its amplitude satisfies

Figure 4: Simulation results showing (a) averaged non-zero weight ratios (network size) and (b) maximum predicted phase errors against scatterer distance ∆x in Fig. 3 for various penalty coefficients α.
Fig. 4(b) presents the maximum estimated phase errors in each communication condition, showing stability of the prediction.

Figure 5: Geometrical setup of the experiment illustrated as (a) two-dimensional top view (Google Maps, modified) and (b) three-dimensional side view (Google Earth, modified) which includes a fixed base station (BS) and a moving mobile user (MU).
Fig. 8 shows the BER curves against bit-energy to noise-power-density ratio

Table 1: Communication Parameters

Table 2: Channel Prediction Parameters