Noise Learning Based Denoising Autoencoder

This letter introduces a new denoiser, the noise learning based DAE (nlDAE), that modifies the structure of the denoising autoencoder (DAE). The proposed nlDAE learns the noise of the input data. Then, the denoising is performed by subtracting the regenerated noise from the noisy input. Hence, nlDAE is more effective than DAE when the noise is simpler to regenerate than the original data. To validate the performance of nlDAE, we provide three case studies: signal restoration, symbol demodulation, and precise localization. Numerical results suggest that nlDAE requires a smaller latent space dimension and a smaller training dataset than DAE.


I. INTRODUCTION
Machine learning (ML) has recently received much attention as a key enabler for future wireless communications [1], [2], [3]. While the major research effort has been devoted to deep neural networks, there is an enormous number of Internet of Things (IoT) devices that are severely constrained in computational power and memory size. Therefore, the implementation of efficient ML algorithms is an important challenge for IoT devices, as they are energy and memory limited. The denoising autoencoder (DAE) is a promising technique to improve the performance of IoT applications by denoising the observed data, which consists of the original data and the noise [4]. DAE is a neural network model for constructing learned representations that are robust to the addition of noise to the input samples [?], [5]. The representative feature of DAE is that the dimension of the latent space is smaller than the size of the input vector. This means that the neural network model is capable of encoding and decoding through a smaller dimension in which the data can be represented.
The main contribution of this letter is to improve the efficiency and performance of DAE with a modification of its structure. Consider a noisy observation Y which consists of the original data X and the noise N, i.e., Y = X + N. From the information theoretical perspective, DAE attempts to minimize the expected reconstruction error by maximizing a lower bound on the mutual information I(X; Y). In other words, Y should capture the information of X as much as possible although Y is a function of the noisy input. Additionally, from the manifold learning perspective, DAE can be seen as a way to find a manifold where Y represents the data in a low-dimensional latent space corresponding to X. However, we often face the problem that the stochastic feature of X to be restored is too complex to regenerate or represent. This is known as the curse of dimensionality, i.e., the dimension of the latent space for X is still too high in many cases.
What can we do if N is simpler to regenerate than X? It will be more effective to learn N and subtract it from Y instead of learning X directly. In this light, we propose a new denoising framework, named noise learning based DAE (nlDAE). The main advantage of nlDAE is that it can maximize the efficiency of the ML approach (e.g., the required dimension of the latent space or the size of the training dataset) for capability-constrained devices, e.g., IoT devices, where N is typically easier to regenerate than X owing to their stochastic characteristics. To verify the advantage of nlDAE over the conventional DAE, we provide three practical applications as case studies: signal restoration, symbol demodulation, and precise localization.
The following notations will be used throughout this letter.
• Ber, Exp, U, N, CN: the Bernoulli, exponential, uniform, normal, and complex normal distributions, respectively.
• x, n, y ∈ R^P: the realization vectors of random variables X, N, Y, respectively, whose dimensions are P.
• P′ (< P): the dimension of the latent space.
• W ∈ R^{P′×P}, W′ ∈ R^{P×P′}: the weight matrices for encoding and decoding, respectively.
• b ∈ R^{P′}, b′ ∈ R^P: the bias vectors for encoding and decoding, respectively.
• S: the sigmoid function, acting as an activation function for neural networks, i.e., S(a) = 1/(1 + e^{−a}).

II. METHOD OF NLDAE
In the traditional estimation problem of signal processing, N is treated as an obstacle to the reconstruction of X. Therefore, most studies have focused on restoring X as much as possible, which can be expressed as a function of X and N. Following this philosophy, ML-based denoising techniques, e.g., DAE, have also been developed in various signal processing fields with the aim of maximizing the ability to restore X from Y. Unlike the conventional approaches, we hypothesize that, if N has a simpler statistical characteristic than X, it will be better to restore N and then subtract it from Y.

We first look into the mechanism of DAE to build neural networks. Recall that DAE attempts to regenerate the original data x from the noisy observation y via training the neural network. Thus, the parameters of a DAE model can be optimized by minimizing the average reconstruction error over the M training pairs in the training phase, as expressed in (1), where L(·, ·) is a loss function such as the squared error between its two inputs. Then, the j-th regenerated data x̃^(j) is obtained from y^(j) in the test phase, as expressed in (2), for all j ∈ {1, ..., L}. It is noteworthy that, if there are two different neural networks which attempt to regenerate the original data and the noise from the noisy input, the linear summation of these two regenerated data would be different from the input. This means that either x or n is more effectively regenerated from y. Therefore, we can hypothesize that learning N, instead of X, from Y can be beneficial in some cases even if the objective is still to reconstruct X. This constitutes the fundamental idea of nlDAE.

Fig. 2: A simple example of comparison between DAE and nlDAE: reconstruction error (MSE) according to σ_N, with the region where nlDAE is better than DAE marked.
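As a sketch of what (1) and (2) can look like under the notation of this letter, assume a single hidden layer with a sigmoid encoder and an affine decoder. The absence of an output activation in the decoder is an assumption made here for illustration only.

```latex
\begin{align}
\{W, b, W', b'\} &= \operatorname*{arg\,min}_{W,\, b,\, W',\, b'} \frac{1}{M} \sum_{i=1}^{M}
  L\!\left(x^{(i)},\; W' S\!\left(W y^{(i)} + b\right) + b'\right), \\
\tilde{x}^{(j)} &= W' S\!\left(W y^{(j)} + b\right) + b', \qquad j \in \{1, \dots, L\}.
\end{align}
```

With W ∈ R^{P′×P} and W′ ∈ R^{P×P′}, the hidden vector S(Wy + b) has dimension P′ and the output has dimension P, so the dimensions agree with those of x and y.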
The training and test phases of nlDAE are depicted in Fig. 1.
The parameters of the nlDAE model are optimized over the training samples i ∈ {1, ..., M} in the same manner as in (1); the only difference is that x^(i) is replaced by n^(i). Let x̃_nl^(j) denote the j-th regenerated data based on nlDAE, which is obtained for all j ∈ {1, ..., L} by regenerating the noise from y^(j) and subtracting it from y^(j). To provide the readers with insights into nlDAE, we examine two simple examples where the standard deviation of X is fixed as 1, i.e., σ_X = 1, and that of N varies. Y = X + N is composed as follows:
• Example 1: X ∼ …3) and N ∼ N(0, σ_N).
• Example 2: X ∼ Exp(1) and N ∼ N(0, σ_N).
Fig. 2 describes the performance comparison between DAE and nlDAE in terms of mean squared error (MSE) for the two examples. Here, we set P = 12, P′ = 9, M = 10000, and L = 5000. It is observed in Fig. 2 that nlDAE is superior to DAE when σ_N is smaller than σ_X. The gap between nlDAE and DAE widens with lower σ_N. This implies that the standard deviation is an important factor when selecting between DAE and nlDAE.
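The test-phase difference between the two denoisers can be sketched in a few lines of plain Python. The sketch assumes a single hidden layer with a sigmoid encoder and an affine decoder; the function names (`autoencode`, `dae_denoise`, `nldae_denoise`) and the decoder form are illustrative assumptions, and training is not shown.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def affine(weights, bias, vec):
    """Return weights @ vec + bias, with weights given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def autoencode(y, W, b, W_dec, b_dec):
    """Single-hidden-layer pass: sigmoid encoder, affine decoder (assumed form)."""
    hidden = [sigmoid(a) for a in affine(W, b, y)]
    return affine(W_dec, b_dec, hidden)

def dae_denoise(y, params):
    # DAE: the network output is taken directly as the estimate of x.
    return autoencode(y, *params)

def nldae_denoise(y, params_nl):
    # nlDAE: the network output is the estimate of n, which is subtracted from y.
    n_hat = autoencode(y, *params_nl)
    return [yi - ni for yi, ni in zip(y, n_hat)]
```

Both functions share the same network structure. Only the training target (x for DAE, n for nlDAE) and the final subtraction differ.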
These examples show that it matters whether X or N is easier to regenerate, which is closely related to the differential entropy of each random variable, H(X) and H(N) [?]. The differential entropy is normally an increasing function of the standard deviation of the corresponding random variable, e.g., H(N) = log(σ_N √(2πe)) for the normal distribution. Naturally, it is efficient to reconstruct a random variable carrying a small amount of information, and the standard deviation can be a good indicator.
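The entropy comparison can be made concrete with the standard closed-form differential entropies (in nats). The sketch below computes the σ_N at which H(N) equals H(X) for X ∼ Exp(1) and, as an illustrative assumption, for a unit-standard-deviation uniform distribution. Treating equal entropy as the crossover point is a heuristic reading of the paragraph above, not a result stated in this letter.

```python
import math

def h_normal(sigma):
    """Differential entropy (nats) of a normal distribution with std sigma."""
    return math.log(sigma * math.sqrt(2.0 * math.pi * math.e))

def h_exponential(rate):
    """Differential entropy (nats) of Exp(rate)."""
    return 1.0 - math.log(rate)

def h_uniform(width):
    """Differential entropy (nats) of a uniform distribution on an interval of this width."""
    return math.log(width)

def sigma_for_entropy(h):
    """Std of the normal distribution whose differential entropy equals h."""
    return math.exp(h) / math.sqrt(2.0 * math.pi * math.e)

h_x_exp = h_exponential(1.0)                # X ~ Exp(1), which has unit std
h_x_uni = h_uniform(2.0 * math.sqrt(3.0))   # unit-std uniform (illustrative assumption)
threshold_exp = sigma_for_entropy(h_x_exp)  # H(N) < H(X) for sigma_N below this value
threshold_uni = sigma_for_entropy(h_x_uni)
```

Under this heuristic, both thresholds fall below σ_X = 1, since the normal distribution has the largest differential entropy among distributions with a given variance.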

III. CASE STUDIES
To validate the advantage of nlDAE over the conventional DAE in practical problems, we provide three applications for IoT devices in the following subsections. In the first two cases, we assume that the noise follows the Bernoulli and normal distributions, respectively, which are the most common noise models. The third case deals with noise whose distribution is a mixture of various random variables. For all the studied use cases, we select DAE as the conventional denoiser that serves as a baseline for performance comparison. We present the case studies in the first three subsections. Then, we discuss the experimental results in Sec. III-D.

A. Case Study I: Signal Restoration
In this use case, the objective is to recover the original signal from a noisy signal in which the noise is modeled as corruption of individual samples.
1) Model: The sampled signal of randomly superposed sinusoids, e.g., a recorded acoustic wave, is the summation of samples of k damped sinusoidal waves, which can be represented as follows:
x[n] = Σ_{l=1}^{k} V_l e^{−γ_l nΔt} cos(2πf_l nΔt),
where V_l, γ_l, and f_l are the peak amplitude, the damping factor, and the frequency of the l-th signal, respectively.
Here, the time interval for sampling, Δt, is set to satisfy the Nyquist theorem with respect to the frequencies f_l. To consider the corruption of x, let us assume that the corruption of each sample follows the Bernoulli distribution Ber(p_cor), i.e., each sample is corrupted with probability p_cor. In addition, let b ∈ {0, 1}^P denote the realization of Ber(p_cor) over P samples. Naturally, the corrupted signal, y ∈ R^P, can be represented as follows:
y = x + C b,
where C is a constant representing the sample corruption.
2) Application of nlDAE: Based on (6), the denoised signal x̃_nl^(j) is obtained by subtracting the regenerated noise from y^(j), where the nlDAE parameters are trained with the corruption term as the target.
3) Experimental Parameters: We evaluate the performance of the proposed nlDAE in terms of the MSE of restoration. For the experiment, the magnitude of noise C is set to 1 for simplicity. In addition, V_l, γ_l, and f_l follow N(0, 1), U(0, 10^3), and U(0, 10 kHz), respectively, for all l. The sampling time interval Δt is set to 0.5 × 10^−4 s, and the number of samples P is 12. We set P′ = 9, p_cor = 0.9, and M = 10000 unless otherwise specified.
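A minimal sketch of how one training pair for this case study could be generated, reading the model as a sum of k damped sinusoids plus a Bernoulli-masked constant C. The number of sinusoids (k = 3), the sample index starting at n = 0, and the helper names are hypothetical choices made for illustration.

```python
import math
import random

def damped_sinusoids(P, k, dt, rng):
    """Sample x[n] = sum_l V_l * exp(-gamma_l*n*dt) * cos(2*pi*f_l*n*dt), n = 0..P-1."""
    waves = [(rng.gauss(0.0, 1.0),     # V_l ~ N(0, 1)
              rng.uniform(0.0, 1e3),   # gamma_l ~ U(0, 10^3)
              rng.uniform(0.0, 1e4))   # f_l ~ U(0, 10 kHz)
             for _ in range(k)]
    return [sum(V * math.exp(-g * n * dt) * math.cos(2.0 * math.pi * f * n * dt)
                for V, g, f in waves)
            for n in range(P)]

def corrupt(x, p_cor, C, rng):
    """Return (y, b) with b[n] ~ Ber(p_cor) and y = x + C*b."""
    b = [1 if rng.random() < p_cor else 0 for _ in x]
    y = [xi + C * bi for xi, bi in zip(x, b)]
    return y, b

rng = random.Random(0)
x = damped_sinusoids(P=12, k=3, dt=0.5e-4, rng=rng)  # k = 3 is a hypothetical choice
y, b = corrupt(x, p_cor=0.9, C=1.0, rng=rng)
noise = [yi - xi for yi, xi in zip(y, x)]            # nlDAE training target (equals C*b)
```

In this setting the nlDAE target takes only the two values 0 and C per sample, whereas the DAE target x is a continuous superposition of random waveforms.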

B. Case Study II: Symbol Demodulation
Here, the objective is to improve the symbol demodulation quality through denoising the received signal that consists of channel, symbols, and additive noise.
1) Model: Consider an orthogonal frequency-division multiplexing (OFDM) system with P subcarriers where the subcarrier spacing is denoted by Δf. Let d ∈ C^P be a sequence in the frequency domain; d[n] is the n-th element of d and denotes the symbol transmitted over the n-th subcarrier. In addition, let K denote the pilot spacing for channel estimation. Furthermore, the channel impulse response (CIR) can be modeled by the sum of Dirac-delta functions as follows:
h(t, τ) = Σ_l α_l δ(τ − τ_l),
where the sum runs over the L_p paths, and α_l, τ_l, and L_p are the complex channel gain, the excess delay of the l-th path, and the number of multipaths, respectively. Let x ∈ C^P denote the discrete signal obtained by the P-point fast Fourier transform (FFT) after sampling the signal that experiences the channel at the receiver, which can be represented as follows:
x = d ⊙ h,
where ⊙ denotes the operator of the Hadamard product. Here, h ∈ C^P is the channel frequency response (CFR), which is the P-point FFT of h(t, τ). In addition, let n ∈ C^P denote the realization of the random variable N ∼ CN(0, σ_N). Finally, y (= d ⊙ h + n) is the noisy observed signal.
Our goal is to minimize the symbol error rate (SER) over d by maximizing the quality of denoising y. We assume the method of channel estimation is fixed as cubic interpolation [6] to focus on the performance of denoising the received signal.
2) Application of nlDAE: To handle the complex-valued data, we separate it into real and imaginary parts. ℜ(·) and ℑ(·) denote the operators capturing the real and imaginary parts of an input, respectively. Thus, x̃_nl^(j) is the regenerated d^(j) ⊙ h^(j) obtained by denoising y^(j), i.e., by subtracting the regenerated real and imaginary noise components from those of y^(j). Finally, the receiver estimates h with the predetermined pilot symbols, i.e., d[nK + 1] where n = 0, 1, ..., and demodulates d based on the estimate of h and the regenerated x̃_nl.
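The signal model and the real/imaginary separation can be sketched as follows. Stacking the real parts followed by the imaginary parts into one real vector is an assumption about how the separation feeds the network, and the symbol, channel, and noise values are hypothetical.

```python
def hadamard(a, b):
    """Element-wise (Hadamard) product of two equal-length sequences."""
    return [ai * bi for ai, bi in zip(a, b)]

def split_real_imag(v):
    """Stack real parts followed by imaginary parts: C^P -> R^(2P)."""
    return [z.real for z in v] + [z.imag for z in v]

def merge_real_imag(r):
    """Inverse of split_real_imag: R^(2P) -> C^P."""
    half = len(r) // 2
    return [complex(re, im) for re, im in zip(r[:half], r[half:])]

d = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]       # hypothetical QPSK symbols
h = [0.5 + 0.5j, 1 + 0j, 0 - 1j, 0.25 + 0j]  # hypothetical CFR values
n = [0.1 - 0.1j, 0j, 0.05 + 0j, -0.2j]       # hypothetical noise realization
x = hadamard(d, h)                           # x = d ⊙ h
y = [xi + ni for xi, ni in zip(x, n)]        # y = d ⊙ h + n
y_real = split_real_imag(y)                  # real-valued input for the network
```

After denoising in the real domain, `merge_real_imag` restores the complex vector used for channel estimation and demodulation.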

C. Case Study III: Precise Localization
The objective of this case study is to improve the localization quality through denoising the measured distance which is represented by the quantized value of the mixture of the true distance and error factors.
1) Model: Consider a 2-D localization where P reference nodes and a single target node are randomly distributed. We estimate the position of the target node with the knowledge of the locations of the P reference nodes. Let x ∈ R^P denote the vector of true distances from the P reference nodes to the target node, where X denotes the distance between two random points in a 2-D space. We consider three types of random variables for the noise added to the true distance as follows:
• N_N: ranging error dependent on signal quality.
• N_U: ranging error due to clock asynchronization.
• N_B: non-line-of-sight (NLoS) event.
We assume that N_N, N_U, and N_B follow the normal, uniform, and Bernoulli distributions, respectively. Hence, we can define the random variable for the noise N as follows:
N = N_N + N_U + R_NLoS N_B,
where R_NLoS is the distance bias in the event of NLoS. Note that N does not follow any well-known probability distribution because its distribution is the convolution of three different distributions.
Besides, we assume that the distance is measured by time of arrival (ToA). Thus, we define the quantization function Q_B to represent the measured distance with the resolution of B, e.g., Q_10(23) = 20. In addition, a localization method based on multi-dimensional scaling (MDS) is utilized to estimate the position of the target node [7].
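A minimal sketch of the quantization function follows. The single example given for it (23 mapped to 20 at a resolution of 10) is consistent with both rounding down and rounding to the nearest multiple of B; rounding down is assumed here, and the function name is illustrative.

```python
import math

def quantize(distance, resolution):
    """Q_B: round a distance down to a multiple of the resolution B (floor assumed)."""
    return resolution * math.floor(distance / resolution)

print(quantize(23, 10))  # 20
```

Under the floor assumption, every distance in [20, 30) maps to 20, so the quantization error is at most B.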
2) Application of nlDAE: In this case study, we consider the discrete values quantized by the function Q_B. Here, x̃_nl^(j) is obtained by subtracting the noise regenerated by nlDAE from the quantized measurement y^(j). Thus, x̃_nl is utilized for the estimation of the target node position in nlDAE-assisted MDS-based localization.
3) Experimental Parameters: The performance of the proposed nlDAE is evaluated on L = 5000 test samples. In this simulation, 12 reference nodes and one target node are uniformly distributed in a 100 × 100 square. We assume that N_N ∼ N(0, 10), N_U ∼ U(0, 20), N_B ∼ Ber(0.2), and R_NLoS = 50. The distance resolution B is set to 10 for the quantization function Q_B. Note that P′ = 9, p_NLoS = 0.2, and M = 10000 unless otherwise specified. We also provide the result of non-ML (i.e., only MDS-based localization).
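The ranging noise with these parameters can be sampled as sketched below, assuming the three components add and the NLoS bias R_NLoS applies when N_B = 1. Reading the second argument of N(0, 10) as a standard deviation follows the N(0, σ_N) convention used earlier and is an assumption; the function name and seed are illustrative.

```python
import random

def sample_noise(rng, sigma_n=10.0, u_max=20.0, p_nlos=0.2, r_nlos=50.0):
    """Draw one realization of N = N_N + N_U + R_NLoS * N_B and its components."""
    n_n = rng.gauss(0.0, sigma_n)            # N_N ~ N(0, 10), 10 read as std (assumption)
    n_u = rng.uniform(0.0, u_max)            # N_U ~ U(0, 20)
    n_b = 1 if rng.random() < p_nlos else 0  # N_B ~ Ber(0.2)
    return n_n + n_u + r_nlos * n_b, (n_n, n_u, n_b)

rng = random.Random(1)
samples = [sample_noise(rng) for _ in range(5000)]
nlos_rate = sum(parts[2] for _, parts in samples) / len(samples)
```

The NLoS term adds a second mode to the noise distribution, shifted by R_NLoS from the line-of-sight case.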

D. Analysis of Experimental Results
Fig. 3(a), Fig. 4(a), and Fig. 5(a) show the performance of the three case studies with respect to P′, the dimension of the latent space. nlDAE outperforms non-ML and DAE over the whole range of P′. Particularly with small values of P′, nlDAE continues to perform well, whereas DAE loses its merit. This means that nlDAE provides good denoising performance even with an extremely small latent space dimension if the training dataset is sufficient.
The impact of the size of the training dataset is depicted in Fig. 3(b), Fig. 4(b), and Fig. 5(b). nlDAE starts to outperform non-ML with M less than 100. In contrast, DAE requires M about an order of magnitude larger to perform better than non-ML. Furthermore, nlDAE converges faster than DAE, thus requiring less training data than DAE.
In Fig. 3(c), Fig. 4(c), and Fig. 5(c), the impact of a noise-related parameter for each case study is illustrated. When the noise occurs according to a Bernoulli distribution in Fig. 3(c), the performance of the ML algorithms (both nlDAE and DAE) exhibits a concave behavior. This is because the variance of Ber(p) is given by p(1 − p). A similar phenomenon is observed in Fig. 5(c) because the Bernoulli event of NLoS constitutes a part of the localization noise. As for non-ML, the performance worsens as the probability of noise occurrence increases in both cases. Fig. 4(c) shows that the SER performance of nlDAE improves rapidly as the SNR increases. In all experiments, nlDAE outperforms the other schemes.
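As a quick numerical check of the variance argument, the sketch below evaluates p(1 − p) on a coarse grid. The grid spacing and variable names are illustrative.

```python
def bernoulli_variance(p):
    """Variance of Ber(p)."""
    return p * (1.0 - p)

grid = [i / 10 for i in range(11)]              # p = 0.0, 0.1, ..., 1.0
variances = [bernoulli_variance(p) for p in grid]
p_peak = grid[variances.index(max(variances))]  # p with the largest variance
```

The variance is symmetric about p = 0.5 and vanishes at p = 0 and p = 1, so the noise is hardest to regenerate near p = 0.5 and easiest near the two extremes.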
Thus far, the experiments have been conducted with a single hidden layer. Fig. 3(d), Fig. 4(d), and Fig. 5(d) show the effect of the depth of the neural network. The performance of nlDAE is almost invariant, which suggests that nlDAE is not sensitive to the number of hidden layers. On the other hand, in two of the cases, the performance of DAE worsens quickly as the depth increases owing to overfitting.
In summary, nlDAE outperforms DAE across all experiments. nlDAE is observed to be more efficient for the underlying use cases than DAE because it requires a smaller latent space and less training data. Furthermore, nlDAE is more robust to changes in the parameters related to the design of the neural network, e.g., the network depth.

IV. CONCLUSION AND FUTURE WORK
We introduced a new denoiser framework based on the neural network, namely nlDAE. This is a modification of DAE in that it learns the noise instead of the original data. The fundamental idea of nlDAE is that learning the noise can provide better performance depending on the stochastic characteristics (e.g., standard deviation) of the original data and the noise. We applied the proposed mechanism to practical problems for IoT devices, namely signal restoration, symbol demodulation, and precise localization. The numerical results support that nlDAE is more efficient than DAE in terms of the required dimension of the latent space and the size of the training dataset, thus rendering it more suitable for capability-constrained conditions. The applicability of nlDAE to other domains, e.g., image inpainting, remains as future work. Furthermore, information theoretical criteria for selecting between, or combining, DAE and nlDAE are an interesting topic for further research.

Fig. 3: Case study I (signal restoration): MSE according to (a) the dimension of latent space; (b) the size of training dataset; (c) p_cor; and (d) the depth of neural networks.

Fig. 4: Case study II (symbol demodulation): SER according to (a) the dimension of latent space; (b) the size of training dataset; (c) SNR; and (d) the depth of neural networks.

Fig. 5: Case study III (precise localization): Localization error according to (a) the dimension of latent space; (b) the size of training dataset; (c) p_NLoS; and (d) the depth of neural networks.