The Wiener Filter-based Adaptive Denoising for Pseudo Analogy Video Transmission

With the popularity of video conferencing, video calls and similar activities, video transmission has become widely used. To serve a huge number of subscribers, a mobile video transmission scheme needs to overcome limitations such as scarce resources and noise interference. The knowledge-enhanced mobile video broadcasting (KMV-Cast) scheme utilizes joint source-channel coding and correlated information in clouds. However, there is a noise term that cannot be removed in the original KMV-Cast scheme. In this paper, an adaptive Wiener filtering denoising algorithm is proposed to reduce this noise at the receiver in order to maximize the signal-to-noise ratio (SNR) of the reconstructed video frame. The simulation results show that the proposed Wiener filter algorithm is superior to other schemes without the Wiener filter under different sources and channel qualities. At lower-SNR channels (e.g., -5 dB), the proposed algorithm achieves a 2 dB gain in peak signal-to-noise ratio (PSNR), while at higher-SNR channels (e.g., 10 dB), it achieves a 3 dB gain in PSNR.


I. INTRODUCTION
According to the Cisco Annual Internet Report 2020, nearly two-thirds of the global population will have Internet access by 2023. There will be 5.3 billion total Internet users (66 percent of the global population) by 2023, up from 3.9 billion (51 percent of the global population) in 2018 [1]. As the total number of Internet users increases, video transmission services and their data traffic increase accordingly, which puts pressure on current wireless video transmission technology. In a wireless communication system with limited wireless resources, it is necessary to reduce the transmission of redundant data. Meanwhile, channel noise degrades the quality of the reconstructed video, and thus it is also necessary to remove noise from the reconstructed video at the receiver.
Traditional wireless video transmission schemes are based on Shannon's source-channel separation theorem. In 2003, however, M. Gastpar, B. Rimoldi and M. Vetterli showed that joint source-channel coding can be optimal under some conditions [2]. When the channel quality falls below a certain threshold in mobile communications, the quality of the received video declines sharply, which is called the cliff effect. To overcome the cliff effect, a typical scheme called SoftCast was proposed based on joint source-channel coding [3]. During video coding and decoding, only linear transformations are used, and thus the quality of the reconstructed video is linearly correlated with the channel quality. Then, based on the SoftCast scheme, some improved pseudo-analog transmission schemes were proposed in [4]-[7]. Besides, many companies are using cloud services and moving their applications and services from local machines to the cloud [8]. As a result, video download from cloud services has become a hot topic. The knowledge-enhanced mobile video broadcasting (KMV-Cast) scheme is a brand-new video transmission framework [9]. Compared with the other pseudo-analog video transmission schemes in [3]-[7], it leverages correlated information in the cloud for video coding and decoding. During the reconstruction at the receiver, KMV-Cast eliminates the noise caused by mutual interference. In this paper, we focus on further reducing the residual noise in the KMV-Cast scheme.
Generally, noise is classified into additive noise and multiplicative noise [10]. The distribution of the noise is very important and is widely exploited in conventional denoising algorithms. Recently, the convolutional neural network (CNN) has been widely used for denoising in image processing [11]-[13]. However, such algorithms do not exploit the relevant information in clouds.
Based on the statistical characteristics of Gaussian noise, the least mean square (LMS) error has been applied as the optimization goal [14]-[16]. In a video frame, the values of adjacent pixels are usually close to each other. As a result, the Wiener filter has been widely used for image denoising [17]-[21]. The goal of Wiener filtering is to achieve the minimum mean square error (MMSE). It should be noted that prior knowledge of the power spectral density of the noise must be given in advance for the Wiener filter. Hence, Wiener filtering may be a promising method to remove the second noise term in the KMV-Cast scheme, and this paper proposes an adaptive denoising algorithm based on the Wiener filter.
The rest of the paper is organized as follows. We briefly review the KMV-Cast scheme in Section II. We propose a denoising algorithm in Section III, in which each pixel block is treated as a unit that is selectively transmitted and adaptively passed through a Wiener filter. In Section IV, we present the detailed frameworks of the proposed algorithm, and the simulation results are shown in Section V. Finally, Section VI concludes this paper.

II. KMV-CAST SCHEME: A BRIEF REVIEW
The KMV-Cast transmission scheme was proposed in [9] and is shown in Fig. 1. As one can see from Fig. 1, there are two cloud servers, one at the transmitter and one at the receiver, and the correlated information of the transmitted video is available at both sides. At the transmitter, the correlated information in the cloud is used to remove the redundant information in the broadcast video. At the receiver, with the help of the correlated information in the local cloud, the video is reconstructed by maximizing the SNR. Technically, Fig. 1 mainly contains 1) the hierarchical Bayesian model, 2) related information and prior knowledge extraction, and 3) reconstruction from the received signals.

A. HIERARCHICAL BAYESIAN MODEL
In order to make full use of the relevant information, KMV-Cast uses the hierarchical Bayesian model to describe the relationship between the transmitted video and its correlated information in the cloud.
At the transmitter, video frames are evenly divided into small pixel blocks (i.e., 8 × 8). For each block, the 2D-DCT is used to reduce the redundant information. The high-frequency DCT coefficients, which are close to zero, can be discarded to save transmission bandwidth. Then, we scale the DCT coefficients and reshape them into an m × 1 vector (i.e., αθ), where α is the power scaling factor and θ is a normalized vector. The original DCT coefficients can be represented as λθ, where λ is the amplitude value of the block. Besides, the vector is multiplied by an m × m unitary matrix Φ to reduce the peak-to-average power ratio. As a result, the received signal using the pseudo-analog modulation can be represented as
$y = \alpha \Phi \theta + v$ (1)
where v is an m × 1 vector of independently and identically distributed Gaussian noise with zero mean and a known variance $\sigma_0^2$. At the receiver, the work in [9] performs the video reconstruction based on Eqn. (1) with the hierarchical Bayesian model, and the likelihood function of the received signal can be represented as
$p(y \mid \theta) = (2\pi\sigma_0^2)^{-m/2} \exp\left(-\frac{\|y - \alpha \Phi \theta\|^2}{2\sigma_0^2}\right)$ (2)
To simplify, the work in [9] also uses a Gaussian distribution to approximate the probability density function of the DCT coefficients θ, i.e., $\theta \sim \mathcal{N}(0, \Omega)$. Given the received signal y and the hyperparameter Ω, the posterior probability density function of θ, Eqn. (3), is a multivariate Gaussian distribution and can be denoted as
$p(\theta \mid y, \Omega) = \mathcal{N}(u, \Sigma)$ (4)
In Eqn. (4), u and Σ are the mean and the covariance of the Gaussian distribution, respectively, which are given as [9]
$u = \Sigma \alpha \Phi^T \sigma_0^{-2} y$ (5)
$\Sigma = (\alpha^2 \sigma_0^{-2} \Phi^T \Phi + \Omega^{-1})^{-1}$ (6)
If we take the mean of the posterior probability in Eqn. (4) as the reconstructed video block, there is mutual interference in the reconstructed signal (see Eqn. (5)). The KMV-Cast scheme cancels this mutual interference at the transmitter by rotating the original DCT coefficient vector θ into a vector $\hat{\theta}$ with a constant p, which is defined in Eqn. (14) [9]. The received signal in Eqn. (1) is then rewritten with $\hat{\theta}$ in place of θ, and the Bayesian estimate of the reconstructed video is rewritten accordingly.
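The posterior mean in Eqn. (5) can be illustrated with a small numerical sketch. It is not the implementation of [9]: it assumes a real orthonormal Φ and a diagonal prior covariance Ω (so that Σ is diagonal too), and the function names `posterior_mean` and `modulate` as well as the 2 × 2 matrix are hypothetical choices made only for this example.

```python
import math

def posterior_mean(y, phi, alpha, sigma0_sq, omega_diag):
    """Return (u, sigma_diag) for the model y = alpha * Phi * theta + v.

    Assumptions (illustrative only): Phi is real and orthonormal, and the
    prior covariance Omega is diagonal, so Sigma is diagonal as well.
    """
    m = len(y)
    # Sigma = (alpha^2 / sigma0^2 * I + Omega^{-1})^{-1}, element by element
    sigma_diag = [1.0 / (alpha ** 2 / sigma0_sq + 1.0 / w) for w in omega_diag]
    # Phi^T y
    phi_t_y = [sum(phi[i][k] * y[i] for i in range(m)) for k in range(m)]
    # u = Sigma * alpha * Phi^T * sigma0^{-2} * y, as in Eqn. (5)
    u = [sigma_diag[k] * alpha * phi_t_y[k] / sigma0_sq for k in range(m)]
    return u, sigma_diag

def modulate(theta, phi, alpha):
    """Noise-free transmitted vector alpha * Phi * theta."""
    m = len(theta)
    return [alpha * sum(phi[i][k] * theta[k] for k in range(m)) for i in range(m)]

s = 1.0 / math.sqrt(2.0)
phi = [[s, s], [s, -s]]      # hypothetical 2 x 2 orthonormal matrix
theta = [0.8, 0.6]           # normalized coefficient vector
y = modulate(theta, phi, alpha=2.0)
u, sigma_diag = posterior_mean(y, phi, alpha=2.0, sigma0_sq=0.01,
                               omega_diag=[1.0, 1.0])
print(u)  # shrunk slightly toward zero relative to theta
```

In this noise-free toy case the estimate is a scaled copy of θ; the shrinkage factor approaches one as the noise variance decreases.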

B. RELATED INFORMATION AND PRIOR KNOWLEDGE EXTRACTION
Bayesian estimation is used again with the relevant information in order to find the video structure information Ω. Assume that there are N related video pixel blocks available in the cloud. We choose the most correlated pixel block $\theta_i$ in the cloud, and the corresponding Bayesian estimate of the video structure information with the maximal SNR is given in Eqn. (11) [9]. With C denoting the power scaling parameter and r an undetermined parameter, the expression (7) for $\hat{\theta}$ can be rewritten in terms of these parameters. Since θ and $\hat{\theta}$ are both normalized vectors, we have $\|\hat{\theta}\|^2 = 1$, which yields the expression of p in Eqn. (14). In Eqn. (14), $K = \theta_i^T \theta$ represents the correlation coefficient.

C. RECONSTRUCTION FROM THE RECEIVED SIGNALS
At the receiver, the goal is to maximize the quality of the reconstructed video. The demodulated signal at the receiver is given by Eqn. (15). Since the transmitter and the receiver both have the information $\theta_i$ in their clouds, we can multiply both sides of Eqn. (15) by $\theta_i^T$ and then calculate the result in Eqn. (17). Given the result in Eqn. (17), the third term in Eqn. (15) can be removed, and Eqn. (15) is rewritten as Eqn. (18). From Eqn. (18), we obtain the noise power and the corresponding signal-to-noise ratio $SNR_1$ of the reconstructed signal. For simplicity, two new variables are denoted in [9]. Substituting Eqn. (22) into Eqn. (14) gives the power expression, and $SNR_1$ can then be transformed into Eqn. (24).

III. ADAPTIVE DENOISING WITH WIENER FILTER FOR THE KMV-CAST SCHEME
This section proposes the adaptive denoising algorithm with a Wiener filter, which further reduces the noise existing in the KMV-Cast scheme. The diagram of this algorithm is shown in Fig. 2. In Fig. 2, the dashed lines represent the original transmission framework of the KMV-Cast scheme, and the solid lines at the receiver are the framework of the proposed adaptive denoising algorithm.
The proposed algorithm mainly contains three parts: 1) comparing the SNR values of the reconstructed pixel block with and without Wiener filtering; 2) selectively adding a Wiener filter to maximize the SNR of the pixel block; 3) optimizing parameters to maximize the SNR of the whole reconstructed frame.

A. WIENER FILTERING
The essence of Wiener filtering is to minimize the mean square error of the estimated signal (the MMSE criterion). The process of Wiener filtering is shown in Fig. 3. As we can see from Fig. 3, the input of the filter is the sum of the original signal and the noise, and the ideal output is the original signal $p\theta$. However, the actual output of the filter H cannot equal the ideal output exactly. Based on the MMSE criterion, the transfer function can be formulated as
$H = R_{ss} (R_{ss} + R_{vv})^{-1}$ (25)
In Eqn. (25), H is the matrix of the optimal transfer function, $R_{ss}$ is the autocorrelation matrix of the original signal, and $R_{vv}$ is the autocorrelation matrix of the noise. The two autocorrelation matrices in Eqn. (25) can then be calculated, and based on the expression of Ω in Eqn. (11), the autocorrelation matrix can be represented explicitly, so that the transfer function takes the form of Eqn. (29). Considering that Eqn. (29) is independent of time, the output signal of the Wiener filter can be written in closed form, and the noise of the filtered signal changes accordingly. Similar to KMV-Cast, we can calculate the noise power through
$P_N = E[\mathrm{tr}(\mathrm{noise} \cdot \mathrm{noise}^T)]$ (32)
and the noise power is the sum of three parts. We can use the notation in Eqn. (22) to rewrite $SNR_2$, the SNR of the KMV-Cast scheme after Wiener filtering, as Eqn. (36). Then the two SNRs in Eqn. (24) and Eqn. (36) can both be expressed in terms of t. Therefore, we can plot the curves of the two SNRs with respect to t. Under different values of the power scaling parameter C and the same value of K, we obtain the results in Fig. 4. From Fig. 4, one can see that the improvement with the Wiener filter is obvious when the power scaling parameter C is small. However, when the power scaling parameter C increases, the scheme without a Wiener filter is better.
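The MMSE behaviour of the transfer function H built from $R_{ss}$ and $R_{vv}$ can be illustrated with a minimal sketch. It uses the generic Wiener form $H = R_{ss}(R_{ss} + R_{vv})^{-1}$ and assumes, purely for illustration, that signal and noise are uncorrelated and that both autocorrelation matrices are diagonal, so the filter reduces to one gain per coefficient. It does not reproduce the specific closed form of Eqn. (29); the function names are hypothetical.

```python
def wiener_gain(signal_var, noise_var):
    """Per-coefficient Wiener gain h = R_ss / (R_ss + R_vv).

    Assumes diagonal autocorrelation matrices, given as lists of variances.
    """
    return [s / (s + n) for s, n in zip(signal_var, noise_var)]

def expected_mse(signal_var, noise_var, gain):
    """Expected squared error of h * (x + v) as an estimate of x.

    For uncorrelated x and v: E[(h(x + v) - x)^2] = (1 - h)^2 R_ss + h^2 R_vv.
    """
    return [(1.0 - h) ** 2 * s + h ** 2 * n
            for s, n, h in zip(signal_var, noise_var, gain)]

signal_var = [4.0, 1.0, 0.25]   # hypothetical signal variances
noise_var = [1.0, 1.0, 1.0]     # hypothetical noise variances

h = wiener_gain(signal_var, noise_var)
mse_filtered = expected_mse(signal_var, noise_var, h)
mse_unfiltered = expected_mse(signal_var, noise_var, [1.0] * len(h))
print(h)
print(mse_filtered, mse_unfiltered)
```

With these per-coefficient statistics the filtered error equals $R_{ss} R_{vv} / (R_{ss} + R_{vv})$, which never exceeds the unfiltered error $R_{vv}$. This holds only when the assumed statistics match the actual ones, so it does not by itself contradict the observation that the scheme without a Wiener filter can be better for large C.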

B. ADAPTIVE USE OF THE WIENER FILTER
Sometimes, there are correlated pixel blocks in the cloud that are highly similar to the pixel block to be transmitted. In this case, it is better to transmit only the index of the corresponding pixel block instead of the DCT coefficients and to reconstruct the pixel block directly from that index. This not only reduces bandwidth consumption, but also improves the quality of the reconstructed video.
If we only use the relevant information and transmit the index of the similar pixel block $\theta_i$, the SNR of the reconstructed video block is given by Eqn. (38), from which one can see that the SNR is determined only by the similarity coefficient K.
Considering the use of the related information in the cloud, we need to choose the best among three possible ways to reconstruct the image: transmitting the coded block without Wiener filtering, transmitting it with Wiener filtering, and transmitting only the index of the similar block. Let $SNR_{1max}$ and $SNR_{2max}$ denote the peak values of $SNR_1$ and $SNR_2$, respectively. Through comparison, we decide whether or not to transmit the coded pixel block, and whether or not to adopt a Wiener filter at the receiver. The detailed algorithm is given in TABLE I. From the expressions of the SNRs, one can see that four parameters need to be calculated, i.e., A, C, r and t. Among them, A and r can be represented by t. By maximizing the SNR of each pixel block, we can calculate the parameters A, r and t, while by maximizing the SNR of the whole reconstructed video frame, we can calculate the parameter C. The detailed algorithm is presented in TABLE II.
In order to maximize the SNR of the reconstructed video frame, we should minimize the total noise power of all transmitted pixel blocks under the given constraint on the signal power P. The calculation of the noise power is given in TABLE III.
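The per-block comparison can be sketched as follows. This is only an illustration of the selection idea, not a reproduction of TABLE I: the function name, the mode labels, and the assumption that the three SNR values have already been computed for the block are all hypothetical.

```python
def choose_mode(snr1_max, snr2_max, snr_index_only):
    """Pick the reconstruction mode with the highest SNR for one pixel block.

    snr1_max:       peak SNR when the coded block is sent, no Wiener filter
    snr2_max:       peak SNR when the coded block is sent, with Wiener filter
    snr_index_only: SNR when only the index of the similar block is sent
    """
    candidates = {
        "transmit_without_wiener": snr1_max,
        "transmit_with_wiener": snr2_max,
        "index_only": snr_index_only,
    }
    # On ties, max() keeps the first candidate in insertion order.
    return max(candidates, key=candidates.get)

# Hypothetical linear-scale SNR values for three blocks
print(choose_mode(8.0, 11.5, 6.0))
print(choose_mode(20.0, 18.0, 6.0))
print(choose_mode(3.0, 4.0, 9.0))
```

Selecting per block is what makes the scheme adaptive: the Wiener filter is applied only where it raises the SNR, and highly similar blocks are not transmitted at all.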

TABLE III. Framework for calculating the noise power
Assume $l_j$ denotes the noise term of the jth reconstructed block; it takes a different expression under each of the three conditions.
Then, the total noise power to be minimized is $\sum_j \lambda_j^2 l_j / C_j$, subject to the power constraint, where $\lambda_j^2 l_j / C_j$ is the noise power of the jth reconstructed block. In order to minimize the total noise of the reconstructed video, we allocate the power scaling parameter $C_j$ of each block by the method of Lagrange multipliers.
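A Lagrange-multiplier allocation of this kind can be sketched under two simplifying assumptions that are not taken from the paper: the weights $a_j = \lambda_j^2 l_j$ are treated as constants that do not depend on $C_j$, and the power constraint is assumed to take the form $\sum_j C_j = P$. Setting the derivative of $\sum_j a_j / C_j + \mu (\sum_j C_j - P)$ to zero then gives $C_j = P \sqrt{a_j} / \sum_k \sqrt{a_k}$. The function names below are hypothetical.

```python
import math

def allocate_power(weights, total_power):
    """Minimize sum(a_j / C_j) subject to sum(C_j) = total_power.

    weights: positive constants a_j (assumed to stand for lambda_j^2 * l_j).
    Returns the list of C_j given by the Lagrange stationarity condition.
    """
    roots = [math.sqrt(a) for a in weights]
    root_sum = sum(roots)
    return [total_power * r / root_sum for r in roots]

def total_noise(weights, allocation):
    """Objective value sum(a_j / C_j) for a given allocation."""
    return sum(a / c for a, c in zip(weights, allocation))

weights = [1.0, 4.0, 9.0]        # hypothetical values of lambda_j^2 * l_j
total_power = 6.0

optimal = allocate_power(weights, total_power)
uniform = [total_power / len(weights)] * len(weights)
print(optimal, total_noise(weights, optimal))
print(uniform, total_noise(weights, uniform))
```

Blocks with larger noise weights receive more of the power budget, and the resulting total noise is no larger than with a uniform split. If $l_j$ itself depends on $C_j$, as the SNR expressions suggest it may, this closed form no longer applies directly.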

IV. THE FRAMEWORK OF THE PROPOSED ALGORITHM
The proposed algorithm consists of a framework at the transmitter and a framework at the receiver. The detailed frameworks at both sides are given in TABLE IV and TABLE V, respectively.

V. EXPERIMENTAL RESULTS
The factors affecting the quality of the reconstructed video include: 1) the similarity of the correlated information; 2) the quality of the channel; 3) the characteristics of the video source. Accordingly, we analyze the performance of the proposed algorithm with respect to these three factors.

A. THE EFFECT OF CORRELATED INFORMATION
In this section, we evaluate the performance of the proposed adaptive denoising algorithm in terms of PSNR. Assume that the transmission channel is slow fading and that its distortion can be canceled by the equalizer. With the standard video test sequences as the sources, we simulate the video transmission scheme over the additive white Gaussian noise channel. We compare the proposed algorithm with three typical transmission schemes: the uncoded transmission scheme, the SoftCast scheme and the KMV-Cast scheme.
Similar to the KMV-Cast transmission framework, the transmitted video is segmented into frames, and the correlated information is known at both the transmitter and the receiver. Specifically, a frame transmitted before the current frame can be chosen as the correlated information in the cloud, and its similarity can be varied through the spacing between the reference frame and the transmitted frame.
With the standard video test sequence "Foreman", we choose the 4th frame as the correlated information in the cloud, and transmit the 5th, 15th and 215th frames, for which the information in the cloud is highly correlated, fairly correlated and uncorrelated, respectively. The channel SNR is set to 10 dB, and the simulation results are shown in Fig. 5 to Fig. 7.
Overall, the proposed algorithm is superior to the other three schemes. In detail, in Fig. 5 with highly correlated information in the cloud, the proposed algorithm achieves PSNR gains of 12.5 dB and 2.7 dB over the SoftCast and KMV-Cast schemes, respectively, under a high-quality channel. In Fig. 7 with less correlated information, however, the PSNR gains are 3.3 dB and 1 dB, respectively. Therefore, it can be seen that the performance gain increases with the similarity between the transmitted signal and the correlated information in the cloud.
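For reference, the PSNR metric used in these comparisons can be computed as in the sketch below. It assumes 8-bit frames (peak value 255) given as flattened lists of pixel values; the function name and the toy pixel values are hypothetical and unrelated to the reported results.

```python
import math

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    Frames are flattened sequences of pixel values; peak is the maximum
    possible pixel value (255 for 8-bit video, assumed here).
    """
    if len(reference) != len(reconstructed):
        raise ValueError("frames must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)

original = [10, 20, 30, 40]
noisy = [11, 19, 31, 39]      # every pixel off by one, so the MSE is 1
print(psnr(original, noisy))
```

A PSNR gain of a given number of dB between two schemes is simply the difference of their PSNR values against the same original frame.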

B. QUALITY OF CHANNEL
Assume the transmitted signal is highly correlated with the information in the cloud. With the standard video test sequence "Carphone", we select the 19th frame as the correlated information in the cloud and the 20th frame as the transmitted signal, as shown in Fig. 8. We vary the SNR of the received signal and analyze its impact on the quality of the reconstructed video frames under the different transmission schemes. The SNR is set to -10 dB, -5 dB, 0 dB and 5 dB, and the simulation results of the four schemes are shown in Fig. 9 to Fig. 12. From Fig. 9 to Fig. 12, we can see that the KMV-Cast video transmission scheme with the proposed adaptive denoising algorithm has the best performance. Compared with the KMV-Cast transmission scheme, the proposed algorithm achieves PSNR gains of 0.7 dB, 1.9 dB, 1.4 dB and 1.4 dB at SNRs of -10 dB, -5 dB, 0 dB and 5 dB, respectively, while compared with the SoftCast transmission scheme, it achieves PSNR gains of 17 dB, 14 dB, 14 dB and 13 dB, respectively. At the lowest-SNR channel (i.e., -10 dB), the KMV-Cast transmission scheme reconstructs the video frame from more correlated pixel blocks instead of transmitted ones, and thus the improvement of the proposed algorithm is limited. As the quality of the channel improves, the number of transmitted blocks increases and the gain of the proposed algorithm gradually becomes apparent.

C. SOURCES
Different sources can lead to different performance. Assume the transmitted signal is highly correlated with the information in the cloud. Using the standard video test sequences "Carphone", "Container", "Bridge (close)" and "Hall Monitor" as sources, and varying the channel SNR over -10 dB, -5 dB, 0 dB, 5 dB and 10 dB, we obtain the simulation results shown in Fig. 13. One can see that, under different video sources and channel SNRs, the proposed algorithm has the best performance. Specifically, when the frame reconstructed by the KMV-Cast scheme contains more Gaussian noise, the proposed algorithm achieves good performance; see Fig. 13(c). At a channel SNR of 0 dB, the simulation results for the reconstructed 20th frame are shown in Fig. 14. From the simulation results shown in this section, one can see that the proposed algorithm is an enhanced version of the conventional KMV-Cast transmission scheme.

VI. CONCLUSION
In this paper, a Wiener filter-based adaptive denoising algorithm has been proposed for pseudo-analog video transmission, i.e., the KMV-Cast video transmission scheme. The residual noise existing in KMV-Cast is reduced to further improve the quality of the reconstructed video at the receiver. Specifically, we set maximizing the SNR as the optimization goal, and adaptively determine whether or not to add a Wiener filter at the receiver. The simulation results have shown that the proposed denoising algorithm performs best compared with the other three typical schemes.