Synthesis-Distortion-Aware Hybrid Digital Analog Transmission for 3D Videos

The hybrid digital-analog (HDA) video transmission scheme can be used to avoid the cliff effect and saturation effect. However, directly using HDA in 3D video transmission requires too much bandwidth. This paper addresses synthesis-distortion-aware HDA transmission for 3D videos to improve transmission performance. First, a 3D HDA framework that transmits both texture and depth videos in HDA mode is designed. Second, a recursive synthesis distortion estimation model, called RSDE-3D-HDA, is derived, where the transmission errors of both the texture and depth sequences are considered. Third, we optimize the power allocation between digital and analog signals based on the RSDE-3D-HDA. Finally, simulation results show that our model is accurate and the proposed 3D-HDA achieves better performance in terms of synthesis quality than state-of-the-art methods.


I. INTRODUCTION
Three-dimensional (3D) video has drawn extensive attention from both industry and academia in recent years. 3D video contents are usually represented in the multiview video plus depth (MVD) format [1], including texture videos and associated depth videos. By transmitting the MVD data, a virtual view between any two captured views can be rendered on the decoder side by using depth image-based rendering (DIBR) technology [2], which has been adopted in the Moving Picture Experts Group View Synthesis Reference Software (MPEG-VSRS) [3].
Due to the quantization and entropy coding adopted by source coding, conventional digital video transmission schemes, such as H.264/AVC with digital channel coding and modulation for 2D videos, have two main disadvantages: first, when the channel quality is lower than a threshold, the received video quality degrades dramatically, which is known as the cliff effect [4]; second, when the channel quality is greater than a threshold, a better video quality cannot be obtained even if the channel quality improves further, which is called the quality saturation effect [5]. The problem is even worse in 3D video transmission because errors in one view can propagate to virtual views.

(The associate editor coordinating the review of this manuscript and approving it for publication was Aniello Castiglione.)
To overcome these two challenges, an analog-like Softcast method [6], [7] was developed for 2D video transmission. Unlike digital coding, Softcast does not use quantization and entropy coding for transmission; instead, it transmits the video sequence to the wireless channel directly after linear transformation and analog-like amplitude modulation. Therefore, the quality of the received video is linearly related to the channel signal-to-noise ratio (CSNR). However, due to the low compression efficiency of Softcast, the quality of the reconstructed video is worse than that of the reconstructed digital video at the same bandwidth. To solve this problem, the hybrid digital-analog (HDA) scheme was proposed in [8], [9] for 2D video transmission, which combined the advantages of digital and analog schemes by dividing the original sequence into digital and analog parts and transmitting the ''analog-digital'' superposed signal through the wireless channel after encoding and modulating them separately. In the digital coding part, the conventional digital video transmission scheme is adopted, and in the analog coding part, the residual signals between the original video and the reconstructed digital signals are transmitted using the Softcast method.
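As a toy illustration of the HDA idea described above — a coarse digital reconstruction plus a Softcast-style analog residual, superposed with power factors — consider the following sketch. It is purely illustrative: the quantizer stands in for the full H.264/FEC/modulation chain, and the factor names are assumptions, not the paper's implementation.

```python
import numpy as np

# Toy HDA sketch (illustrative): the "digital" path is a coarse quantizer
# standing in for H.264 + FEC + modulation; the "analog" path carries the
# Softcast-style residual. A noiseless channel is assumed here.
rng = np.random.default_rng(0)
frame = rng.uniform(0, 255, size=(8, 8))

step = 16.0
digital = np.round(frame / step) * step          # coarse digital reconstruction
residual = frame - digital                       # analog part (Softcast input)

alpha, beta = 0.6, 0.8                           # assumed power-allocation factors
superposed = beta * digital + alpha * residual   # signal sent over the channel

# Receiver: decode the digital layer first (treating the analog layer as
# noise), then subtract it to recover the scaled residual.
recovered_residual = (superposed - beta * digital) / alpha
reconstruction = digital + recovered_residual
```

With no channel noise the reconstruction is exact; with AWGN, the residual layer degrades gracefully with CSNR, which is the behavior HDA inherits from Softcast.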
In [10], Softcast is used for 3D video transmission, which shows that the received video quality varies with the channel condition; however, it still suffers from low coding efficiency. In [11], a hybrid digital-analog scheme called Swift is proposed for stereo video transmission; it designs a zigzag coding structure such that both intraview and interview correlations can be explored through prediction among the frames to be analog coded. However, it considers neither the transmission of the depth map nor view synthesis.
In 3D video systems, it is crucial to develop an accurate algorithm for the encoder to estimate the distortion of the synthesized view at the decoder, which facilitates the design of optimized encoding schemes at the encoder to improve the quality of the reconstructed 3D video. In 2D video transmission, a recursive optimal per-pixel estimation (ROPE) [12] method was proposed to estimate the decoder-side distortion from the encoder. In [13], a recursive optimal distribution estimation (RODE) approach was developed to obtain the decoder-side distortion of the synthesized virtual views in 3D video transmission. In [14], a depth-bin-based graphical model (DBGM) for view synthesis distortion estimation was developed to improve the speed of the RODE method. A recursive distortion estimation method was proposed in [15] for HDA-based single-view video transmission, which recursively estimates the decoder-side distortion from the encoder and adaptively allocates the transmission power between the digital and analog signals in HDA. Very recently, for 3D Softcast, Lagrange multipliers were used in [16] to find a closed-form expression of the distortion by formulating power allocation as an optimization problem. The mean-removed block-based discrete cosine transform (DCT) and view synthesis distortion were used in [17] to remove correlations and enhance the effectiveness of energy allocation; additionally, power allocation was performed to scale the interblock DCT coefficients. However, [16] and [17] focused on Softcast, not HDA, and no work on 3D-HDA has been reported.
Motivated by the aforementioned work, this paper studies 3D video transmission schemes based on the principles of HDA. The main contributions are as follows. 1) We design a video transmission framework for 3D-HDA, which allows transmission errors, such as packet losses, to occur in both the texture and depth signals. 2) We derive an expression for the recursive synthesis distortion estimation of 3D-HDA, named RSDE-3D-HDA, which can estimate the decoder-side distortion at the encoder and can be used to optimize the resource allocation for 3D-HDA. 3) We give the optimal solution to the power allocation between analog and digital signals on the 3D-HDA model. To the best of our knowledge, this is the first time HDA-based 3D video transmission is considered with power allocation.
The remaining parts of this paper are organized as follows. We first review the related work, including the frameworks of HDA and the view synthesis of 3D video in Section 2. Section 3 presents the proposed 3D-HDA framework, the RSDE-3D-HDA model and the optimal solution. The experimental results are presented in Section 4. Finally, Section 5 concludes this paper.

II. RELATED WORK

A. HYBRID DIGITAL-ANALOG SCHEME
In this section, we briefly review the framework of the hybrid digital-analog scheme and the transmission process of the digital and analog signals. Further details can be found in [8], [9] and [18]. Fig. 1 shows the framework of the HDA system, which consists of the digital and analog codec parts. At the digital encoder, the original video sequences are first encoded by H.264/AVC, and the resulting bitstreams are processed by forward error correction (FEC), modulation and power allocation to obtain the digital signal. At the Softcast encoder, the residual between the original sequence and the H.264 reconstructed sequence is processed using DCT, power allocation and whitening to obtain the analog signal. Finally, the digital and analog signals are superposed and transmitted over the OFDM channel. At the decoder, an error-free demodulated digital signal is obtained first (by treating the analog signal as noise), and the re-modulated digital signal is then subtracted from the received signal to obtain the analog signal. At the Softcast decoder, the linear least squares estimator (LLSE) removes the noise and the inverse discrete cosine transform (IDCT) reconstructs the analog signal, while the digital signal is reconstructed by the H.264 decoder; the reconstructed video sequence is then obtained from these two signals. Note that the standard deviations of the DCT coefficients are particularly important for analog decoding; therefore, they are transmitted as side information to help the LLSE. HDA inherits the advantages of the digital and analog schemes and can achieve improved performance. However, most current HDA research focuses on 2D videos, and few works apply HDA to the transmission of 3D videos.
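The LLSE step above can be illustrated with a small numerical sketch. This is an assumed model, not the paper's exact pipeline: each DCT chunk i has variance lam[i] (the side information), is scaled Softcast-style by g[i] proportional to lam[i]^(-1/4) at the sender, and AWGN of variance sigma2 is added in the channel.

```python
import numpy as np

# Illustrative LLSE sketch for Softcast-style chunks (assumed model).
lam = np.array([400.0, 100.0, 25.0])   # per-chunk variances (the side information)
g = lam ** -0.25                       # Softcast power scaling, up to a constant
sigma2 = 1.0                           # channel noise variance

# LLSE decoder gain for y = g*x + n:  x_hat = g*lam / (g^2*lam + sigma2) * y.
llse_gain = g * lam / (g ** 2 * lam + sigma2)

# Expected per-chunk MSE of LLSE vs. naive zero-forcing (y / g):
d_llse = sigma2 * lam / (g ** 2 * lam + sigma2)
d_zf = sigma2 / g ** 2
print(np.all(d_llse < d_zf))   # LLSE never does worse than zero-forcing
```

This is why the chunk standard deviations must reach the receiver as side information: without lam, the LLSE gain cannot be formed.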

B. THE VIEW SYNTHESIS SYSTEM FOR 3D VIDEOS
As shown in Fig. 2, the view synthesis of 3D video is performed by a general framework consisting of video acquisition, coding, channel transmission, decoding, virtual view synthesis and display. The acquisition process usually involves at least two cameras recording from different angles, together with depth-map capture, to form a ''texture-to-depth'' group (GTP) for each view. Virtual view synthesis is the key technology in this framework; it uses the decoded reference views to synthesize unknown virtual views. The synthesis techniques are generally divided into two categories [13]: image-based rendering (IBR) and DIBR. IBR uses the disparity between different views to synthesize virtual views [19], while DIBR uses the geometric information of the depth map to generate virtual views; therefore, DIBR can achieve better virtual viewpoint quality than IBR [20]. Generally, the DIBR framework can be divided into two types, the free-view mode and the one-dimensional parallel mode, according to the camera placement during acquisition. The former needs to consider disparity in multiple directions, which is complicated, whereas the latter, which is adopted in this paper, needs to consider only the disparity in the horizontal direction, which is relatively simple.
In the subsequent encoding process, the depth information needs to be processed, such as with subsampling and quantizing [21], [22], to obtain sparse and smooth information, and different distortion scenes need to be evaluated. If only the distortion caused by the encoding is considered, then the GTP needs to be processed by the codec, and the distortion of the view rendering is estimated based on the GTP before and after reconstruction. If only considering the distortion caused by packet loss during the transmission process, then a simulated packet loss operation needs to be performed on the compressed bitstreams and the view synthesis distortion needs to be estimated according to the GTP before and after reconstruction. Finally, according to the results of view synthesis distortion, the appropriate coding scheme is determined after reasonable rate-distortion optimization. The bitstreams are then reconstructed at the decoder after passing through the channel.
The above process represents a digital transmission method, which inevitably suffers from the cliff and saturation effects. To overcome these issues, we apply HDA, which divides the GTP into two parts, i.e., a digital signal and an analog signal, and transmits the superposition of these two signals. In addition, we deduce an expression for the synthesis distortion of the 3D video and use it to optimize transmission parameters, such as the power allocation factor of HDA in Fig. 1, so that the quality of the video received by the audience can be improved. The reconstructed GTP then uses the DIBR method from [2] to render a virtual view video. In an actual transmission scenario, regardless of the number of multi-view videos (MVVs) or the viewing angle of the FVV [23], transmitting all views is difficult to achieve at a reasonable cost in the development of 3D video. Thus, a better solution is to selectively transmit several reference views and then use them to synthesize the final views at the receiver instead of transmitting all views. Regarding the smoothness of the 3D video at the receiver, it is important to consider the relationship between the quality of the virtual views and the transmission parameters, and this relationship is the main motivation of this paper.

III. THE PROPOSED 3D-HDA VIDEO TRANSMISSION SYSTEM WITH THE RSDE-3D-HDA MODEL
A. THE PROPOSED FRAMEWORK OF 3D-HDA

Fig. 3 shows the overall framework of the proposed 3D-HDA system, where only two views are considered under the assumption that they are transmitted independently.
At the encoder, let T l and D l denote the original texture and depth maps of the left view, which are first encoded by the H.264/AVC digital encoder to obtain T ld and D ld , respectively. Then, D ld and T ld are cascaded and modulated to obtain the digital signal X ld . The analog part T la is the residual between the original texture T l and the digitally reconstructed texture at the encoder. Finally, after DCT and power allocation, the analog signal Y la and the digital signal Y ld are superposed to obtain the transmission signal S lc , as expressed by Eq. (1),
where α and β are the power allocation factors for the analog and digital signals, respectively. Then, S lc is transmitted over the OFDM wireless channel with additive white Gaussian noise (AWGN), i.e., N ∼ N(0, σ²). At the receiver, the received superposed signal S lc is first demodulated to obtain the digital signal Y ld by treating the analog signal as noise. The digital signal X ld is decoded by H.264 with error concealment and power allocation to obtain the recovered depth D̃ l and texture T̃ ld . Then, the demodulated digital signal Y ld is re-modulated and subtracted from the received superposed signal S lc . After that, the reconstructed analog signal T̃ la is obtained from Y la through power allocation and analog decoding, including inverse DCT and LLSE, similar to Softcast. Next, the reconstructed texture T̃ l and depth D̃ l are warped to obtain the virtual view S ls of the left view.
For the right view, through the same operations as for the left view, we can obtain its virtual view S rs . Finally, S ls and S rs are used to obtain the synthesized view S s according to Eq. (3), in which the two warped pixels are blended if both the left and right views are available. Here, S ls (i, m, n) and S rs (i, m, n) denote the i-th texture pixel in the m-th block of the synthesized frame n warped from the left and right reference views, respectively, and a ∈ [0, 1] is a scaling factor that depends on the location of the virtual view. Additionally, we use an interpolated pixel S I s (i, m, n) to represent the hole pixel value when neither the left nor the right view provides information.
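The per-pixel synthesis rule can be sketched as follows. The blend (1 − a)·left + a·right is an assumed form — the paper states only that a ∈ [0, 1] depends on the virtual-view location — and None is used here as a stand-in marker for a warping hole.

```python
# Sketch of the synthesis rule for one pixel (illustrative assumptions noted above).
def synthesize_pixel(s_ls, s_rs, a=0.5, s_interp=0.0):
    if s_ls is not None and s_rs is not None:
        return (1.0 - a) * s_ls + a * s_rs   # Scenario 1: both views warp here
    if s_ls is not None:
        return s_ls                          # Scenario 2: only the left view
    if s_rs is not None:
        return s_rs                          # Scenario 3: only the right view
    return s_interp                          # Scenario 4: hole, use interpolation

print(synthesize_pixel(100.0, 120.0, a=0.5))   # -> 110.0
```

The four branches correspond to the four scenarios analyzed in Sec. III-C below.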

B. CASCADING AND SUPERPOSITION SCHEME
In this paper, we assume that the transmission bandwidth matches the encoded bit rate. Each transmitted signal is the superposition of a digital signal and an analog signal. Since the depth images represent the structure information, which is important in 3D warping, we transmit depth images in digital mode. In addition, we explore a cascading mode for digital depth and digital texture. Specifically, the total digital transmission bandwidth is divided into two parts; one part is for the digital texture and the other part is for the digital depth, as shown in Fig. 4, in which D ld , T ld and T la denote the digital signal of the depth map, the digital signal and the analog signal of the texture of the left view, respectively. Meanwhile, to easily distinguish these two types of digital signals, a small number of padding bits are added between them, while the transmission bandwidth for the padding bits is ignored for simplification. The power allocation between X ld and T la will be discussed in Sec. III-D. Note that, although the rate allocation between T ld and D ld should be optimized, experiments show that such optimization is very complicated, and the gain is limited; therefore, for simplification, similar to traditional 3D video transmission, a fixed bit ratio of 4:1 is allocated to them.
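The cascading scheme can be sketched as below. Only the 4:1 texture-to-depth bit ratio comes from the text; the padding pattern and byte-level framing are made up for illustration.

```python
# Illustrative sketch of the cascading mode: texture and depth bitstreams
# share the digital budget at a fixed 4:1 ratio and are separated by padding
# bits. The PAD pattern and byte framing are invented for this example.
PAD = b"\x00\x00\x01"

def split_budget(total_bits, ratio=(4, 1)):
    """Split the digital bit budget between texture and depth (default 4:1)."""
    texture_bits = total_bits * ratio[0] // sum(ratio)
    return texture_bits, total_bits - texture_bits

def cascade(texture_stream: bytes, depth_stream: bytes) -> bytes:
    """Concatenate the two digital bitstreams with a padding marker between."""
    return texture_stream + PAD + depth_stream

print(split_budget(1000))   # -> (800, 200)
```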

C. THE DEDUCTION OF RSDE-3D-HDA
In this section, we obtain the distortion expression for the synthesized view S s . In particular, we propose a recursive synthesis distortion estimation model for the 3D-HDA framework of Fig. 3, called RSDE-3D-HDA. The derivation is given below.
We consider only the 1D parallel view synthesis process. According to Eq. (3), S s (i, m, n) has four cases, depending on whether the left and right views warp to the current synthesized position. Therefore, the synthesized distortion can be written as the weighted average of the four cases in Eq. (4), where p k and ED sk (i, m, n) represent the probability and the distortion of each case, respectively. Next, we find the expression of each ED sk (i, m, n) and the corresponding p k .

1) SCENARIO 1

When both the left and right views can warp to the current position, based on Eq. (4), the distortion in this scenario can be expressed as follows, where S s (i, m, n) and S̃ s (i, m, n) represent the i-th pixel value in the m-th block of the correctly synthesized texture frame n and of the reconstructed texture frame n at the decoder when the reconstructed texture frame contains random errors.
This formula shows that the expected distortion is characterized by the known values of S s (i, m, n), S ls (i, m, n) and S rs (i, m, n), and by the first and second moments of S ls (i, m, n) − S̃ ls (i, m, n) and S rs (i, m, n) − S̃ rs (i, m, n).
Next, we deduce the second moments of S ls (i, m, n) − S̃ ls (i, m, n) and S rs (i, m, n) − S̃ rs (i, m, n). Based on 3D warping, the value of S ls (i, m, n) − S̃ ls (i, m, n) is similar to that of T l (i, m, n) − T̃ l (i, m, n), and because the distortions of S̃ rs (i, m, n) and S̃ ls (i, m, n) have the same form, we take the distortion of the left view as an example, in which T la (i, m, n) is known at the encoder and the distortions of the digital and analog signals of the left view are denoted as ED ld (i, m, n) and ED la (i, m, n), respectively. The expected distortion of the digital signal is given by Eq. (8). For Y ld (i, m, n) in Eq. (8), we use a simple error concealment method such as that in RODE, which includes two cases: intraframe and interframe.
First, if the intraframe-encoded pixel value is received correctly at the decoder, then the reconstructed pixel value at the decoder is the same as that at the encoder. Otherwise, we use the previous frame for error concealment; that is, the pixel value at the same position in the previous frame of the correctly decoded sequence is used as the pixel value of the current position, which can be expressed as Eq. (9), where Y ld (i, m, n) and Ỹ ld (i, m, n − 1) are the digital pixel values of the corresponding positions reconstructed by the encoder and decoder, respectively. Assuming that the bit error rate (BER) in the channel is p, the expectations of the first and second moments (k = 1, 2) of the pixel values Ỹ ld (i, m, n) at the decoder in this case are given by Eq. (10). Second, for the interframe case, if the encoded pixel value is received correctly at the decoder, then the reconstructed motion-compensated residual signal at the decoder is the same as that at the encoder. Assume that the reference frame for the motion estimation of frame n is frame n − 1, the motion-compensated residual signal of the i-th pixel of the current frame is e(i, m, n), and the corresponding reference pixel in frame n − 1 is Ỹ ld (i′, l, n − 1). Otherwise, the pixel value at the same position in frame n − 1 of the decoder is used. This case is expressed in Eq. (11), and similarly, the two moments (k = 1, 2) of the expected distortion are given by Eq. (12). For the BER p in Eq. (10) and Eq. (12), since the analog signal acts as noise to the digital signal, we find an expression for p with BPSK modulation [24].
CSNR e (i, m, n) = P Yld (i, m, n) / (P Yla (i, m, n) + σ²)    (13)

where P Yld (i, m, n) and P Yla (i, m, n), i.e., the powers of the digital and analog signals, respectively, can be expressed as in Eq. (14). In summary, according to Eq. (8) to Eq. (14), we can recursively estimate ED ld (i, m, n) at the encoder when α and β are known.
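The digital-distortion recursion just described can be sketched in a few lines. This is an illustrative scalar-pixel version, not the paper's implementation: concealment is assumed to copy the co-located pixel of the previous decoded frame, m1/m2 denote the first and second moments E[Y] and E[Y²], and the BPSK error rate assumes the effective CSNR of Eq. (13) plays the role of Eb/N0.

```python
import math

def bpsk_ber(p_digital, p_analog, sigma2):
    """Eq. (13): the analog layer acts as extra noise for the digital layer.
    BER for BPSK over AWGN under the stated CSNR assumption."""
    csnr = p_digital / (p_analog + sigma2)
    return 0.5 * math.erfc(math.sqrt(csnr))

def intra_moments(y_enc, m1_prev, m2_prev, p):
    """Intra pixel: received correctly w.p. (1 - p), else concealed from frame n-1."""
    m1 = (1 - p) * y_enc + p * m1_prev
    m2 = (1 - p) * y_enc ** 2 + p * m2_prev
    return m1, m2

def inter_moments(e, m1_ref, m2_ref, m1_prev, m2_prev, p):
    """Inter pixel: residual e is added to the (random) decoded reference pixel."""
    m1 = (1 - p) * (e + m1_ref) + p * m1_prev
    m2 = (1 - p) * (e ** 2 + 2 * e * m1_ref + m2_ref) + p * m2_prev
    return m1, m2

# Expected decoder-side MSE of an intra pixel with encoder value 100 and
# previous-frame moments (98, 98**2 + 4), at the BER induced by the CSNR:
p = bpsk_ber(10.0, 1.0, 1.0)
m1, m2 = intra_moments(100.0, 98.0, 98.0 ** 2 + 4.0, p)
expected_mse = 100.0 ** 2 - 2 * 100.0 * m1 + m2
```

Note how raising the analog power raises the BER of the digital layer — this coupling is exactly what the power factors α and β trade off.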
Then, we can obtain the distortion of the analog signal according to the analog encoding process in Fig. 3. Specifically, Y la (i, m, n) and T̃ la (i, m, n) denote the analog signal after power allocation at the encoder and the reconstructed signal at the decoder after passing through the channel, respectively; their relationship is given in Eq. (15), from which Eq. (16) follows. According to Eq. (15) and Eq. (16), we divide the distortion of the analog signal into two parts: one caused by power allocation and the other caused by noise interference during channel transmission. For the first part, the distortion of the i-th pixel in the m-th block of the n-th frame is denoted as D * a (i, m, n), and its expected value is given in Eq. (17). Thus, we can obtain Eq. (18), where σ² is the noise variance. According to Eqs. (8), (10), (12) and (18), ED l is determined by α and β. For the right view, the distortion ED r is obtained in the same way as ED l .
To obtain the probability of each scenario in Eq. (4), the DBGM scheme in [14] is adopted, which can efficiently capture the interaction between the pixels and the warping competition operation during view synthesis. Let V denote the warped vertices (pixels) from the left and right views; we assume that vertices V (1) to V (n b ) are connected to V s (i, m, n). According to the warping competition rule [14], the vertex with the largest depth value is chosen as the winner during the warping competition. The edge between V (j) and V s (i, m, n) is denoted as b ji . When the edge b ji is the final winner, all the edges emitted from V (z)(z = 1, . . . , j − 1, j + 1, . . . , n b ) to V s (i, m, n) with the condition should be abandoned. Thus, the winning probability of b ji can be formulated as (19) where P(b ji ) is the probability mass function of the i-th decoded pixel, which can be recursively estimated using the decoded depth and texture with the given values of α and β.
Based on Eq. (19), the probability of S s (i, m, n) being a hole can be obtained as Eq. (20); therefore, the probability of Scenario 1 is given by Eq. (21).

2) SCENARIO 2

When only the left view can be warped to the current synthesis position, the distortion in this scenario can be expressed as Eq. (22). It is obvious that ED s2 (i, m, n) is determined by the distortion of the left view. Similar to Eq. (21), the probability of this scenario follows.

3) SCENARIO 3
When only the right view can be warped to the synthesis position, the distortion in this scenario can be expressed similarly to Eq. (22), and the probability of this scenario follows accordingly.

4) SCENARIO 4

When neither the left nor the right view can be warped to the current position, a simple hole filling method is used, where W is the width of the image; the distortion and probability of this scenario are then obtained. Thus, in Eq. (4), when the values of α and β are given, all the variables are known or can be estimated from the reconstructed pixels in the same frame or the previous frame n − 1. Consequently, we can recursively estimate the decoder-side value of ED s in Eq. (4).
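Putting the four scenarios together, Eq. (4) is simply a probability-weighted sum, which can be sketched as follows (the probabilities and distortions below are made-up numbers for illustration):

```python
def synthesis_distortion(probs, dists):
    """Eq. (4): weighted average of the four scenario distortions.
    probs and dists hold p_k and ED_sk for k = 1..4 (both views, left only,
    right only, hole); the four cases are exhaustive, so probs sum to one."""
    assert len(probs) == len(dists) == 4
    assert abs(sum(probs) - 1.0) < 1e-9
    return sum(pk * dk for pk, dk in zip(probs, dists))

# Made-up example: warping succeeds from both views 90% of the time,
# and hole pixels (filled by interpolation) are the most distorted.
ed_s = synthesis_distortion([0.90, 0.04, 0.04, 0.02], [1.0, 4.0, 4.0, 25.0])
```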

D. OPTIMIZED POWER ALLOCATION
In this subsection, we focus on solving the power allocation between the analog signal T la and the digital signal X ld , which is formulated in Eq. (29), where ED s (i, m, n) is the distortion expressed in Eq. (4) and P Sc (i, m, n) is the total transmission power budget of the i-th texture pixel in the m-th block of the n-th frame for the left or right view, which is divided into the analog transmission power P Ya (i, m, n) and the digital transmission power P Yd (i, m, n). We use an exhaustive search to solve the optimization problem in Eq. (29).
First, we determine the search ranges of α and β. From their relationship in Eq. (29), β can be expressed in terms of α as in Eq. (30). Therefore, we only need the range of α, assuming α ∈ (0, α max ). From Eq. (30), the maximum of α corresponds to the case of β equal to 0, meaning that the total power is allocated to the analog signal; specifically, α max is given by Eq. (31), where P Sc (i, m, n) and T a (i, m, n) are given constants.
Second, we choose the search step of α. For simplicity, the step is uniform over the range of α, i.e., α max /N i , where N i is the given search number. Finally, the optimal α is the one minimizing the distortion in Eq. (29), and the optimal β is then obtained from Eq. (30).
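The exhaustive search can be sketched as follows; `distortion_model` stands in for the RSDE-3D-HDA estimate and is replaced here by a made-up convex function, so the names and the placeholder are illustrative assumptions.

```python
import numpy as np

# Exhaustive search over the analog power factor alpha (illustrative sketch).
# Beta would follow from the power budget via Eq. (30) once alpha is fixed.
def optimize_alpha(alpha_max, n_search, distortion_model):
    alphas = np.linspace(alpha_max / n_search, alpha_max, n_search)
    distortions = [distortion_model(a) for a in alphas]
    return float(alphas[int(np.argmin(distortions))])

# Placeholder distortion model with a known minimum at alpha = 0.5:
best = optimize_alpha(1.0, 32, lambda a: (a - 0.5) ** 2)
print(best)   # -> 0.5
```

With 32 candidate values per frame, only a 5-bit index (plus α max ) needs to be signaled, matching the overhead discussed below.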
From the experiments, we find that frame-level power allocation performs better than block-level power allocation. Therefore, the power is computed at the block level, but the same power factor α is used for all blocks in the same frame, as in Eq. (32), where n i is the number of pixels in each block, which is set to 64, n m is the number of blocks, and P(i, m, n) is the power of the i-th analog signal in the m-th block of frame n.
In addition, the search complexity and the transmission overhead of the power factor are negligible: in our experiments, the number of possible values (the search number) of α for each frame is set to 32, and the overhead consists of the 5-bit index of α and the bits for α max , which cost 10 bits in total in our scheme. Above all, the proposed RSDE-3D-HDA model is the first to combine 3D video view synthesis with HDA-based power allocation, and it can accurately estimate the transmission error. According to the simulation results in the next section, we can observe that both the cliff effect and the saturation effect are eliminated.

IV. SIMULATION RESULTS
In this section, we evaluate the performance of the proposed method. We perform four groups of experiments measuring four metrics: 1) the average MSE mismatch ratio (AMMR) performance of the RSDE-3D-HDA, 2) the mean-square error (MSE) performance of the RSDE-3D-HDA, 3) the peak signal-to-noise ratio (PSNR) performance of the proposed 3D-HDA, and 4) the visual quality of the synthesized virtual view. The first two groups are used to verify the performance of the RSDE-3D-HDA model, and the last two groups are used to verify the objective and subjective synthesized-view quality of our 3D-HDA. Additionally, we compare our method with several popular schemes, namely, 3D-Softcast, the 3D-Digital scheme and the distortion estimation model of the RODE method in [13].
The experimental conditions are set as follows. For the digital part, because H.264/AVC is a popular codec and our research focuses on the performance of the proposed framework, we employ JM 19.0 of the H.264/AVC video codec. For the parameter a, we consider a virtual view in the middle of the two reference views; thus, a is set to a fixed value of 1/2. We use a single reference frame for motion estimation. The first frame is coded as an I-frame, and all remaining frames are coded as P-frames. For the analog part, we employ the approach in [6] to process the video sequences. Notably, all the schemes use the same transmission power and bandwidth (the same resolution uses the same bandwidth).
The AMMR and PSNR are defined as

AMMR = (1/N) Σ_{n=1}^{N} |ED est (n) − ED sim (n)| / ED sim (n),

where N is the number of frames in the sequence, ED est (n) is the encoder-estimated MSE for frame n, and ED sim (n) is the actual receiver-side MSE for frame n obtained by averaging over multiple experiments, and

PSNR = 10 log 10 (255² / MSE),

where MSE is the mean square error between the original video frames and the reconstructed frames.
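The two metrics can be sketched directly from these definitions. The AMMR form below (mean relative mismatch between estimated and simulated MSE) and the 8-bit peak value of 255 for PSNR are assumed readings of the stated definitions.

```python
import math

def ammr(ed_est, ed_sim):
    """Average MSE mismatch ratio over N frames (assumed form)."""
    n = len(ed_est)
    return sum(abs(e - s) / s for e, s in zip(ed_est, ed_sim)) / n

def psnr(mse):
    """Peak signal-to-noise ratio in dB, assuming 8-bit video (peak 255)."""
    return 10.0 * math.log10(255.0 ** 2 / mse)

print(round(psnr(255.0 ** 2 / 1e4), 2))   # -> 40.0
```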

A. AMMR PERFORMANCE
In this part, we first test the AMMR performance of the proposed RSDE-3D-HDA model with the test sequences Kendo, Newspaper (1024*768@30 Hz, 100 frames), Balloons (1920*1088@30 Hz, 100 frames) and Cafe (1920*1080@30 Hz, 100 frames) [25]. The estimated distortion of our RSDE-3D-HDA scheme is computed at the encoder, while the simulated distortion is computed at the decoder. Table 1 presents the AMMR performance with CSNRs of 0 dB, 5 dB, 10 dB and 20 dB. The results show that the proposed model is quite accurate. In addition, we compare the AMMR performance with the reference method in [13] for the Balloons, PoznanStreet (1920*1088@30 Hz, 200 frames), Kendo, and Dancer (1024*768@30 Hz, 200 frames) [25] video sequences. For fairness, we adjust the experimental parameters to match the loss rate and group of pictures (GOP) settings in [13]. Table 2 presents the AMMR performance with loss rates of 0.02, 0.05, and 0.08, the matched GOP and the total number of frames; ''Ours'' and ''RODE'' indicate the results of our method and of [13], respectively. For the 3D-Digital scheme, H.264/AVC with 1/2 FEC coding is used to guarantee error-free digital transmission, and the bitstreams of the texture and depth map are cascaded. For the 3D-Softcast method, both texture and depth are encoded by Softcast, and packet dropping guarantees the same bandwidth as the other methods. Additionally, the ratio between the texture and depth maps is set to 4:1 in these methods.
We can see from Fig. 6 that the digital scheme cannot overcome the cliff effect or the saturation effect because of its fixed quantization parameter. Although the performance of Softcast tends to be linear, it cannot achieve acceptable performance. The proposed 3D-HDA without optimization (i.e., α = 1) also obtains linear performance, but its PSNR is lower than that of the digital scheme when the CSNR is poor. Only the proposed 3D-HDA with optimized power allocation (i.e., α optimized by RSDE-3D-HDA) achieves superior performance at both low and high CSNRs and completely overcomes the cliff effect and saturation effect.

D. VISUAL PERFORMANCES
Furthermore, we show visual comparisons of the 10th frame of the Newspaper sequence and the 15th frame of the Kendo sequence, with a local enlargement of each frame. It can be seen from Fig. 7 and Fig. 8 that Softcast cannot obtain acceptable performance. Our 3D-HDA method with optimized power allocation obtains the best visual quality, and its results are the closest to the ground truth of the synthesized view.

V. CONCLUSION
In this paper, we develop a framework for HDA-based 3D video transmission and deduce a recursive view synthesis distortion model. The experimental results show that the accuracy of the proposed distortion estimation scheme is satisfactory and that the framework outperforms state-of-the-art schemes. Our scheme has three main advantages. First, the framework can tolerate depth and texture errors during wireless transmission. Second, the scheme can estimate the synthesis distortion of the decoder from the encoder side thanks to the deduced distortion model. Third, owing to the exploration of HDA transmission for 3D video, the quality of the synthesized virtual view varies linearly with the CSNR, overcoming the cliff effect and saturation effect of digital transmission methods.