Joint Power and Channel Resource Optimization in Soft Multi-View Video Delivery

Existing wireless multi-view video (MVV) transmission schemes use digital compression to achieve high coding efficiency. However, digital schemes suffer from the cliff effect, in which video quality is a step function of wireless channel quality. In this paper, we first consider a soft MVV transmission scheme that exploits the inter-view correlations and the texture-depth correlations with a five-dimensional discrete cosine transform (5D-DCT). The linearly transformed 5D-DCT signals are modulated in an analog manner, so the video quality improves gracefully as the channel quality improves. The cumbersome bit and rate control of digital solutions is replaced by simple power control. Second, the data size of MVV grows linearly with the number of cameras and the amount of depth data. To reduce the resulting heavy data traffic in soft MVV transmission, we propose efficient resource (bandwidth and power) allocation algorithms. Simulation results demonstrate two things. The proposed distortion-resource (DR) optimization algorithm achieves the best viewing quality under a resource constraint. The proposed resource-distortion (RD) optimization algorithm minimizes the resource usage for a target video quality. Third, we investigate how power control across texture and depth frames, and how view positions, affect the quality of synthesized virtual views. Simulations verify the efficacy of the proposed algorithms for both the reference viewpoints and the virtual viewpoints.


I. INTRODUCTION
Multimedia has become the most popular application in emerging network paradigms [1]-[3]. Recently, multi-view videos (MVV) have emerged in application domains such as education, healthcare, and 3D home entertainment. MVV is a fundamental technology for virtual reality (VR), naked-eye 3D, and free-viewpoint video streaming [4], [5]. Fig. 1 shows an example of an MVV transmission system, where a number of cameras are deployed at different positions. Each camera captures both texture maps (images) and depth maps (distances to the objects). This texture and depth information is known as multi-view plus depth (MVD). The MVD information is encoded via texture and depth encoding and then transmitted to the receiver over, e.g., a wireless channel. After decoding, the receiver synthesizes intermediate virtual viewpoints from the received MVD frames using depth-image-based rendering (DIBR) [6]. For example, virtual view 2 can be synthesized from the texture and depth data of its left view (i.e., view 1) and its right view (i.e., view 3). The receiver can then select its preferred viewing angle and enjoy an enhanced viewing experience.
The associate editor coordinating the review of this manuscript and approving it for publication was Dapeng Wu.
Compared with conventional single-view video streaming, MVV streaming usually generates a large amount of video data. This data grows linearly with the product of the number of cameras and the number of frame depths. For 4K (or 8K) videos, the heavy data traffic consumes a considerable amount of wireless resources and may make real-time transmission infeasible. One current solution is independent coding of texture and depth, a backward-compatible extension of the H.264/AVC standard [7], [8]. Each viewpoint is encoded separately, and only the view corresponding to the user's currently selected viewpoint is transmitted [9], [10]. To further reduce redundancy, multi-view video coding (MVC) was proposed as an extension of the H.264/MPEG-4 AVC standard. It introduces disparity-compensated prediction [11]. By exploiting inter-view dependency, MVC achieves higher compression efficiency than separate view coding [12]. However, to reach a given level of video quality, the transmission rate still increases nearly linearly with the number of views. The large data overhead therefore remains a challenging problem.
Moreover, the digital video compression adopted by current MVC relies on Shannon's separate source and channel coding approach [13]. The transmitter first encodes the video at a specific coding rate, a step called source coding. Adaptive modulation and channel coding are then applied to ensure reliable bit transmission. Over the past decades, coded transmission has dominated wireless video transmission. However, this framework still has several weaknesses. First, quantization is a lossy process, and the encoded video quality depends on the source coding rate. Once the video is encoded, its quality does not improve even if the channel condition would allow a better-quality video to be transmitted. When the channel quality degrades, the received video may be garbled by error propagation and packet loss. This is called the cliff effect [14]. Second, the efficiency of channel coding depends heavily on the timeliness and precision of channel feedback. In practice, the encoder adjusts the source coding rate according to the buffer size, and the transmitter chooses the most appropriate modulation rate for each packet. This usually requires a rather complicated bit and power allocation solution. MVV requires smooth navigation of 3D scenes with minimum delay. Timely and accurate feedback and a large buffer are therefore even more indispensable for processing and transmitting the huge video traffic.
To overcome these problems of conventional digital video transmission, [15] proposed soft video delivery, which exploits joint source and channel coding. Soft video delivery skips quantization and entropy coding and processes the video frames directly with a three-dimensional discrete cosine transform (3D-DCT). The DCT coefficients are then scaled, as in amplitude modulation (AM), to minimize the end-to-end distortion. Since all operations involved are linear, the pixel distortion is proportional to the noise power and there is no cliff effect. Each user obtains a video quality that improves gracefully with, and is commensurate with, its wireless channel quality.
In this paper, we incorporate the soft video delivery technique into wireless MVV transmission. We use a five-dimensional discrete cosine transform (5D-DCT) to jointly process the texture and depth frames from different cameras. The output is scaled and modulated in an analog manner. We study the complex resource control problem of soft MVV transmission through two types of problems: distortion-resource (DR) optimization and resource-distortion (RD) optimization. We propose efficient algorithms that find optimal solutions to the formulated problems. The proposed schemes are evaluated on reference MVD videos as well as traditional single-view monochrome video sequences. Video quality is assessed with both the objective metric peak signal-to-noise ratio (PSNR) and the perceptual metric structural similarity (SSIM) [16].
The main contributions of this paper are summarized as follows.
• To the best of our knowledge, this is one of the first works to consider the resource allocation problem for practical soft MVV wireless transmission. The resource allocation problem is a mixed-integer problem that is NP-hard in general. We propose efficient algorithms that find the optimal solutions to the DR problem and the RD problem. With the proposed schemes, resource usage can be greatly reduced without significant video quality degradation.
• We find an interesting tradeoff between channel and power usage in MVV transmission. We investigate the impact of power allocation across texture and depth frames and the impact of view positions, with the goal of high video quality at each virtual viewpoint.
• Simulation results with both MVV videos and single-view videos show that the proposed algorithms work well for reference views and for synthesized virtual views, and that they achieve considerable savings in channel usage.
The remainder of this paper is organized as follows. Section II introduces related work on conventional digital MVV transmission and soft video transmission. Section III presents the framework of soft MVV transmission. We consider practical DR optimization in Section IV and RD optimization in Section V. Section VI reports extensive simulations that demonstrate the advantages of the proposed resource allocation algorithms in MVV transmission. Finally, Section VII concludes the paper.

II. RELATED WORK
We divide our discussion of related work into two parts. We first give a brief introduction to recent work on MVV and MVD transmission, and we then review related work on soft video transmission.

A. MULTIVIEW VIDEO TRANSMISSION
MVD is a simple and effective extension of MVV that provides camera depth information. It enables efficient depth-image-based rendering, so virtual views can be generated from a limited number of source views. The texture data is captured by multiple cameras, and every texture is accompanied by depth information. The 3D-HEVC standard [12] exploits the dependencies between texture and depth information to remove redundancy. The encoding process consists of four parts: spatial prediction within each frame, temporal motion compensation between frames, transform coding of the prediction residual, and entropy coding [17]. The depth data is coded with new intra coding modes, modified motion compensation, and motion vector coding.
Delivering MVV/MVD content over existing MVV streaming networks faces many challenges, including network bandwidth variation, packet loss, delay, and uncertainty in client view selection [18]. Interactive MVV streaming, 3D video coding, and practical system implementation have been studied, e.g., in recent works [4], [5], [19]-[21]. However, very few works focus on wireless MVV streaming. As noted above, delivering 3D MVV video over today's wireless networks is very challenging. It requires carrying potentially much larger data traffic, generated by a large number of cameras, under strict latency requirements in complex wireless environments. In [22], the authors combined MVV with the multiple-input multiple-output (MIMO) technique, using precoding and spatial multiplexing for simultaneous transmissions. A resource control algorithm was proposed to achieve unequal error protection against channel errors. Reference [23] considered MVV transmission with multiple description coding. Multiple descriptions of the texture and depth data of adjacent views were transmitted over separate wireless channels to exploit multi-path diversity for improved reliability.

B. SOFT VIDEO TRANSMISSION
The interesting concept of soft video delivery (SoftCast) was first proposed in [15], [24].Unlike traditional digital video transmission, SoftCast builds an analog code that achieves the compression-protection tradeoff with a suitable power allocation.Experiments demonstrate that the cliff effect in conventional digital video transmission can be avoided and users can enjoy a graceful video quality improvement according to the channel condition.
The SoftCast concept has attracted considerable interest in the community. For example, Ref. [25] replaced the power allocation scheme in SoftCast with bit allocation. Refs. [26], [27] combined the benefits of SoftCast and conventional digital video coding in a hybrid analog-digital coding scheme. In [28], the authors proposed an optimal channel and power allocation scheme for fast fading channels. Multiple-antenna techniques have also been used to improve system performance. For example, Ref. [29] decomposed the MIMO channel into parallel sub-channels by MIMO precoding. Assigning high-priority DCT coefficients to higher-quality sub-channels optimized the reconstructed video quality. Ref. [30] extended SoftCast to wireless video multicast with receiver antenna heterogeneity. The proposed scalable video multicast system gave each receiver a reconstructed video quality commensurate with its number of antennas. In [31], a curve-fitting-based source control algorithm was developed to find the cost-distortion relationship for soft video delivery, where the cost consists of bandwidth and transmit power.
To the best of our knowledge, Ref. [32] was the first work to investigate soft video transmission for MVD. In that work, the proposed Gaussian Markov random field (GMRF) model greatly reduced the metadata overhead and thus improved video quality. However, compared with the metadata overhead, the coded video data itself contains huge redundancy, especially in MVD transmission. How to jointly optimize the resources used while achieving satisfactory video quality is still an open problem. In [33], we proposed a blind data detection method that recovers the received video from the squared amplitude of the received signals and is almost metadata-free. That work [33] was designed for generic videos over the AWGN channel.

III. SOFT VIDEO TRANSMISSION FOR MVV DELIVERY
In MVV delivery, the transmitter uses multiple cameras to record multi-color texture and depth frames. The receiver reports its preferred virtual viewpoint to the transmitter over the feedback channel. The transmitter then captures the data from several cameras adjacent to the requested virtual viewpoint. The captured data are encoded and transmitted to the receiver. At the decoder side, the requested viewpoint is synthesized from the decoded texture and depth frames via DIBR.
In this section, we consider the case where the transmission channel has plenty of bandwidth, so all DCT coefficients can be transmitted to the receiver. We therefore focus on the power allocation problem. At the encoder, a 5D-DCT is applied to all texture and depth frames in one group of pictures (GOP), i.e., a sequence of successive MVD video frames. After power allocation, the DCT coefficients are mapped to the I (in-phase) and Q (quadrature) components for analog wireless transmission.
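The 5D-DCT can be computed separably, as a 1-D DCT along each of the five axes in turn. The NumPy code below is a minimal sketch of that step. It assumes a GOP stored as a 5-D array indexed (view, texture/depth map, frame, row, column); the axis order and the function names `dct5`/`idct5` are our own illustrative choices, not taken from the paper. An orthonormal DCT-II is used, so the coefficient-domain energy equals the pixel-domain energy.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C of size n x n, so that C @ v is the DCT of v."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    t = np.arange(n)[None, :]  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # the DC row uses a different normalization
    return c


def _apply_along_axes(x: np.ndarray, transpose: bool) -> np.ndarray:
    """Apply the 1-D DCT (or its inverse) along every axis of x."""
    y = np.asarray(x, dtype=float)
    for axis in range(y.ndim):
        c = dct_matrix(y.shape[axis])
        if transpose:
            c = c.T  # the inverse of an orthonormal matrix is its transpose
        # tensordot puts the transformed axis first; move it back into place.
        y = np.moveaxis(np.tensordot(c, y, axes=([1], [axis])), 0, axis)
    return y


def dct5(gop: np.ndarray) -> np.ndarray:
    """Hypothetical 5D-DCT of a GOP shaped (view, map, frame, row, col)."""
    assert gop.ndim == 5, "expected a 5-D GOP array"
    return _apply_along_axes(gop, transpose=False)


def idct5(coeffs: np.ndarray) -> np.ndarray:
    """Inverse 5D-DCT, as applied at the decoder."""
    assert coeffs.ndim == 5, "expected a 5-D coefficient array"
    return _apply_along_axes(coeffs, transpose=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gop = rng.random((3, 2, 4, 16, 16))  # 3 views, texture + depth, 4 frames
    coeffs = dct5(gop)
    print(np.allclose(idct5(coeffs), gop))  # True: perfect reconstruction
    print(np.isclose((gop**2).sum(), (coeffs**2).sum()))  # True: energy preserved
```

Because the transform is separable, its cost grows with the total number of samples times the sum of the axis lengths, rather than with the square of the GOP size. For large frames, a fast DCT routine would replace the explicit matrices.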
Specifically, the DCT coefficients are divided into N rectangular chunks of size h × w. Let $x_i[j]$ denote the jth DCT coefficient in the ith chunk. For noise reduction, all DCT coefficients in the ith chunk are scaled by a common factor $g_i$, so the scaled coefficient is
$$s_i[j] = g_i\, x_i[j].$$
This analog-like scaling is also called power allocation. Following [28], the optimal scaling factors are obtained by minimizing the end-to-end distortion under a power budget P as follows.
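(P0)
$$\min_{\{\rho_i\}}\ \frac{1}{N}\sum_{i=1}^{N}\mathbb{E}\big[(x_i[j]-\hat{x}_i[j])^2\big]\quad\text{s.t.}\quad\frac{1}{N}\sum_{i=1}^{N}\rho_i\le\bar{P},$$
where $\mathbb{E}$ denotes the expectation, $\hat{x}_i[j]$ is the estimated DCT coefficient at the receiver, and N is the number of DCT chunks. $\rho_i = g_i^2\lambda_i/\sigma_n^2$ is the signal-to-noise ratio of chunk i after power allocation, with $\lambda_i$ the average energy of chunk i and $\sigma_n^2$ the noise variance. $\bar{P}$ is the signal-to-noise ratio (SNR) budget of each GOP, i.e., the average SNR per chunk.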
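To obtain the optimal power allocation, we solve (P0) with the Lagrange multiplier method [15]. With the LLSE decoder described below, the expected distortion of chunk i is $\lambda_i/(1+\rho_i)$. We therefore define a Lagrange multiplier $\gamma > 0$ and the Lagrange function
$$L = \frac{1}{N}\sum_{i=1}^{N}\frac{\lambda_i}{1+\rho_i} + \gamma\Big(\frac{1}{N}\sum_{i=1}^{N}\rho_i - \bar{P}\Big).$$
Setting $\partial L/\partial\rho_i = 0$, $i = 1, 2, \ldots, N$, and $\partial L/\partial\gamma = 0$ gives the optimal solution
$$\rho_i^* = \frac{N(\bar{P}+1)\sqrt{\lambda_i}}{\sum_{j=1}^{N}\sqrt{\lambda_j}} - 1,\quad i = 1,\ldots,N.$$
After demodulation, the receiver obtains
$$y_i[j] = s_i[j] + n_i[j],$$
where $n_i[j]$ is additive white Gaussian noise (AWGN) with variance $\sigma_n^2$. The DCT coefficients are then extracted from the I and Q components with a linear least-squares estimator (LLSE) filter:
$$\hat{x}_i[j] = \frac{g_i\lambda_i}{g_i^2\lambda_i + \sigma_n^2}\, y_i[j].$$
The decoder then applies an inverse 5D-DCT to the DCT coefficients $\hat{x}_i[j]$ to recover the video sequence. Finally, the decoder synthesizes the virtual viewpoint from the received texture and depth frames with DIBR.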
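The allocation and the LLSE decoder can be checked numerically. The sketch below is a minimal Monte Carlo check under our own assumptions: DCT coefficients are Gaussian with variance $\lambda_i$, the noise variance is 1, and the SNR budget $\bar{P}$ is the average SNR per chunk in linear scale. The chunk SNRs use the closed form $\rho_i = N(\bar{P}+1)\sqrt{\lambda_i}/\sum_j\sqrt{\lambda_j} - 1$. The decoder is $\hat{x} = g\lambda/(g^2\lambda+\sigma^2)\,y$. The function names are hypothetical. For very unequal energies, the closed form can return a negative $\rho_i$, which is not a valid power, so the example uses energies for which every $\rho_i$ is positive.

```python
import numpy as np


def p0_allocation(lam, p_bar):
    """Assumed closed form: rho_i = N (p_bar + 1) sqrt(lam_i) / sum_j sqrt(lam_j) - 1."""
    root = np.sqrt(np.asarray(lam, dtype=float))
    return root.size * (p_bar + 1.0) * root / root.sum() - 1.0


def llse_mse(lam, rho, sigma2=1.0, samples=200_000, seed=0):
    """Monte Carlo MSE per chunk for scaling by g_i, AWGN, and LLSE decoding."""
    rng = np.random.default_rng(seed)
    out = []
    for lam_i, rho_i in zip(lam, rho):
        g = np.sqrt(rho_i * sigma2 / lam_i)                    # rho_i = g^2 lam_i / sigma^2
        x = rng.normal(0.0, np.sqrt(lam_i), samples)           # coefficients x_i[j]
        y = g * x + rng.normal(0.0, np.sqrt(sigma2), samples)  # y_i[j] = g_i x_i[j] + n_i[j]
        x_hat = g * lam_i / (g * g * lam_i + sigma2) * y       # LLSE estimate
        out.append(np.mean((x - x_hat) ** 2))
    return np.array(out)


lam = np.array([100.0, 50.0, 20.0, 10.0])  # illustrative chunk energies
rho = p0_allocation(lam, p_bar=10.0)
print(rho.mean())              # average SNR per chunk equals p_bar
print(llse_mse(lam, rho))      # empirical per-chunk MSE
print(lam / (1.0 + rho))       # predicted per-chunk MSE lam_i / (1 + rho_i)
```

The last two printed vectors should agree to within Monte Carlo error. This agreement is the identity the Lagrangian relies on when it writes the chunk distortion as $\lambda_i/(1+\rho_i)$.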

IV. DISTORTION RESOURCE (DR) OPTIMIZATION
We next consider the more realistic case of limited channel resources (in the form of time slots or frequency bands).
In soft video delivery, each scaled DCT chunk is transmitted in a different time slot or frequency band. Transmitting a large video, e.g., MVD, therefore requires a considerable amount of channel resources, which makes real-time delivery difficult. Fortunately, thanks to the energy compaction property of the DCT, most DCT coefficients in the high spatial frequency domain are very small. We can therefore discard some high-frequency DCT chunks to satisfy the channel resource constraint without degrading the video quality much. Dropping some DCT chunks may help even when channel resources are sufficient. The power saved by dropping them can be reallocated to more important chunks, which is especially useful when the power constraint is stringent. This leads to a joint chunk selection and power allocation problem.

A. PROBLEM STATEMENT
Consider a set of N chunks of DCT coefficients with average energies $\lambda_1, \lambda_2, \ldots, \lambda_N$. Without loss of generality, we assume $\lambda_i \ge \lambda_j$ for all $i < j$. Let M be the amount of available channel resources (e.g., time or frequency slots). Let P be the total power budget of each GOP, and let $\bar{P}$ be the corresponding average SNR budget per chunk. A binary channel allocation vector $\mathbf{k} = [k_1, k_2, \ldots, k_N]$ denotes the chunk selection of each GOP: $k_i = 0$ indicates that chunk i is discarded, and $k_i = 1$ means that chunk i is transmitted in a channel resource slot. We aim to find the optimal channel allocation $\mathbf{k}^*$ and the optimal power allocation $\boldsymbol{\rho}^* = [\rho_1^*, \rho_2^*, \ldots, \rho_N^*]$ that minimize the total video distortion. Mathematically, the problem can be formulated as follows.
(P1)
$$\min_{\mathbf{k},\,\{\rho_i\}}\ \frac{1}{N}\sum_{i=1}^{N}\Big[k_i\frac{\lambda_i}{1+\rho_i}+(1-k_i)\lambda_i\Big]\quad\text{s.t.}\quad\frac{1}{N}\sum_{i=1}^{N}k_i\rho_i\le\bar{P},\quad\sum_{i=1}^{N}k_i\le M,\quad k_i\in\{0,1\}.$$
Since each $k_i$ takes only binary values, M lies in the range [0, N]. If M = N, Problem (P1) is the same as Problem (P0). If M < N, minimizing distortion means retaining the M largest chunks and discarding the remaining smaller ones. We call the retained chunks high-priority (HP) data and the discarded ones low-priority (LP) data, i.e., $k_i = 1$ for $i \le M$ and $k_i = 0$ for $i > M$. The problem then becomes finding the optimal value $M^*$ and the optimal power allocation $\{\rho_i^*\}$. As before, the Lagrange multiplier method gives the optimal $\rho_i$ as
$$\rho_i^* = \frac{(N\bar{P}+M)\sqrt{\lambda_i}}{\sum_{j=1}^{M}\sqrt{\lambda_j}} - 1,\quad i = 1,\ldots,M. \qquad (14)$$
The MSE can then be expressed as
$$\mathrm{MSE} = \frac{1}{N}\Bigg[\frac{\big(\sum_{i=1}^{M}\sqrt{\lambda_i}\big)^2}{N\bar{P}+M}+\sum_{i=M+1}^{N}\lambda_i\Bigg]. \qquad (15)$$
The total MSE in (15) is thus a function of the power budget P and the channel resource constraint M. A larger power budget increases $\rho_i$, which decreases the distortion. In other words, MSE is a monotonically decreasing function of P (or $\bar{P}$). It is still unclear, however, how to find the optimal value of M. M is discrete and appears as the upper limit of the summations in (14). This discrete nature makes it hard to obtain $M^*$ in closed form, as we did for $\rho_i^*$.
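Both observations can be made explicit with a short calculation. We assume the closed form $\mathrm{MSE}(M,\bar{P}) = \frac{1}{N}\big[S_M^2/(N\bar{P}+M) + T_M\big]$, where $S_M = \sum_{i=1}^{M}\sqrt{\lambda_i}$ and $T_M = \sum_{i=M+1}^{N}\lambda_i$ are our shorthand. The derivative in $\bar{P}$ is negative. The one-step difference in M shows that the formula by itself never increases when a chunk is added. Under this assumed closed form, transmitting fewer chunks can only pay off because every $\rho_i^*$ must be nonnegative. For the smallest retained chunk, that holds only while $(N\bar{P}+M)\sqrt{\lambda_M} \ge S_M$.

$$
\frac{\partial\,\mathrm{MSE}}{\partial\bar{P}} = -\frac{S_M^2}{(N\bar{P}+M)^2} < 0,
\qquad
\mathrm{MSE}(M+1)-\mathrm{MSE}(M) = -\frac{\big(S_M-(N\bar{P}+M)\sqrt{\lambda_{M+1}}\big)^2}{N\,(N\bar{P}+M)\,(N\bar{P}+M+1)} \le 0 .
$$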

B. A GREEDY SEARCH APPROACH
To find the optimal value $M^*$, we propose an exhaustive search based algorithm, presented in Algorithm 1. The transmitter has full knowledge of the $\lambda_i$'s and the total power budget P. It can therefore find the number of chunks that minimizes the MSE by exhaustively searching all possible numbers of channel resources. With the optimal chunk selection, the video is effectively compressed without much performance degradation. Users enjoy a better experience because transmission time is saved and the amount of video traffic is reduced, and the saved channel resources can be used by other users in the network. In Algorithm 1, we first compute the energy distribution of the chunks in each GOP. Based on this information, we find the optimal channel allocation and power allocation exhaustively. The algorithm could terminate at Line 8 and output the optimal channel number $M^*$, the optimal chunk selection $\{k_i^*\}$, and the optimal power control $\{\rho_i^*\}$. However, the simulations in Section VI show that the distortion-resource (DR) curve tends to have a flat tail (e.g., see Figs. 3 and 4). A much larger number of channel resources is then spent for only a slight PSNR improvement, which is clearly inefficient. Hence, we introduce a control parameter $0 < \alpha \le 1$ in Line 10 to search for a suboptimal solution. Slightly sacrificing PSNR performance significantly reduces the video traffic and the channel resource usage.
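Algorithm 1 itself is not reproduced in this section, so the following is only a hypothetical sketch of the search it describes. Several assumptions are ours. $\bar{P}$ is the average SNR per chunk in linear scale. The noise variance is 1. The largest-energy chunks are kept. A candidate M is skipped when the closed form $\rho_M = (N\bar{P}+M)\sqrt{\lambda_M}/\sum_{j\le M}\sqrt{\lambda_j} - 1$ would be negative. The sketch returns both the exhaustive optimum $M^*$ and the α-relaxed choice, i.e., the fewest chunks whose PSNR reaches α times the best PSNR, matching the α = 98% setting used in Section VI.

```python
import numpy as np


def dr_search(lam, p_bar, m_max=None, alpha=1.0, peak=255.0):
    """Hypothetical sketch of the DR search (Algorithm 1).

    lam   : average chunk energies (sorted internally, largest first)
    p_bar : SNR budget as the average SNR per chunk, linear scale (assumption)
    m_max : available channel slots M (defaults to the number of chunks N)
    alpha : keep the fewest chunks whose PSNR is >= alpha * best PSNR
    Returns (m_star, m_alpha, rho_star, psnr); psnr[m] is -inf when m is skipped
    because the closed form would give the weakest kept chunk negative power.
    """
    lam = np.sort(np.asarray(lam, dtype=float))[::-1]
    n = lam.size
    m_max = n if m_max is None else min(m_max, n)
    s_cum = np.cumsum(np.sqrt(lam))        # S_M = sum_{i<=M} sqrt(lam_i)
    t_cum = lam.sum() - np.cumsum(lam)     # T_M = sum_{i>M} lam_i (dropped energy)
    psnr = np.full(m_max + 1, -np.inf)
    for m in range(1, m_max + 1):          # O(1) work per M thanks to the running sums
        s, q = s_cum[m - 1], n * p_bar + m
        if q * np.sqrt(lam[m - 1]) < s:    # rho_M < 0: chunk M would need negative power
            continue
        mse = (s * s / q + t_cum[m - 1]) / n
        psnr[m] = 10.0 * np.log10(peak**2 / mse)
    m_star = int(np.argmax(psnr))
    m_alpha = int(np.argmax(psnr >= alpha * psnr[m_star]))  # first M reaching the target
    s, q = s_cum[m_star - 1], n * p_bar + m_star
    rho_star = q * np.sqrt(lam[:m_star]) / s - 1.0
    return m_star, m_alpha, rho_star, psnr


lam = 1000.0 * 0.9 ** np.arange(200)   # illustrative, geometrically decaying energies
m_star, m_alpha, rho, psnr = dr_search(lam, p_bar=0.5, alpha=0.98)
print(m_star, m_alpha, round(psnr[m_star], 2), round(psnr[m_alpha], 2))
```

The running sums `s_cum` and `t_cum` let each candidate M be evaluated in constant time after sorting. This is what keeps the search itself linear in N.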

C. COMPLEXITY ANALYSIS
In Algorithm 1, we traverse all possible values of M. For each M, the closed-form expressions (14) and (15) can be updated from the previous M with a constant number of operations by maintaining running sums. The complexity of Algorithm 1 is therefore O(N), which is negligible. For one GOP, the main cost is sorting the N chunks by energy, with complexity O(N log N). In practice, however, the chunks need not be sorted strictly by energy. A more practical approach is to order them with the zigzag scan used in JPEG image compression. This avoids the sorting step and greatly reduces the complexity. Moreover, our simulations show that the optimal value $M^*$ changes little across consecutive video frames. We may therefore need to find $M^*$ only for the first GOP and then apply it to the remaining GOPs, which further reduces the complexity.
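The zigzag ordering mentioned above can be generated once and reused for every GOP. The sketch below produces the JPEG-style zigzag order of an n × n grid. Applying it to the 8 × 8 chunk grid of each frame is our assumption, as is the premise that chunk energy roughly decreases along this path. The text does not specify how chunks are ordered across the other 5D-DCT axes, and `zigzag_order` is a hypothetical helper.

```python
from typing import List, Tuple


def zigzag_order(n: int) -> List[Tuple[int, int]]:
    """(row, col) positions of an n x n grid in JPEG-style zigzag order."""
    order: List[Tuple[int, int]] = []
    for d in range(2 * n - 1):  # d = row + col indexes one anti-diagonal
        cells = [(r, d - r) for r in range(n) if 0 <= d - r < n]  # top-right to bottom-left
        # JPEG alternates direction: odd diagonals go down-left, even ones go up-right.
        order.extend(cells if d % 2 == 1 else cells[::-1])
    return order


# Rank the 64 chunks of one 8 x 8 chunk grid without sorting by energy.
print(zigzag_order(8)[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

Keeping the first M entries of this order approximates keeping the M highest-energy chunks. It trades a small loss in selection accuracy for no per-GOP sorting.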

V. RESOURCE DISTORTION (RD) OPTIMIZATION

A. PROBLEM STATEMENT
Note that, for a given video sequence, different GOPs may differ in compressibility. In DR optimization, under a fixed power and channel budget, the distortion of consecutive GOPs may vary greatly. This can be quite annoying for viewers, even though the overall PSNR is maximized. In addition, human eyes are less sensitive to differences between videos when the PSNR is very high. These two observations suggest that keeping the distortion relatively stable may be the better choice. The power and channel resources saved this way can also be used by other users. Our problem therefore becomes finding a good combination of chunk selection and power allocation for a target distortion. We call this problem resource distortion (RD) optimization.
Since RD optimization involves distortion, channel usage, and power usage, it is a three-dimensional optimization problem, which is hard to solve. However, fixing one factor decomposes it into two sub-problems. For example, a power distortion optimization problem minimizes the power usage under a target distortion constraint $\overline{\mathrm{MSE}}$ and a channel usage constraint M. It can be stated as follows.
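(P2)
$$\min_{\mathbf{k},\,\{\rho_i\}}\ \frac{1}{N}\sum_{i=1}^{N}k_i\rho_i\quad\text{s.t.}\quad\frac{1}{N}\sum_{i=1}^{N}\Big[k_i\frac{\lambda_i}{1+\rho_i}+(1-k_i)\lambda_i\Big]\le\overline{\mathrm{MSE}},\quad\sum_{i=1}^{N}k_i\le M,\quad k_i\in\{0,1\}.$$
Alternatively, a channel distortion optimization problem minimizes the channel resource usage under a distortion constraint $\overline{\mathrm{MSE}}$ and a power budget $\bar{P}$, as follows.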
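(P3)
$$\min_{\mathbf{k},\,\{\rho_i\}}\ \sum_{i=1}^{N}k_i\quad\text{s.t.}\quad\frac{1}{N}\sum_{i=1}^{N}\Big[k_i\frac{\lambda_i}{1+\rho_i}+(1-k_i)\lambda_i\Big]\le\overline{\mathrm{MSE}},\quad\frac{1}{N}\sum_{i=1}^{N}k_i\rho_i\le\bar{P},\quad k_i\in\{0,1\}.$$
The channel usage is discrete, while the power scaling factors $g_i$ are continuous. Both problems are therefore mixed-integer nonlinear programming problems, which are NP-hard in general.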

B. POWER DISTORTION OPTIMIZATION
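To find the minimal power usage in Problem (P2), we exhaustively search all feasible $m \in [1, M]$. For each fixed channel resource usage m, the m largest chunks are transmitted, and we solve the following sub-problem:
(P2a)
$$\min_{\{\rho_i\}}\ \frac{1}{N}\sum_{i=1}^{m}\rho_i\quad\text{s.t.}\quad\frac{1}{N}\Big[\sum_{i=1}^{m}\frac{\lambda_i}{1+\rho_i}+\sum_{i=m+1}^{N}\lambda_i\Big]\le\overline{\mathrm{MSE}}. \qquad (23)$$
With a Lagrange multiplier $\mu > 0$, the Lagrange function can be written as
$$L = \frac{1}{N}\sum_{i=1}^{m}\rho_i + \mu\Bigg(\frac{1}{N}\Big[\sum_{i=1}^{m}\frac{\lambda_i}{1+\rho_i}+\sum_{i=m+1}^{N}\lambda_i\Big]-\overline{\mathrm{MSE}}\Bigg).$$
Setting $\partial L/\partial\rho_i = 0$, $i = 1, 2, \ldots, m$, and $\partial L/\partial\mu = 0$, we obtain
$$\rho_i^* = \frac{\sqrt{\lambda_i}\sum_{j=1}^{m}\sqrt{\lambda_j}}{N\,\overline{\mathrm{MSE}}-\sum_{j=m+1}^{N}\lambda_j}-1,\quad i = 1,\ldots,m.$$
The corresponding power distortion optimization algorithm is presented in Algorithm 2. Constraint (23) in Problem (P2a) requires $\overline{\mathrm{MSE}} - \frac{1}{N}\sum_{i=m+1}^{N}\lambda_i > 0$ for the chosen m. If this condition is violated, there is no feasible solution, and we move on to the next value of m. As shown in Line 5 of Algorithm 2, the objective value of Problem (P2a) then has the closed form $\frac{1}{N}\big[(\sum_{i=1}^{m}\sqrt{\lambda_i})^2/(N\,\overline{\mathrm{MSE}}-\sum_{i=m+1}^{N}\lambda_i) - m\big]$. We compare these values over m, choose the smallest, and adopt the corresponding power allocation $\{\rho_i^*\}$.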
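Algorithm 2 is not reproduced in this section. The sketch below shows one way to implement the described search, under our own assumptions. Power is measured as the average SNR per chunk. The m largest chunks are kept. The closed form $\rho_i = \sqrt{\lambda_i}\,S_m/(N\,\overline{\mathrm{MSE}} - T_m) - 1$ is taken as the (P2a) solution, with $S_m = \sum_{i\le m}\sqrt{\lambda_i}$ and $T_m = \sum_{i>m}\lambda_i$. Values of m for which that closed form gives a negative SNR are skipped. `min_power_for_target` is a hypothetical name.

```python
import numpy as np


def min_power_for_target(lam, mse_target, m_max):
    """Hypothetical sketch of Algorithm 2 (power distortion optimization).

    For each m <= m_max, the m largest chunks are sent and (P2a) is solved in
    closed form; returns (m, p_bar, rho) for the m needing the smallest average
    SNR, or None when no m meets the target.
    """
    lam = np.sort(np.asarray(lam, dtype=float))[::-1]
    n = lam.size
    s_cum = np.cumsum(np.sqrt(lam))       # S_m
    t_cum = lam.sum() - np.cumsum(lam)    # T_m: energy of the dropped chunks
    best = None
    for m in range(1, min(m_max, n) + 1):
        budget = n * mse_target - t_cum[m - 1]  # N * (MSE_target - T_m / N)
        if budget <= 0.0:                       # dropped chunks alone violate the target
            continue
        rho = np.sqrt(lam[:m]) * s_cum[m - 1] / budget - 1.0
        if rho.min() < 0.0:                     # closed form invalid: negative power
            continue
        p_bar = rho.sum() / n                   # = (S_m^2 / budget - m) / N
        if best is None or p_bar < best[1]:
            best = (m, p_bar, rho)
    return best


lam = 1000.0 * 0.9 ** np.arange(200)  # illustrative chunk energies
m, p_bar, rho = min_power_for_target(lam, mse_target=1.0, m_max=200)
print(m, round(p_bar, 3))
```

At the returned allocation, the distortion constraint is met with equality. This is expected, since any slack would let the power be lowered further.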

C. CHANNEL DISTORTION OPTIMIZATION
For channel distortion optimization, we search over M in descending order. For each M, we solve the following subproblem:
(P3a)
$$\min_{\{\rho_i\}}\ \frac{1}{N}\Big[\sum_{i=1}^{M}\frac{\lambda_i}{1+\rho_i}+\sum_{i=M+1}^{N}\lambda_i\Big]\quad\text{s.t.}\quad\frac{1}{N}\sum_{i=1}^{M}\rho_i\le\bar{P}.$$
If the optimal objective value of Problem (P3a) does not exceed $\overline{\mathrm{MSE}}$, the corresponding M is feasible. We then decrease M by 1 and solve Problem (P3a) again, until we reach an infeasible M. The last feasible M is the minimal channel usage. For each fixed M, Problem (P3a) is a simple convex optimization problem, and the Lagrange multiplier method gives the optimal solution
$$\rho_{i,M} = \frac{(N\bar{P}+M)\sqrt{\lambda_i}}{\sum_{j=1}^{M}\sqrt{\lambda_j}}-1,\quad i = 1,\ldots,M. \qquad (29)$$
Algorithm 3 presents the procedure. Algorithms 1-3 are all greedy search algorithms with similar procedures, so the complexity analysis for Algorithm 1 also applies to Algorithms 2 and 3.

Algorithm 3 Channel Distortion Optimization Algorithm
Calculate $\rho_{i,m}$ according to (29);
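Only one line of Algorithm 3 survives here. The sketch below reconstructs the descending search from the prose, under these assumptions. $\bar{P}$ is the average SNR per chunk. The (P3a) solution is the closed form $\rho_{i,M} = (N\bar{P}+M)\sqrt{\lambda_i}/\sum_{j\le M}\sqrt{\lambda_j} - 1$. An M whose closed form needs negative power is skipped rather than treated as the stopping point; that is our own choice, since the text does not address this case. `min_channels_for_target` is a hypothetical name.

```python
import numpy as np


def min_channels_for_target(lam, mse_target, p_bar):
    """Hypothetical sketch of Algorithm 3 (channel distortion optimization).

    Scans M downward from N, solving (P3a) in closed form, and stops at the first
    M whose optimal MSE exceeds the target. Returns (M, rho) or None.
    """
    lam = np.sort(np.asarray(lam, dtype=float))[::-1]
    n = lam.size
    s_cum = np.cumsum(np.sqrt(lam))       # S_M
    t_cum = lam.sum() - np.cumsum(lam)    # T_M: energy of the dropped chunks
    best = None
    for m in range(n, 0, -1):
        s, q = s_cum[m - 1], n * p_bar + m
        rho = q * np.sqrt(lam[:m]) / s - 1.0  # closed-form (P3a) allocation
        if rho.min() < 0.0:
            continue  # assumption: M needing negative power is skipped, not a stop
        if (s * s / q + t_cum[m - 1]) / n > mse_target:
            break     # first infeasible M ends the search
        best = (m, rho)
    return best


lam = 1000.0 * 0.9 ** np.arange(200)  # illustrative chunk energies
m, rho = min_channels_for_target(lam, mse_target=15.0, p_bar=0.5)
print(m, round(rho.min(), 3))
```

Stopping at the first M that misses the target relies on the optimal (P3a) distortion not decreasing as chunks are removed, which holds for this closed form.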

D. POWER AND CHANNEL USAGE TRADEOFF
For a target video quality, we can either minimize the power consumption under a given channel usage constraint or minimize the channel usage under a given power budget. This yields a power and channel usage tradeoff curve, on which every point achieves the same target video quality. The curve provides a useful guideline for choosing a suitable power and channel usage pair under practical resource constraints. In a multi-user system, different viewers may have diverse power and channel budgets, and a joint optimization can be applied to reduce resource consumption.
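A tradeoff curve of this kind can be traced point by point from the power-minimization side. The hypothetical sketch below lists, for every channel usage m, the minimum average SNR per chunk that meets a target MSE. It uses the (P2a) closed form with the m largest chunks, $\bar{P}(m) = \big[S_m^2/(N\,\overline{\mathrm{MSE}} - T_m) - m\big]/N$. Points where the target is unreachable, or would need negative power for some chunk, are omitted. The energy model and the function name are assumptions for illustration.

```python
import numpy as np


def power_channel_tradeoff(lam, mse_target):
    """Hypothetical sketch: minimal average SNR needed for each channel usage m.

    Returns a list of (m, p_bar_min) pairs, i.e., points on the tradeoff curve.
    """
    lam = np.sort(np.asarray(lam, dtype=float))[::-1]
    n = lam.size
    s_cum = np.cumsum(np.sqrt(lam))       # S_m
    t_cum = lam.sum() - np.cumsum(lam)    # T_m: energy of the dropped chunks
    curve = []
    for m in range(1, n + 1):
        budget = n * mse_target - t_cum[m - 1]
        if budget <= 0.0:                 # target unreachable with only m chunks
            continue
        s = s_cum[m - 1]
        if np.sqrt(lam[m - 1]) * s / budget < 1.0:  # smallest rho would be negative
            continue
        curve.append((m, (s * s / budget - m) / n))
    return curve


lam = 1000.0 * 0.9 ** np.arange(200)  # illustrative chunk energies
for m, p in power_channel_tradeoff(lam, mse_target=1.0)[:5]:
    print(m, round(p, 3))
```

Picking the point with the smallest power reproduces the power-minimization search. Fixing a power level and reading off the smallest m instead corresponds to the channel-minimization direction.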

E. VIEW SYNTHESIS
In MVV transmission, the texture data carries detailed video content information, while the depth data plays an important role in view synthesis. The quality of both the texture and the depth frames determines the virtual view quality. In digital MVV transmission, bit allocation and power assignment ensure good virtual view synthesis performance, but they are usually quite complicated [12], [34]. In [34], a view synthesis optimization algorithm is integrated into the encoding process to enable rate-distortion optimization. Balancing the bit rate distribution between texture data and depth data requires solving a complex combinatorial optimization problem. For a two-view scenario, the video/depth rate distribution can be 86%/14%, i.e., the depth data is encoded at a low cost. Analogous to bit rate distribution in digital transmission, we investigate the power allocation between texture and depth data in soft video transmission.
Suppose there are N views in the system, as shown in Fig. 2. For simplicity, in this work we assume equal power control among the reference views and investigate the power allocation between texture data and depth data. Specifically, before the 5D-DCT, each texture view is scaled by a common factor β/N and each depth view by a factor (1 − β)/N. After the 5D-DCT, all video data are linearly transformed and modulated in an analog manner. The decoder performs the inverse process, followed by a digital renderer that generates the user's preferred virtual views. Compared with the complicated bit allocation of digital video transmission, the proposed soft video framework reduces the process to a power allocation problem. Our goal is thus to investigate how the scaling factor β affects the quality of the synthesized virtual view. Section VI-D1 presents our study and discussion of the parameter β.
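The texture/depth weighting step is simple to express. This sketch assumes that "scaled by a factor" means amplitude scaling of the pixel values before the 5D-DCT, and that each map is stored as a (view, frame, row, column) array. The names are hypothetical, and the view count $N$ is called `n_views` to avoid a clash with the chunk count. Because the later power allocation works on chunk energies, β shifts the relative energy of the texture and depth coefficients. `texture_energy_share` reports that share.

```python
import numpy as np


def weight_texture_depth(texture, depth, beta):
    """Scale texture by beta/N_v and depth by (1 - beta)/N_v, stacked as (view, map, ...)."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    n_views = texture.shape[0]
    return np.stack([texture * (beta / n_views), depth * ((1.0 - beta) / n_views)], axis=1)


def unweight_texture_depth(gop, beta):
    """Decoder side: undo the scaling after the inverse 5D-DCT."""
    n_views = gop.shape[0]
    return gop[:, 0] * (n_views / beta), gop[:, 1] * (n_views / (1.0 - beta))


def texture_energy_share(texture, depth, beta):
    """Fraction of the weighted GOP energy carried by the texture maps."""
    t = beta**2 * np.sum(np.asarray(texture, dtype=float) ** 2)
    d = (1.0 - beta) ** 2 * np.sum(np.asarray(depth, dtype=float) ** 2)
    return t / (t + d)


rng = np.random.default_rng(0)
tex, dep = rng.random((3, 4, 8, 8)), rng.random((3, 4, 8, 8))  # 3 views, 4 frames
gop = weight_texture_depth(tex, dep, beta=0.7)  # shape (3, 2, 4, 8, 8)
print(gop.shape, round(texture_energy_share(tex, dep, 0.7), 3))
```

Since the weighting is a per-map constant, the decoder can undo it exactly. The only lasting effect of β is on how the power budget is split between texture and depth.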

VI. SIMULATION STUDY

A. PARAMETER SETTING

1) PERFORMANCE METRIC
In our simulation study, we use both the objective metric PSNR and the perceptual metric SSIM [16] for video quality assessment. PSNR is defined (in dB) as
$$\mathrm{PSNR} = 10\log_{10}\frac{(2^B-1)^2}{\mathrm{MSE}}, \qquad (30)$$
where B is the number of bits used to encode pixel luminance (usually 8 bits). MSE is the mean squared error between the pixels of the decoded video and those of the original video. In soft video delivery, the orthonormal DCT preserves energy (Parseval's theorem), so the MSE is the same in the DCT domain and in the pixel domain. Substituting (15) into (30) therefore gives the corresponding PSNR. Generally, PSNR improvements larger than 0.5 dB are visually noticeable.
A PSNR below 20 dB is considered unacceptable. We also use SSIM [16] to measure the similarity between the original and reconstructed images. For two N × N images x and y, the SSIM index is computed as [16]
$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)},$$
where $\mu_x$ and $\mu_y$ are the means of x and y, respectively; $\sigma_x^2$ and $\sigma_y^2$ are the variances of x and y, respectively; and $\sigma_{xy}$ is the covariance of x and y. The constants are $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$, where L is the dynamic range of the pixel values, $k_1 = 0.01$, and $k_2 = 0.03$. An SSIM value closer to 1 indicates higher perceptual similarity between the original and the decoded image.
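Both metrics reduce to a few lines of NumPy. In the sketch below, `psnr` follows (30) directly. `ssim_global` evaluates the SSIM formula once over the whole image. The SSIM of [16] is normally computed over local windows and then averaged, so this global version only illustrates the formula and is not a drop-in replacement for the reference implementation. Both function names are our own.

```python
import numpy as np


def psnr(ref, dec, bits=8):
    """PSNR in dB with peak value 2^B - 1, as in (30)."""
    err = np.asarray(ref, dtype=float) - np.asarray(dec, dtype=float)
    return 10.0 * np.log10((2**bits - 1) ** 2 / np.mean(err**2))


def ssim_global(x, y, bits=8, k1=0.01, k2=0.03):
    """SSIM formula evaluated once over the whole image (single window, no averaging)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dyn = 2**bits - 1                       # dynamic range L of the pixel values
    c1, c2 = (k1 * dyn) ** 2, (k2 * dyn) ** 2
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))      # sigma_xy
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx**2 + my**2 + c1) * (x.var() + y.var() + c2)
    return num / den


rng = np.random.default_rng(0)
img = rng.integers(0, 256, (32, 32)).astype(float)
noisy = np.clip(img + rng.normal(0.0, 10.0, img.shape), 0, 255)
print(round(psnr(img, noisy), 2), round(ssim_global(img, noisy), 4))
```

The constants $c_1$ and $c_2$ only stabilize the ratio when means or variances are near zero. For typical 8-bit content, SSIM is dominated by the mean, variance, and covariance terms.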

2) TEST VIDEO
We use two standard reference MVD videos, balloons and kendo, at 30 fps, and choose viewpoints 1, 3, and 5. Three cameras spaced 10 cm apart are used. The texture and depth frames have a resolution of 1024 × 768 pixels at a frame rate of 20 fps. The video sequences are taken from the video database [35]. In addition, some standard single-view monochrome CIF video sequences from the video database [36] are used in our simulations.

3) PARAMETER SETTING
For soft video delivery, we set the GOP size to 4. Following existing chunk-based soft video delivery schemes, we divide each frame into 8 × 8 = 64 chunks. For MVD video, we read both the texture and the depth data from the three cameras, so one GOP consists of 3 × 2 × 4 × 64 = 1536 chunks. The camera configuration is summarized in Table 1. For Algorithm 1, we choose α = 0.98. We use the renderer of the 3D-HEVC test model (HTM) v15.0 software [37] to synthesize a virtual viewpoint from the received texture and depth frames. An AWGN channel is assumed in the simulations.
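The chunk count above follows from splitting every coefficient plane into an 8 × 8 grid. The sketch below shows one way to do the split and compute the per-chunk energies $\lambda_i$. It assumes that chunking happens after the 5D-DCT on a (view, map, frame, row, column) array and that each 2-D plane is split separately. This layout and the helper names are our own assumptions, not the authors' code.

```python
import numpy as np


def split_into_chunks(coeffs, grid=8):
    """Split each (row, col) plane of a 5-D coefficient array into grid x grid chunks.

    coeffs: array shaped (views, maps, frames, rows, cols), rows and cols divisible
    by grid. Returns an array shaped (num_chunks, rows // grid, cols // grid).
    """
    v, m, f, h, w = coeffs.shape
    ch, cw = h // grid, w // grid
    if ch * grid != h or cw * grid != w:
        raise ValueError("frame size must be divisible by the chunk grid")
    blocks = coeffs.reshape(v, m, f, grid, ch, grid, cw)
    # Put the two chunk-grid indices together, ahead of the within-chunk indices.
    blocks = blocks.transpose(0, 1, 2, 3, 5, 4, 6)
    return blocks.reshape(-1, ch, cw)


def chunk_energies(chunks):
    """Average energy (mean squared coefficient) lambda_i of every chunk."""
    return np.mean(np.square(chunks), axis=(1, 2))


# Section VI setting: 3 views x (texture, depth) x 4 frames x 64 chunks per frame.
print(3 * 2 * 4 * 8 * 8)    # 1536 chunks per GOP
print(768 // 8, 1024 // 8)  # 96 128: each chunk holds 96 x 128 coefficients
```

The returned energies, sorted in descending order, are the $\lambda_1 \ge \lambda_2 \ge \ldots$ used by the resource allocation algorithms.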

B. DR PERFORMANCE
We first evaluate the DR optimization performance, i.e., the maximum PSNR achievable under a given resource (channel and power) constraint. As mentioned before, each GOP contains 1536 chunks. Suppose that only one chunk can be transmitted in each channel slot (e.g., time or frequency), and let the maximum number of available channel slots M for each GOP be 1536. We fix the noise variance to 1 and vary the total transmit power budget P for each GOP. Fig. 3 and Fig. 4 show the DR optimization results for the video sequences kendo.yuv and balloons.yuv, respectively.
For a given channel resource use N, the PSNR generally increases with the power budget P, which agrees with our discussion of (15). For a given power budget P, however, the PSNR does not necessarily increase with the channel usage, especially when the power budget is low. For example, when the power budget P is 5 dB, the maximum PSNR for kendo.yuv and balloons.yuv is attained at N = 250 and N = 196, respectively. A maximum channel usage of 1536 therefore does not always give the highest PSNR. In soft video delivery, chunks are not equally important, although each consumes one channel use. Under a fixed power budget, the PSNR improves when more power goes to HP chunks and less power goes to LP chunks, or when LP chunks are discarded altogether. Finally, these DR curves generally have a flat tail. Once the PSNR exceeds a certain level, increasing the channel usage is therefore less efficient than increasing the power budget. For example, for kendo.yuv with power budget P = 10 dB and channel use N = 668, the PSNR is 40.74 dB, and further increasing the channel usage no longer improves it. However, a power increase from 10dB to 15dB brings a PSNR improvement of 4.52dB and a power increase from 10dB to 15dB brings a PSNR improvement of 8.28dB.
To further illustrate the channel usage savings, Fig. 5 plots the corresponding performance for the video sequence kendo.yuv. Conventional SoftCast uses all the channel and power resources to achieve a good PSNR. The proposed algorithm, by contrast, achieves a slightly higher PSNR with reduced channel usage. For example, in Fig. 5(d), with power budget P = 5 dB, conventional SoftCast uses 1536 channel slots and achieves a PSNR of 35.903 dB. The proposed algorithm achieves the maximum PSNR of 36.236 dB with only 250/1536 = 16.3% of that channel usage. Moreover, the distortion-resource curve has a flat tail. Slightly lowering the PSNR requirement (e.g., to α = 98% of the maximum PSNR) therefore further reduces the channel usage from 250 to 70, as shown in Fig. 5(d).
Comparing these figures, we also note that the proposed algorithm saves more channel usage when P is low. Hence the proposed method is particularly suitable when the channel condition is poor or the total power budget is limited.
Table 2 compares the channel usage for different video sequences. The video file names in bold are MVD videos, while the others are standard single-view CIF video sequences used in the SoftCast tests, with a resolution of 352×288 at a frame rate of 20 fps. The proposed method significantly reduces the channel usage while maintaining satisfactory quality.
Note that the proposed method applies not only to MVV but also to conventional single-view videos. Since the channel usage grows linearly with the number of cameras and depth maps in MVV transmission, the proposed method is naturally well suited to the heavy data burden of MVV.

C. RD OPTIMIZATION PERFORMANCE
1) POWER DISTORTION OPTIMIZATION PERFORMANCE
Suppose the user requires a PSNR of 35dB under a constant channel usage of 1536 chunks per GOP. We plot the power allocation curve over consecutive GOPs for MVV video sequence kendo.yuv in Fig. 6 and for balloons.yuv in Fig. 7, showing the resource usage for the first 64 GOPs. It can be seen that the power allocation varies very little across consecutive GOPs and the power usage is maintained at a relatively low level. This fine-grained resource control keeps the MVV video quality at the prescribed PSNR of 35dB. Hence the viewer can enjoy a favorable viewing experience, and the saved wireless resources can be used by other users.
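Holding a GOP at a prescribed PSNR with the channel use fixed can be sketched as a one-dimensional search over the power budget. The sketch assumes a simplified SoftCast-style model (all chunks transmitted, g_i² ∝ λ_i^(-1/2) with Σ g_i²λ_i = P, scalar LLSE decoding, equal-size chunks, 8-bit peak value 255). Under that model the PSNR increases monotonically with P, so bisection applies. It is not the paper's RD algorithm; the function names, search bounds, and variances are hypothetical.

```python
import math


def mse_at_power(variances, power, noise_var):
    """Average per-chunk MSE with every chunk transmitted (assumes variances > 0)."""
    s = sum(math.sqrt(l) for l in variances)
    total = 0.0
    for l in variances:
        g2 = power / (math.sqrt(l) * s)  # gains satisfy sum(g2 * l) == power
        total += l * noise_var / (g2 * l + noise_var)  # scalar LLSE error
    return total / len(variances)


def psnr(mse, peak=255.0):
    return 10.0 * math.log10(peak ** 2 / mse)


def min_power_for_psnr(variances, target_db, noise_var, lo=1e-6, hi=1e9, iters=100):
    """Smallest power (up to bisection tolerance) whose PSNR meets target_db."""
    if psnr(mse_at_power(variances, hi, noise_var)) < target_db:
        raise ValueError("target PSNR is not reachable below the upper bound")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint: power spans many decades
        if psnr(mse_at_power(variances, mid, noise_var)) >= target_db:
            hi = mid  # target met: try less power
        else:
            lo = mid  # target missed: more power is needed
    return hi


v = [400.0, 100.0, 25.0, 4.0]  # hypothetical chunk variances of one GOP
p = min_power_for_psnr(v, target_db=35.0, noise_var=1.0)
print(round(p, 3), round(psnr(mse_at_power(v, p, 1.0)), 3))
```

Running such a search once per GOP, with that GOP's chunk variances, produces a per-GOP power trace of the kind discussed above; GOPs with similar statistics naturally yield similar power levels.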

2) CHANNEL DISTORTION OPTIMIZATION PERFORMANCE
Similarly, suppose the user again requires a PSNR of 35dB, now under a constant transmit power budget P = 10dB. We plot the channel allocation curves for different videos in Fig. 8 and Fig. 9. The channel usage fluctuates at a relatively low and stable level compared with the total number of chunks (1536) in one GOP. Hence, in practice, we can allocate slightly more channel resources than needed (say, 100) instead of running Algorithm 3 many times, which further reduces the computational cost.

3) CHANNEL POWER TRADEOFF
For a target video quality, there exists a tradeoff between power usage and channel usage. We plot the tradeoff curve […] required are quite low. Moreover, these curves tend to have a very sharp turning point when M is relatively low. Hence, keeping the power and channel usage pair near the turning point is an efficient strategy. This observation is consistent with the previous simulations.

D. VIEW SYNTHESIS
1) IMPACT OF PARAMETER β
As mentioned before, the power scaling factor β determines the power allocation ratio between the texture data and the depth data, and therefore jointly affects the quality of the synthesized virtual view. For each view, the power ratio between the texture data and the depth data is β/(1 − β): β = 0.5 means power is split equally between texture and depth data, and β > 0.5 means more power is allocated to the texture data. We plot the impact of parameter β on the quality of the synthesized virtual view in Fig. 12.
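The β split is simple arithmetic once the budget is converted from dB to linear scale. The sketch below assumes β is applied to a single view's power budget given in dB; `split_power` is a hypothetical helper.

```python
def split_power(total_power_db, beta):
    """Split one view's power budget: beta to texture, (1 - beta) to depth."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    total = 10 ** (total_power_db / 10)  # dB -> linear power
    return beta * total, (1.0 - beta) * total


texture, depth = split_power(10.0, 0.6)
print(texture, depth, texture / depth)  # texture/depth equals beta / (1 - beta)
```

With a 10dB budget (10 in linear units) and β = 0.6, the texture data receive 6 and the depth data 4, a ratio of β/(1 − β) = 1.5.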
The test video is kendo.yuv. We investigate the PSNR performance for the first frame in the first GOP of view 1, view 3, and the corresponding virtual view, which is synthesized from views 1 and 3. Recall that the 5D-DCT linear transformation is performed jointly on the texture and depth data from the relevant views. Hence the MSE distortion of each component (the texture and depth data, and the data from each view) can be approximated by the average MSE of its corresponding GOP. This explains why the texture and depth quality curves of view 1 cross those of view 3 at β = 0.5. Increasing β allocates more power to the texture data, so the PSNR of the texture data increases and that of the depth data decreases. Recall also that in this paper we assume power is allocated equally among viewpoints, so views 1 and 3 are of equal importance in view synthesis. That is why the PSNR curve of view 1 closely matches that of view 3. In real systems, there may be 50 or even more cameras, while a viewer may only be interested in one specific view, so an equal power allocation across views is clearly not optimal. Instead, it may be a better strategy to allocate more power to the views adjacent to the user's chosen one. This problem will be addressed in our future work. Finally, we note that the synthesized virtual view 2 reaches its best quality around β = 0.5, and that its quality remains relatively high for β between 0.4 and 0.6. As β approaches either end of the interval [0, 1], the synthesized virtual view quality drops dramatically. This is because when β is small (or large), the texture (or depth) data suffer a large distortion, and this imbalance between texture and depth data degrades the synthesized view. From this figure, a good choice of β is between 0.5 and 0.6, where the texture qualities at views 1 and 3 increase slightly while the quality of virtual view 2 remains at an almost constant high level.
VOLUME 7, 2019

2) IMPACT OF VIRTUAL VIEW POSITIONS
In this experiment, views 3 and 5 are reference views, and the views between them are synthesized virtual views. Fig. 13 shows the impact of the virtual view position on the perceptual video quality. We use the first frame of video sequence kendo.yuv. Video power is equally distributed between the texture and depth data (β = 0.5). We vary the SNR from 0dB to 20dB and evaluate both the PSNR and SSIM. As can be seen, both metrics improve as the SNR increases. At the lower SNRs (e.g., 10dB), the mid-point (virtual view 4) tends to have a higher perceptual quality than the other positions. This is due to the equal power allocation between adjacent views. Hence, if a user is more interested in a virtual view closer to view 3 (e.g., the virtual view at position 3.2), we may want to allocate more power to view 3. When the SNR is 20dB, there is generally no significant difference in video quality across positions.

3) PERFORMANCE OF THE PROPOSED ALGORITHM
In this subsection, we test the performance of our proposed DR algorithm on the quality of synthesized virtual views.Each simulation is performed 10 times and we present the averaged results.
Fig. 14 presents the synthesis quality at viewpoint 2 for different algorithms, along with the synthesized frames. Under a channel constraint of 1536, the conventional SoftCast scheme uses all the channel resources and allocates power to every chunk. In contrast, our proposed DR algorithm discards LP chunks and retains only the HP chunks, which greatly reduces channel usage while improving the PSNR. For example, when the power usage is 10dB, for kendo.yuv both the PSNR and SSIM increase slightly, while the channel usage is only 163/1536 = 10.6% of that of the conventional SoftCast scheme. The advantage of the proposed algorithm is even larger when the SNR is low. Considering the huge data traffic in MVV, the saved wireless resources would be considerable. Finally, although the proposed algorithm is designed based on the PSNR metric, it also works well in terms of SSIM: for video sequence balloons.yuv, it improves the SSIM from 0.84184 to 0.91023 when the SNR is 0dB. We also performed similar experiments on the RD algorithms and made similar observations; due to space limitations, we omit the RD results here.

VII. CONCLUSION
In this paper, we applied soft video delivery to MVV transmission. Compared with conventional digital solutions, the proposed scheme avoids the cliff effect, and the video quality improves gracefully with the channel quality. Furthermore, the complex bit allocation and rate control of digital systems are replaced by a simple power allocation scheme. To handle the heavy data traffic caused by MVV, we proposed resource control algorithms. The proposed DR optimization algorithm achieves the best viewing quality under a resource constraint, and the proposed RD optimization algorithm minimizes the resource usage for a target video quality requirement, so the viewer can enjoy a stable viewing quality, which is favorable for MVV video streaming. With the proposed scheme, we also investigated how power control across the texture and depth data, and the view positions, affect the quality of synthesized virtual views. Simulation results demonstrated that the proposed algorithm works well not only for reference views but also for synthesized virtual views. Despite these merits of soft video delivery, we find that soft video transmission may not be resilient to packet loss and is sensitive to channel variations. Extending the scheme toward hybrid digital and analog video coding is a promising direction to fully exploit the benefits of both digital and analog video coding.

FIGURE 1. Illustration of a multi-view video transmission system.

FIGURE 2. Diagram of the proposed soft multi-view video delivery system.

FIGURE 4. DR optimization performance for MVD video sequence balloons.yuv.

FIGURE 9. Channel control for video sequence balloons.yuv.

FIGURE 10. Power and channel usage tradeoff curve for video sequence kendo.yuv.

FIGURE 13.Impact of virtual view positions.

FIGURE 14. Synthesis quality at virtual viewpoint 2 for different algorithms.

TABLE 2. Channel usage comparison for different video sequences.

FIGURE 6. Power allocation for MVV video sequence kendo.yuv.

FIGURE 7. Power allocation for MVV video sequence balloons.yuv.