In interactive multiview video streaming (IMVS), a client receives and observes one of many available viewpoints of the same scene and periodically requests view switches to neighboring views from the server, while the video plays back in time without interruption. One key technical challenge is to design a frame coding structure that facilitates periodic view switching and achieves an optimal tradeoff between storage cost and expected transmission rate. In this paper, we first propose three significant improvements over existing IMVS systems and then study the corresponding frame structure optimization. First, using depth-image-based rendering, the new IMVS system enables free viewpoint switching, i.e., by encoding and transmitting both texture and depth maps of captured views, a client can select and synthesize any virtual view from an almost continuum of viewpoints between the left-most and right-most captured views. Second, the IMVS system adopts a more realistic Markovian view-switching model with memory that captures user behavior more accurately than previous memoryless models. The view-switching model is used to predict a client's future view-switching patterns. Third, assuming that the round-trip-time (RTT) delay of server-client communication is nonnegligible, the IMVS system additionally transmits, during an IMVS session, redundant frames one RTT ahead into future playback, so that zero-delay view switching can be achieved. Given these improvements, we formalize a new joint optimization of the frame coding structure, transmission schedule, and quantization parameters of the texture and depth maps of multiple camera views. We propose an iterative algorithm to achieve fast and near-optimal solutions. The convergence of the algorithm is also demonstrated. Experimental results show that the proposed optimized rate-allocation method requires 38% lower transmission rate than the fixed rate-allocation scheme.
In addition, with the same storage, the transmission rate of the optimized frame structure can be up to 55% lower than that of an I-frame-only structure and 27% lower than that of the structure without distributed source coding frames.
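
As a rough illustration of how a view-switching model with memory can drive the choice of redundant frames, the following minimal sketch propagates a second-order Markov model over discretized view indices. Everything specific in it is an assumption made for illustration and is not taken from the paper: the function names `step_probs` and `reachable_distribution`, the discretization of views into integer indices, the restriction to switches of at most one view per switching instant, and the probabilities `stay` and `momentum`. The views that receive nonzero probability after the number of switching instants spanned by one RTT are the candidates for which redundant frames would have to be sent ahead of time.

```python
from collections import defaultdict


def step_probs(prev, cur, num_views, stay=0.5, momentum=0.35):
    """Hypothetical transition P(next view | previous view, current view).

    With memory: a client that just moved in one direction is assumed to be
    more likely to keep moving that way than to reverse.
    """
    direction = (cur > prev) - (cur < prev)  # -1, 0, or +1
    if direction == 0:
        side = (1.0 - stay) / 2
        cand = {cur: stay, cur - 1: side, cur + 1: side}
    else:
        cand = {
            cur: stay,
            cur + direction: momentum,
            cur - direction: 1.0 - stay - momentum,
        }
    probs = {}
    for view, p in cand.items():
        # Probability mass that would leave the captured range stays put.
        target = view if 0 <= view < num_views else cur
        probs[target] = probs.get(target, 0.0) + p
    return probs


def reachable_distribution(prev, cur, num_views, switches):
    """Distribution over views after `switches` switching instants.

    The Markov state is the pair (previous view, current view), which is
    what gives the model its one-step memory.
    """
    dist = {(prev, cur): 1.0}
    for _ in range(switches):
        nxt = defaultdict(float)
        for (p, c), mass in dist.items():
            for n, q in step_probs(p, c, num_views).items():
                nxt[(c, n)] += mass * q
        dist = dict(nxt)
    views = defaultdict(float)
    for (_, c), mass in dist.items():
        views[c] += mass
    return dict(views)


if __name__ == "__main__":
    # Assumed scenario: 5 views, client resting at view 2, RTT spans 2 switches.
    for view, prob in sorted(reachable_distribution(2, 2, 5, 2).items()):
        print(view, round(prob, 4))
```

In this toy setting, a longer RTT widens the set of reachable views and therefore the amount of redundant data, which is the kind of cost the joint optimization described above has to weigh against storage and transmission rate.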