1 Introduction
Multimedia traffic has grown tremendously in mobile broadband networks, and video data were expected to account for more than 70 percent of total traffic by 2020 [1]. In particular, high-resolution and 3D videos have recently received much attention from both the research and industry communities; for example, YouTube and Netflix now provide 3D videos and 3D live streaming services on mobile devices. 3D videos can be classified into two classes: single-view and multi-view. A single-view 3D video mainly targets 3D-TV and is composed of only one depth image and one texture image [2]. By using an image rendering algorithm, a single-view 3D video can be synthesized from a depth image and a texture image and visualized on a stereoscopic display, such as a 3D-TV or 3D stereo glasses. In contrast, a multi-view 3D video mainly targets free-viewpoint video, which enables users to interactively select arbitrary preferred views for visualizing dynamic actions from different directions [3]. Specifically, a multi-view 3D video typically offers 5, 16, or 32 different view angles [4] and is composed of multiple texture and depth images to create a video scene with rich 3D geometry [3]. With this property, multi-view 3D videos increase users' Quality of Experience (QoE) since they avoid the occluded regions that arise from a single viewpoint (i.e., in single-view 3D videos). This stimulates innovative applications in mobile TV, naked-eye 3D, and virtual reality, such as multi-view live streaming with Intel True View and NextVR, where multiple closely spaced camera arrays are deployed to capture texture and depth frames of a 3D scene for NBA, WWE, and NFL sports games.

(Intel True View: https://reurl.cc/E7KVL0; NextVR: https://nextvr.com/. Please see Appendix R, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TMC.2020.3047714, for a detailed description of them.)
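To make the view-synthesis step above concrete, the following Python sketch illustrates a simplified depth-image-based rendering (DIBR) warp: given a texture frame and its depth map, each pixel is shifted horizontally by a disparity derived from its depth to form a nearby virtual view. The camera parameters, the inverse-depth quantization, and the function name are illustrative assumptions for this sketch, not the rendering algorithm used in the cited works; real renderers also handle occlusion ordering and hole filling.

```python
import numpy as np

def synthesize_virtual_view(texture, depth, baseline=0.05, focal_length=500.0,
                            z_near=1.0, z_far=100.0):
    """Warp a texture frame to a horizontally shifted virtual viewpoint
    using its per-pixel depth map (simplified DIBR, no hole filling).

    texture: (H, W, 3) uint8 array, color frame of the reference view.
    depth:   (H, W) uint8 array, quantized inverse depth (0 = far, 255 = near).
    baseline, focal_length, z_near, z_far: assumed camera parameters.
    """
    h, w = depth.shape
    # Recover metric depth Z from the 8-bit inverse-depth map.
    z = 1.0 / (depth.astype(np.float64) / 255.0 * (1.0 / z_near - 1.0 / z_far)
               + 1.0 / z_far)
    # Horizontal disparity (in pixels) for a camera shifted by `baseline`.
    disparity = np.round(focal_length * baseline / z).astype(np.int64)

    virtual = np.zeros_like(texture)
    cols = np.arange(w)
    for row in range(h):
        target = cols + disparity[row]          # destination column of each pixel
        valid = (target >= 0) & (target < w)    # drop pixels warped outside the frame
        virtual[row, target[valid]] = texture[row, cols[valid]]
    # Pixels left at zero are disocclusions; practical renderers inpaint them.
    return virtual
```

In a multi-view setting, the same warp is applied from the nearest captured texture/depth pairs toward the user-selected viewpoint, which is why each additional view angle requires transmitting both a texture and a depth component.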