I. Introduction
Point cloud videos (PCVs), represented as sequences of unordered 3D points carrying attributes such as RGB color (or, alternatively, as meshes), enable an interactive and immersive experience by supporting six-degrees-of-freedom (6-DoF) movement in the metaverse [1]. Real-time streaming of volumetric PCVs imposes new challenges on existing network infrastructure and codecs, namely exceptionally high bandwidth demands and substantial computational requirements. Firstly, PCVs involve enormous data volumes and therefore require ultra-high streaming bandwidth. Taking the popular Kinect depth camera as an example, a single device generates 2.06 gigabits (Gb) of raw data per second at a frame rate of 30 frames per second (FPS) [2]. As more cameras are incorporated to capture finer PCV content, the bandwidth demand escalates further [3]. Secondly, existing codec solutions based on MPEG extensions remain focused on efficient compression and quality assurance for offline PCVs, leaving real-time PCVs largely unexplored [4]. This gap is particularly evident in the absence of studies on adaptive PCV transmission under dynamic network conditions and across computationally heterogeneous mobile devices. Consequently, the immense computational demands of codecs, combined with dynamic streaming conditions, hinder the delivery of a 6-DoF experience on mobile metaverse devices such as augmented reality (AR) and virtual reality (VR) headsets.
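To put the single-camera figure in perspective, a back-of-the-envelope calculation (taking the 2.06 Gb/s raw rate reported in [2] at face value) gives the per-frame payload:
$$\frac{2.06\ \text{Gb/s}}{30\ \text{FPS}} \approx 68.7\ \text{Mb/frame} \approx 8.6\ \text{MB/frame},$$
i.e., every uncompressed frame carries roughly 8.6 megabytes, which a real-time pipeline must capture, compress, transmit, and decode at a sustained rate of one frame every ~33 ms to maintain 30 FPS.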