1. INTRODUCTION
From dynamic 3D scenes recorded by multiple cameras, a scene representation based on 3D video objects (3DVOs) can be reconstructed, combining synthetic geometry with the real video texture sequences captured by the original cameras [1]. The synthetic geometry of such 3DVOs is typically approximated by planar 3D meshes at each time instant. These individual 3D meshes are then transformed into time-consistent animated mesh sequences. Each mesh sequence consists of an initial intra (I) mesh followed by a number of predictively coded (P) meshes. The I-mesh contains the initial 3D vertex positions as well as the connectivity that defines the mesh faces. Each P-mesh contains only the updated vertex positions and reuses the connectivity of the I-mesh.
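To make the I-/P-mesh structure concrete, the following minimal Python sketch models a time-consistent mesh sequence in which only the I-mesh stores connectivity and every P-mesh carries updated vertex positions only. The class and method names (IMesh, PMesh, MeshSequence, geometry_at) are illustrative assumptions, not part of the representation in [1].

```python
from dataclasses import dataclass
from typing import List, Tuple

Vertex = Tuple[float, float, float]   # 3D vertex position (x, y, z)
Face = Tuple[int, int, int]           # triangle as indices into the vertex list


@dataclass
class IMesh:
    """Intra mesh: initial vertex positions plus the shared connectivity."""
    vertices: List[Vertex]
    faces: List[Face]                 # connectivity, defined once per sequence


@dataclass
class PMesh:
    """Predictively coded mesh: only the updated vertex positions."""
    vertices: List[Vertex]            # same length and ordering as the I-mesh


@dataclass
class MeshSequence:
    """Animated mesh sequence: one I-mesh followed by a number of P-meshes."""
    i_mesh: IMesh
    p_meshes: List[PMesh]

    def geometry_at(self, t: int) -> Tuple[List[Vertex], List[Face]]:
        """Return vertex positions and connectivity for time instant t."""
        verts = self.i_mesh.vertices if t == 0 else self.p_meshes[t - 1].vertices
        # Connectivity is always taken from the I-mesh; P-meshes never redefine faces.
        return verts, self.i_mesh.faces
```

Because the connectivity is stored only once, each P-mesh reduces to a list of vertex positions, which is what makes the sequence compact and time-consistent.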