Abstract:
We propose MMEditor, a multimodal 3D scene editing framework that creates or modifies objects within an existing 3D Gaussian Splatting (3DGS) scene according to text and image prompts. MMEditor uses a multimodal image editing module to iteratively optimize the 3D Gaussians in the editing regions, yielding fine-grained, multi-view-consistent 3D edits. This key module controls both the appearance and the location of each edit through two designs. First, a multimodal adapter block treats the reference image as a foreign language that augments the text prompt, so that editing results match both the generic text description and the unique characteristics of the reference image. Second, an attention-based localization block restricts cross-attention to user-defined 3D bounding boxes, ensuring that edits occur only within the specified regions. Experiments demonstrate that our method achieves more accurate and controllable results than previous state-of-the-art methods.
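One way to picture the attention-based localization block is the NumPy sketch that follows. It is an illustrative assumption, not MMEditor's implementation: the abstract does not say how the 3D box is projected or how cross-attention is constrained. The sketch assumes a pinhole camera (intrinsics `K`, world-to-camera rotation `R` and translation `t`), takes the pixel rectangle enclosing the projected box corners as the image-space region, and masks attention logits so that pixels outside that region cannot attend to the prompt tokens describing the edited object (`token_ids`). Both function names are hypothetical.

```python
import numpy as np


def project_box_to_mask(box_min, box_max, K, R, t, height, width):
    """Hypothetical helper: rasterize a 3D axis-aligned box as a 2D boolean mask.

    Assumes a pinhole camera x_cam = R @ x_world + t with intrinsics K and
    all eight box corners in front of the camera (positive depth).
    """
    # Enumerate the eight corners of the axis-aligned box.
    corners = np.array([[x, y, z]
                        for x in (box_min[0], box_max[0])
                        for y in (box_min[1], box_max[1])
                        for z in (box_min[2], box_max[2])], dtype=float)
    cam = corners @ R.T + t            # world -> camera coordinates
    pix = cam @ K.T                    # camera -> homogeneous pixel coordinates
    uv = pix[:, :2] / pix[:, 2:3]      # perspective divide -> (u, v) per corner
    # Enclosing pixel rectangle, clipped to the image bounds.
    u0, v0 = np.floor(uv.min(axis=0)).astype(int)
    u1, v1 = np.ceil(uv.max(axis=0)).astype(int)
    u0, u1 = np.clip([u0, u1], 0, width)
    v0, v1 = np.clip([v0, v1], 0, height)
    mask = np.zeros((height, width), dtype=bool)
    mask[v0:v1, u0:u1] = True          # rows index v, columns index u
    return mask


def localized_cross_attention(q, k, v, mask, token_ids):
    """Hypothetical localized cross-attention.

    q: (H*W, d) pixel queries; k, v: (T, d) prompt-token keys/values;
    mask: (H, W) region mask; token_ids: tokens tied to the edited object.
    Pixels outside the mask are prevented from attending to those tokens.
    """
    num_tokens = k.shape[0]
    assert len(set(token_ids)) < num_tokens, "each pixel needs an unmasked token"
    logits = q @ k.T / np.sqrt(q.shape[-1])        # (H*W, T) scaled dot products
    outside = ~mask.reshape(-1)                    # pixels outside the region
    blocked = np.zeros(logits.shape, dtype=bool)
    blocked[np.ix_(outside, token_ids)] = True     # outside pixels x object tokens
    logits = np.where(blocked, -np.inf, logits)
    logits -= logits.max(axis=1, keepdims=True)    # numerically stable softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights


# Usage: a unit-half-size box 5 units in front of a 32x32 camera.
K = np.array([[10.0, 0.0, 16.0], [0.0, 10.0, 16.0], [0.0, 0.0, 1.0]])
mask = project_box_to_mask(np.array([-1.0, -1.0, -1.0]), np.array([1.0, 1.0, 1.0]),
                           K, np.eye(3), np.array([0.0, 0.0, 5.0]), 32, 32)
print(mask.sum())  # 36: rows and columns 13..18

rng = np.random.default_rng(0)
q = rng.standard_normal((32 * 32, 8))
k = rng.standard_normal((4, 8))
v = rng.standard_normal((4, 8))
out, weights = localized_cross_attention(q, k, v, mask, token_ids=[2])
```

Masking the logits with `-inf` before the softmax, rather than zeroing weights afterward, keeps every pixel's attention weights normalized; the assertion guards against a pixel having no token left to attend to. A real editor would apply such a mask inside the diffusion model's cross-attention layers for each rendered view.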
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025