Abstract:
The increasing demand for image generation on mobile devices [1] highlights the need for high-performing image-generative models, including the diffusion model (DM) [2], [3]. A conventional DM requires numerous UNet-based denoising timesteps (~50), leading to high computation and external memory access (EMA) costs. Recently, the Few-Step Diffusion Model (FSDM) [4] was introduced, as shown in Fig. 23.3.1, which reduces the denoising timesteps to 1–4 through knowledge distillation while maintaining high image quality, cutting computation and EMA by 22.0× and 42.3×, respectively. However, prior diffusion-model architectures, which accelerated the many denoising steps of a DM [5], [6] by exploiting inter-timestep redundancy in the UNet, fail to speed up the few denoising steps of an FSDM, where such redundancy is absent. Moreover, a multi-modal DM adds computational cost in the encoder, and an FSDM shifts the computational bottleneck from the UNet to the encoder and decoder. Additionally, an FSDM is more sensitive to quantization, since fewer denoising steps demand higher precision. To tackle these challenges, we exploit mixed-precision and group quantization [7] as a unified optimization scheme applicable to the encoder, UNet, and decoder of an FSDM, even without inter-timestep redundancy.
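To illustrate the group-quantization idea referenced above [7], the following is a minimal sketch (not the paper's hardware implementation): symmetric integer quantization where each group of weights shares its own scale, so a local outlier inflates the quantization step only within its group rather than across the whole tensor. The function names, group size, and bit width here are illustrative assumptions.

```python
# Illustrative sketch of group-wise symmetric quantization; the names,
# group size, and bit width are assumptions, not the paper's design.
import numpy as np

def group_quantize(w: np.ndarray, group_size: int = 64, n_bits: int = 8):
    """Quantize a 1-D weight vector in groups, each with its own scale."""
    qmax = 2 ** (n_bits - 1) - 1               # e.g. 127 for INT8
    w = w.reshape(-1, group_size)              # assumes len(w) % group_size == 0
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax   # per-group scale
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def group_dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximation of the original weights."""
    return (q.astype(np.float32) * scale).reshape(-1)

# Per-group scales track local weight magnitude, so the rounding error in
# each group is bounded by half of that group's own (small) step size.
w = np.random.randn(256).astype(np.float32)
q, s = group_quantize(w)
w_hat = group_dequantize(q, s)
max_err = np.abs(w - w_hat).max()
```

With one per-tensor scale, a single large weight would set the step size for all 256 values; here it only affects its own group of 64, which is the precision benefit the abstract leans on for quantization-sensitive FSDMs.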
Date of Conference: 16-20 February 2025
Date Added to IEEE Xplore: 06 March 2025