I. Introduction
The continuous development of deep learning techniques and the widespread dissemination of multimedia content have created a situation in which seeing is no longer believing. With advances in generative models such as Variational Autoencoders (VAEs) [1], Generative Adversarial Networks (GANs) [2], [3], and Diffusion Models (DMs) [4], [5], it has become relatively straightforward to replace one person's face with another's while preserving the original facial expression and head pose. However, these forgery techniques [6], [7], [8] can be misused for malicious purposes, raising serious security and ethical issues (e.g., the spread of fabricated celebrity pornography and political persecution). Therefore, to mitigate their negative impact on public safety and personal privacy, it is crucial to develop effective countermeasures against such face forgery attacks.