Abstract:
Adversarial training (AT) is one of the most effective defenses against adversarial attacks. However, multi-step AT is time-consuming, while single-step AT is ineffective. In this paper, we propose an Energy-AT framework that makes single-step AT as effective as multi-step AT by exploiting two properties of energy-based models (EBMs). First, we use the Helmholtz free energy of the EBM to push generated examples outside the distribution boundaries of their categories, making them more adversarial. Second, we apply an adaptive temperature scheme in the EBM to selectively amplify the training gradients of weak adversarial examples, so that these originally hard-to-learn examples also contribute to model robustness. Extensive experiments validate that Energy-AT significantly improves model robustness against adversarial attacks in both white-box and black-box settings and outperforms state-of-the-art methods.
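For readers unfamiliar with the quantity named above, the following is a minimal sketch of the Helmholtz free energy as it is commonly defined when a classifier's logits are read as negative energies, F(x) = -T · log Σ_y exp(f_y(x) / T). The abstract does not give the exact formulation used in Energy-AT, so the function name `free_energy`, its arguments, and the choice of this definition are assumptions made for illustration only.

```python
import math


def free_energy(logits, temperature=1.0):
    """Free energy -T * log(sum_y exp(logit_y / T)) for one example.

    Assumption: logits are treated as negative energies, a common
    convention for energy-based views of classifiers. This is not
    taken from the Energy-AT paper itself.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * log_sum_exp


# Two equal logits at T = 1 give a free energy of -log(2).
equal_case = free_energy([0.0, 0.0], temperature=1.0)

# As T shrinks, the free energy approaches the negative of the largest logit.
cold_case = free_energy([1.0, 3.0], temperature=0.01)
```

Under this definition, a lower free energy corresponds to inputs the model scores as more in-distribution, and the temperature controls how strongly the largest logit dominates the sum.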
Published in: IEEE Transactions on Emerging Topics in Computational Intelligence (Volume: 8, Issue: 5, October 2024)