Abstract:
In this paper, energy saving based on transformers with LeakyReLU attention mechanisms is discussed. Softmax functions in the attention mechanisms of transformers are replaced by LeakyReLU functions, which include ReLU functions as special cases. The goal is to explore possible transformer architectures with reduced computational complexity for saving electrical energy in the inference phase. Theoretical analysis based on a general-purpose computing model shows that, under the given conditions and assumptions, the worst-case time complexities for computing attention in transformers rank, from lowest to highest, as ReLU, LeakyReLU, and softmax. In particular, as shown in experimental results on language translation and deterministic network flow aggregation tasks, transformers with ReLU (LeakyReLU with negative slope 0) and LeakyReLU (negative slope 0.1) attention consume less average computation time in the inference phase than transformers with softmax attention. The theoretical and experimental results show that transformers with LeakyReLU attention may save energy in language translation and deterministic networking tasks.
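For concreteness, the following is a minimal PyTorch sketch of the substitution the abstract describes, assuming a standard scaled dot-product attention formulation; the abstract does not state whether the paper normalizes the LeakyReLU scores, so the activation is applied directly to the raw scores here.

```python
import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    # Standard scaled dot-product attention with softmax normalization.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return F.softmax(scores, dim=-1) @ v

def leaky_relu_attention(q, k, v, negative_slope=0.1):
    # LeakyReLU replaces softmax on the attention scores; with
    # negative_slope=0.0 this reduces to ReLU attention.
    # NOTE: any normalization of the resulting weights is an assumption
    # not specified in the abstract and is omitted here.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    weights = F.leaky_relu(scores, negative_slope=negative_slope)
    return weights @ v

# Example usage with hypothetical shapes (batch, tokens, dim).
q = k = v = torch.randn(2, 8, 16)
out = leaky_relu_attention(q, k, v)
```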
Date of Conference: 08-14 December 2023
Date Added to IEEE Xplore: 29 December 2023