ToEx: Accelerating Generation Stage of Transformer-Based Language Models via Token-Adaptive Early Exit


Abstract:

Transformer-based language models have recently gained popularity in numerous natural language processing (NLP) applications due to their superior performance compared to traditional algorithms. These models involve two execution stages: summarization and generation. The generation stage accounts for a significant portion of the total execution time due to its auto-regressive property, which necessitates considerable and repetitive off-chip accesses. Consequently, our objective is to minimize off-chip accesses during the generation stage to expedite transformer execution. To achieve this goal, we propose a token-adaptive early exit (ToEx) that generates output tokens using fewer decoders, thereby reducing off-chip accesses for loading weight parameters. Although our approach has the potential to minimize data communication, it brings two challenges: 1) inaccurate self-attention computation, and 2) significant overhead for exit decision. To overcome these challenges, we introduce a methodology that facilitates accurate self-attention by lazily performing computations for previously exited tokens. Moreover, we mitigate the overhead of exit decision by incorporating a lightweight output embedding layer. We also present a hardware design to efficiently support the proposed work. Evaluation results demonstrate that our work can reduce the number of decoders by 2.6× on average. Accordingly, it achieves 3.2× speedup on average compared to transformer execution without our work.
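
The general idea of a per-token early exit can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the exit criterion, so the confidence threshold used here is an assumption, and the names `generate_token`, `exit_head`, `threshold`, and the toy layers are hypothetical. The sketch also leaves out the lazy computation for previously exited tokens that the abstract mentions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate_token(hidden, layers, exit_head, threshold):
    """Run decoder layers one by one and stop as soon as the exit head is confident.

    hidden:    hidden state of the current token (list of floats)
    layers:    non-empty list of callables, each mapping hidden -> hidden
    exit_head: callable mapping hidden -> vocabulary logits (assumed lightweight)
    threshold: assumed exit criterion; exit once max probability >= threshold
    Returns (token_id, layers_used).
    """
    if not layers:
        raise ValueError("at least one decoder layer is required")
    used = 0
    for layer in layers:
        hidden = layer(hidden)
        used += 1
        probs = softmax(exit_head(hidden))
        if max(probs) >= threshold:
            break  # confident enough: skip the remaining decoder layers
    token = max(range(len(probs)), key=probs.__getitem__)
    return token, used


# Toy stand-ins (hypothetical): each "decoder" doubles the hidden state,
# and the exit head is the identity over a two-word vocabulary.
toy_layers = [lambda h: [2.0 * x for x in h] for _ in range(4)]
toy_exit_head = lambda h: h

token, used = generate_token([1.0, 0.0], toy_layers, toy_exit_head, threshold=0.95)
print(token, used)
```

In this toy setting, each skipped layer stands for decoder weights that would not need to be loaded, which is the source of the reduced off-chip accesses that the abstract describes.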
Published in: IEEE Transactions on Computers ( Volume: 73, Issue: 9, September 2024)
Page(s): 2248 - 2261
Date of Publication: 21 May 2024

Myeonggu Kang
School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
Myeonggu Kang (Student Member, IEEE) received the B.S. and M.S. degrees in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2018 and 2020, respectively, where he is currently working toward the Ph.D. degree. His research interests include energy-efficient parallel processing architecture and processing in-memory architecture for deep learning.
Junyoung Park
School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
Junyoung Park (Student Member, IEEE) received the B.S. degree in electrical engineering from Korea University, Seoul, South Korea, in 2022. He is currently working toward the M.S. degree in electrical engineering at Korea Advanced Institute of Science and Technology (KAIST). His research interests include co-optimizing software/hardware for machine learning.
Hyein Shin
School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
Hyein Shin received the B.S. degree in electrical engineering from Sogang University, in 2017, and the M.S. degree in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), in 2019, where she is currently working toward the Ph.D. degree in electrical engineering. Her research interests include energy-efficient processing in-memory architecture and VLSI implementation of low-power ASIC.
Jaekang Shin
School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
Jaekang Shin (Student Member, IEEE) received the B.S. and M.S. degrees in electrical engineering from Korea Advanced Institute of Science and Technology, Daejeon, Korea, in 2018 and 2020, respectively, where he is currently working toward the Ph.D. degree in electrical engineering. His research interests include low-power deep learning accelerators and algorithm-hardware co-optimization for computer vision.
Lee-Sup Kim
School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
Lee-Sup Kim (Fellow, IEEE) received the B.S. degree in electronics engineering from Seoul National University, Seoul, Korea, in 1982, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, USA, in 1986 and 1990, respectively. He was a Postdoctoral Fellow with Toshiba Corporation, Kawasaki, Japan. Since 1993, he has been with Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, where he is a Professor. His research interests include energy-efficient deep learning hardware and in-memory computing architectures. He has served on the technical committee of the ISSCC (IEEE International Solid-State Circuits Conference) from 2004 to 2009 and received the Author Recognition Award from ISSCC as a contributor of 10 or more papers from 2003 to 2013. He was a corecipient of the Best Paper Runner Up Award at the 2014 HPCA (IEEE International Symposium on High Performance Computer Architecture).