Abstract:
Transformer-based language models have recently gained popularity in numerous natural language processing (NLP) applications because they outperform traditional algorithms. These models execute in two stages: summarization and generation. The generation stage accounts for a significant portion of the total execution time because its auto-regressive property, which produces one output token at a time, necessitates considerable and repetitive off-chip accesses. Consequently, our objective is to minimize off-chip accesses during the generation stage to accelerate transformer execution. To achieve this goal, we propose token-adaptive early exit (ToEx), which generates output tokens using fewer decoders and thereby reduces the off-chip accesses needed to load weight parameters. Although this approach can minimize data communication, it introduces two challenges: 1) inaccurate self-attention computation, and 2) significant overhead for the exit decision. To overcome these challenges, we introduce a methodology that enables accurate self-attention by lazily performing the computations for previously exited tokens. Moreover, we mitigate the overhead of the exit decision by incorporating a lightweight output embedding layer. We also present a hardware design that efficiently supports the proposed approach. Evaluation results demonstrate that ToEx reduces the number of executed decoders by 2.6× on average and, accordingly, achieves a 3.2× average speedup over transformer execution without ToEx.
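The abstract only outlines how accurate self-attention is preserved when tokens exit early. As one possible reading, the Python sketch below (a toy under stated assumptions, not the authors' algorithm or hardware design) lets each token stop after a variable number of decoder layers and, whenever a later token reaches a layer that an earlier token skipped, pushes the earlier token through that layer in the same batch so its keys and values exist before attention runs. The scalar hidden states, `kv_of`, `layer_forward`, the three-entry `light_head_exit` rule, and its threshold are all hypothetical; inputs are supplied directly rather than feeding generated tokens back in.

```python
import math
from typing import Callable, Dict, List, Tuple

# Toy sketch of token-level early exit with lazy completion of skipped layers.
# Assumptions (not from the paper): scalar hidden states, a made-up layer
# function, and a hypothetical "light head" exit rule.
NUM_LAYERS = 6

ExitRule = Callable[[int, int, float], bool]  # (token, layers_done, hidden) -> exit?


def kv_of(layer: int, h: float) -> Tuple[float, float]:
    """Hypothetical key/value projection for one layer."""
    w = 0.5 + 0.1 * layer
    return w * h, h


def layer_forward(layer: int, h: float, keys: List[float], values: List[float]) -> float:
    """Toy decoder layer: causal softmax attention, then residual + tanh."""
    q = (0.5 + 0.1 * layer) * h
    scores = [q * k for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    attn = sum(e * v for e, v in zip(exps, values)) / sum(exps)
    return h + math.tanh(attn)


def light_head_exit(threshold: float, gains=(-1.0, 0.0, 1.0)) -> ExitRule:
    """Hypothetical lightweight output head over a 3-entry vocabulary:
    exit once the top softmax probability reaches `threshold`."""
    def rule(token: int, layers_done: int, h: float) -> bool:
        logits = [g * h for g in gains]
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        return max(exps) / sum(exps) >= threshold
    return rule


def generate(inputs: List[float], should_exit: ExitRule):
    """Return (layers executed per token, KV cache keyed by (layer, token))."""
    cache: Dict[Tuple[int, int], Tuple[float, float]] = {}
    state: List[Tuple[int, float]] = []  # per token: (next layer to run, its input)
    depths: List[int] = []
    for t, x in enumerate(inputs):
        state.append((0, x))
        layer = 0
        while True:
            # Earlier tokens that exited before this layer ride along with token t.
            # Assumed motivation: in hardware, one fetch of this layer's weights
            # could serve the whole batch.
            batch = [j for j in range(t + 1) if state[j][0] == layer]
            for j in batch:
                cache[(layer, j)] = kv_of(layer, state[j][1])
            for j in batch:
                # Causal attention: token j sees tokens 0..j at this layer.
                keys = [cache[(layer, i)][0] for i in range(j + 1)]
                values = [cache[(layer, i)][1] for i in range(j + 1)]
                state[j] = (layer + 1, layer_forward(layer, state[j][1], keys, values))
            layer += 1
            if layer == NUM_LAYERS or should_exit(t, layer, state[t][1]):
                break
        depths.append(layer)
    return depths, cache


if __name__ == "__main__":
    depths, _ = generate([0.3, -1.2, 0.8, 2.0], light_head_exit(0.9))
    print("layers per token:", depths, "mean:", sum(depths) / len(depths))
```

In this toy, lagging tokens pass through exactly the same layer computation they would have seen at full depth, so every key and value they eventually contribute matches a full-depth run, while the current token only executes the layers it actually needs.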
Published in: IEEE Transactions on Computers (Volume: 73, Issue: 9, September 2024)

School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
Myeonggu Kang (Student Member, IEEE) received the B.S. and M.S. degrees in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2018 and 2020, respectively, where he is currently working toward the Ph.D. degree. His research interests include energy-efficient parallel processing architectures and processing-in-memory architectures for deep learning.

School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
Junyoung Park (Student Member, IEEE) received the B.S. degree in electrical engineering from Korea University, Seoul, South Korea, in 2022. He is currently working toward the M.S. degree in electrical engineering at Korea Advanced Institute of Science and Technology (KAIST). His research interests include software/hardware co-optimization for machine learning.

School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
Hyein Shin received the B.S. degree in electrical engineering from Sogang University in 2017 and the M.S. degree in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST) in 2019, where she is currently working toward the Ph.D. degree in electrical engineering. Her research interests include energy-efficient processing-in-memory architectures and VLSI implementation of low-power ASICs.

School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
Jaekang Shin (Student Member, IEEE) received the B.S. and M.S. degrees in electrical engineering from Korea Advanced Institute of Science and Technology, Daejeon, Korea, in 2018 and 2020, respectively, where he is currently working toward the Ph.D. degree in electrical engineering. His research interests include low-power deep learning accelerators and algorithm-hardware co-optimization for computer vision.

School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
Lee-Sup Kim (Fellow, IEEE) received the B.S. degree in electronics engineering from Seoul National University, Seoul, Korea, in 1982, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, USA, in 1986 and 1990, respectively. He was a Postdoctoral Fellow with Toshiba Corporation, Kawasaki, Japan. Since 1993, he has been with Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, where he is a Professor. His research interests include energy-efficient deep learning hardware and in-memory computing architectures. He served on the technical committee of the ISSCC (IEEE International Solid-State Circuits Conference) from 2004 to 2009 and received the Author Recognition Award from ISSCC as a contributor of 10 or more papers from 2003 to 2013. He was a co-recipient of the Best Paper Runner-Up Award at the 2014 HPCA (IEEE International Symposium on High Performance Computer Architecture).