Abstract:
Hardware accelerators such as GPUs are required for real-time, low-latency inference with Deep Neural Networks (DNNs). Providing inference services in the cloud can be resource intensive, and effectively utilizing accelerators in the cloud is important. Spatial multiplexing of the GPU, while limiting the GPU resources (GPU%) allotted to each DNN to the right amount, leads to higher GPU utilization and higher inference throughput. Right-sizing the GPU for each DNN, optimally batching requests to balance throughput and service level objectives (SLOs), and maximizing throughput by appropriately scheduling DNNs are still significant challenges. This article introduces a dynamic and fair spatio-temporal scheduler (D-STACK) for multiple DNNs to run in the GPU concurrently. We develop and validate a model that estimates the parallelism each DNN can utilize and a lightweight optimization formulation to find an efficient batch size for each DNN. Our holistic inference framework provides high throughput while meeting application SLOs. We compare D-STACK with other GPU multiplexing and scheduling methods (e.g., NVIDIA Triton, Clipper, Nexus), using popular DNN models. Our controlled experiments with multiplexing several popular DNN models achieve up to 1.6× improvement in GPU utilization and up to 4× improvement in inference throughput.
Published in: IEEE Transactions on Cloud Computing (Volume: 12, Issue: 4, Oct.-Dec. 2024)

University of California, Riverside, Riverside, CA, USA
Aditya Dhakal (Member, IEEE) received the BE degree from the Kyushu Institute of Technology, Japan, with the Japanese Ministry of Education scholarship, the MS degree from the University of Connecticut, and the PhD degree from the University of California, Riverside. He is a research scientist with Hewlett Packard Labs, Milpitas, California. His area of research included hardware multiplexing and neural network inference. His current research interests include GPUs, FPGAs, SmartNICs, communication fabrics, and scalability in high-performance computing and machine learning.

IIT Gandhinagar, Gujarat, India
Sameer G. Kulkarni received the PhD degree from the University of Göttingen, Germany. He worked as a postdoctoral researcher with the University of California at Riverside, Riverside. He is currently an assistant professor with the Department of Computer Science and Engineering, and Electrical Engineering, Indian Institute of Technology Gandhinagar. His current research interests include parallel and distributed computing, software defined networks, network function virtualization, network security and cloud computing. His PhD thesis received the IEEE Technical Committee on Scalable Computing Outstanding Dissertation Award, in 2019.

University of California, Riverside, Riverside, CA, USA
K. K. Ramakrishnan (Life Fellow, IEEE) received the MTech degree from the Indian Institute of Science in 1978, and the MS and PhD degrees in computer science from the University of Maryland, College Park, in 1981 and 1983, respectively. He is a distinguished professor of computer science and engineering with the University of California, Riverside. He joined AT&T Bell Labs in 1994 and was with AT&T Labs-Research from its inception, in 1996, until 2013, as a distinguished member of Technical Staff. Before 1994, he was a technical director and consulting engineer in networking with Digital Equipment Corporation. Between 2000 and 2002, he was with TeraOptic Networks, Inc., as founder and vice president. He is an ACM fellow and an AT&T fellow, recognized for his fundamental contributions to communication networks, including his work on congestion control, traffic management, and VPN services. His work on the “DECbit” congestion avoidance protocol received the ACM SIGCOMM Test of Time Paper Award in 2006, and he received the AT&T Technology Medal in 2012 for his work on Mobile Video Delivery. He received the 2024 ACM SIGCOMM Award recognizing his lifetime contribution to the field of communication networks. He has published more than 300 papers and has 186 patents issued in his name.