MSSM: An Efficient Scheduling Mechanism for CUDA Basing on Task Partition

2 Author(s)
Cheng Luo ; Grad. Sch. of Inf. Sci. & Technol., Univ. of Tokyo, Tokyo, Japan ; Suda, R.

This paper presents a multiple-stream scheduling mechanism that enables kernels, host-to-device data sending, and device-to-host data receiving to execute in parallel using multiple streams in CUDA. Our mechanism divides the kernels and the bi-directional data transmission into small subtasks and allows them to be overlapped easily and efficiently on a CUDA-compatible graphics processing unit (GPU). To set the optimal subtask size, we have built a compute-bound model for compute-intensive applications and a data-bound model for applications dominated by bi-directional data transmission. Based on these two models, we also provide three scheduling algorithms, covering data-dependent and data-independent applications, to maximize the efficiency of the overlap. We have applied the mechanism to a set of benchmarks to evaluate its performance. The results show that our work successfully hides the latency and achieves performance very close to the optimal.
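
To illustrate why the subtask size matters for this kind of overlap, here is a minimal sketch of a three-stage pipeline (host-to-device copy, kernel, device-to-host copy) written as a toy timing model in Python. It is not the paper's compute-bound or data-bound model, and it is not one of its three scheduling algorithms. The function names, the assumption that each stage splits evenly across subtasks, and the fixed per-subtask `overhead` are all hypothetical choices made for illustration.

```python
def pipelined_time(n_chunks, h2d, kernel, d2h, overhead=0.0):
    """Finish time of a toy 3-stage pipeline split into n_chunks subtasks.

    h2d, kernel, d2h: total time of each stage for the unsplit task
    (any consistent unit). Assumption: each stage divides evenly.
    overhead: hypothetical fixed cost added to every subtask stage.
    Each stage is served by one engine that processes subtasks in order.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    stage = [t / n_chunks + overhead for t in (h2d, kernel, d2h)]
    free = [0.0, 0.0, 0.0]  # time at which each engine becomes idle
    for _ in range(n_chunks):
        ready = 0.0  # time at which this subtask finished its previous stage
        for s in range(3):
            # A stage starts once its input is ready and its engine is idle.
            start = max(ready, free[s])
            ready = start + stage[s]
            free[s] = ready
    return free[2]


def best_chunk_count(h2d, kernel, d2h, overhead, max_chunks):
    """Chunk count in 1..max_chunks with the smallest modelled finish time."""
    return min(range(1, max_chunks + 1),
               key=lambda n: pipelined_time(n, h2d, kernel, d2h, overhead))


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, pipelined_time(n, 4.0, 4.0, 4.0, overhead=0.25))
    print("best:", best_chunk_count(4.0, 4.0, 4.0, 0.25, 16))
```

In this toy model, splitting the task lets the three stages overlap, but each extra subtask also pays the fixed overhead, so the finish time first falls and then rises as the chunk count grows. That trade-off is the kind of question a subtask-size model has to answer; the actual models and parameters are those given in the paper.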

Published in:

Parallel and Distributed Systems (ICPADS), 2012 IEEE 18th International Conference on

Date of Conference:

17-19 Dec. 2012
