By Topic

Performance of CUDA Virtualized Remote GPUs in High Performance Clusters

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

5 Author(s)
Duato, J. ; Univ. Politec. de Valencia (UPV), Valencia, Spain ; Pena, A.J. ; Silla, F. ; Mayo, R.
more authors

In a previous work we presented the architecture of rCUDA, a middleware that enables CUDA remoting over a commodity network. That is, the middleware allows an application to use a CUDA-compatible Graphics Processor (GPU) installed in a remote computer as if it were installed in the computer where the application is being executed. This approach is based on the observation that GPUs in a cluster are not usually fully utilized, and it is intended to reduce the number of GPUs in the cluster, thus lowering the costs related with acquisition and maintenance while keeping performance close to that of the fully-equipped configuration. In this paper we model rCUDA over a series of high throughput networks in order to assess the influence of the performance of the underlying network on the performance of our virtualization technique. For this purpose, we analyze the traces of two different case studies over two different networks. Using this data, we calculate the expected performance for these same case studies over a series of high throughput networks, in order to characterize the expected behavior of our solution in high performance clusters. The estimations are validated using real 1 Gbps Ethernet and 40 Gbps InfiniBand networks, showing an error rate in the order of 1% for executions involving data transfers above 40 MB. In summary, although our virtualization technique noticeably increases execution time when using a 1 Gbps Ethernet network, it performs almost as efficiently as a local GPU when higher performance interconnects are used. Therefore, the small overhead incurred by our proposal because of the remote use of GPUs is worth the savings that a cluster configuration with less GPUs than nodes reports.

Published in:

Parallel Processing (ICPP), 2011 International Conference on

Date of Conference:

13-16 Sept. 2011