
XCPU: a new, 9p-based, process management system for clusters and grids

Authors:

Ron Minnich, Los Alamos National Lab, Los Alamos, New Mexico, USA; Andrey Mirtchovski

Abstract:

Xcpu is a new process management system that is equally at home on clusters and grids. Xcpu provides a process execution service visible to client nodes as a 9p server; it can also be presented to users as a file system if that functionality is desired. The Xcpu service builds on our earlier work with the Bproc system. Xcpu differs from traditional remote execution services in several key ways, one of the most important being its use of a push rather than a pull model: the binaries are pushed to the nodes by the job starter rather than pulled from a remote file system such as NFS. Bproc used a proprietary protocol, a process migration model, and a set of kernel modifications to achieve its goals. In contrast, Xcpu uses a well-understood protocol, namely 9p; uses a non-migration model for moving the process to the remote node; and runs on completely standard kernels, initially on operating systems such as Plan 9 and Linux, with MacOS and others in development. In this paper, we describe our clustering model, how Bproc implements it, and how Xcpu implements a similar but not identical model. We describe in some detail the structure of the various Xcpu components. Finally, we close with a discussion of Xcpu performance, as measured on several clusters at LANL, including the 1024-node Pink cluster and the 256-node Blue Steel InfiniBand cluster.
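To make the push model concrete, the following is a minimal sketch of how a client might drive an Xcpu-style node once its 9p service is mounted as an ordinary file system (for example via v9fs on Linux). The mount point and the file names used here (clone, exec, ctl, stdout) are illustrative assumptions in the spirit of Plan 9 file interfaces, not the exact interface documented in the paper.

    # Hypothetical sketch of the "push" execution model over a mounted 9p service.
    # Assumes the node's Xcpu-style file tree is mounted at MOUNT; the file names
    # below (clone, exec, ctl, stdout) are illustrative, not the paper's exact API.
    import os
    import shutil

    MOUNT = "/mnt/xcpu/node42"   # assumed mount point of the node's 9p service
    BINARY = "./hello"           # locally built executable to push to the node

    # Opening the clone file is assumed to create a new session directory on the
    # node and return its name, in the style of Plan 9 network interfaces.
    with open(os.path.join(MOUNT, "clone")) as f:
        session = f.read().strip()
    session_dir = os.path.join(MOUNT, session)

    # Push: the job starter copies the binary's bytes to the node over 9p,
    # rather than having the node pull it from a shared file system such as NFS.
    with open(BINARY, "rb") as src, \
         open(os.path.join(session_dir, "exec"), "wb") as dst:
        shutil.copyfileobj(src, dst)

    # Ask the node to start the pushed binary, then read its standard output.
    with open(os.path.join(session_dir, "ctl"), "w") as ctl:
        ctl.write("exec\n")
    with open(os.path.join(session_dir, "stdout")) as out:
        print(out.read())

Because every step is an ordinary file operation, any client that can mount 9p can start remote processes this way without a proprietary protocol or kernel modifications, which is the contrast with Bproc drawn in the abstract.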

Published in:

2006 IEEE International Conference on Cluster Computing

Date of Conference:

25-28 Sept. 2006