Skip to Main Content
Clusters have made the jump from lab prototypes to full-fledged production computing platforms. The number variety, and specialized configurations of these machines are increasing dramatically with 32-128 node clusters being commonplace in science labs. The evolving nature of the platform is to target generic PC hardware to specialized functions such as login, compute, Web server file server and a visualization engine. This is the logical extension to the standard login/compute dichotomy of traditional Beowulf clusters. Clearly, these specialized nodes (henceforth "cluster appliances") share an immense amount of common configuration and software. What is lacking in many clustering toolkits is the ability to share configuration across appliances and specific hardware (where it should be shared) and differentiate only where needed In the NPACI Rocks cluster distribution, we have developed a configuration infrastructure with well-defined inheritance properties that leverages and builds on de facto standards including: XML (with standard parsers), RedHat Kickstart, HTTP transport, CGI, SQL databases, and graph constructs to easily define cluster appliances. Our approach neither resorts to replication of configuration files nor does it require building a "golden" image reference. By relying on this descriptive and programmatic infrastructure and carefully demarking configuration information from the software packages (which is a bit delivery mechanism), we can easily handle the heterogeneity of appliances, easily deal with small hardware differences among particular instances of appliances (such as IDE vs. SCSI), and support large hardware differences (like ×86 vs. IA64) with the same infrastructure. Our mechanism is easily extended to other descriptive infrastructures (such as Solaris Jumpstart as a backend target) and has been proven on over a 100 clusters (with significant hardware and configuration differences among these clusters).