With the advent of cloud computing, automated system management at massive scale has become increasingly important for the successful and economical operation of computing resources. However, traditional monolithic system management solutions are designed to scale to only hundreds or, at most, thousands of systems. In this paper, we present Blue Eyes, a new system management solution designed to handle hundreds of thousands of systems. Blue Eyes enables highly scalable and reliable system management through a multi-server scale-out architecture. In particular, we organize the management servers into a hierarchical tree to achieve scalability, and we replicate management information to secondary servers to provide reliability and high availability. In addition, Blue Eyes is designed to extend the existing single-server implementation without significantly restructuring the code base. Experimental results with a prototype demonstrate that Blue Eyes can reliably handle typical management tasks for a large number of endpoints, with dynamic load balancing across the servers, near-linear performance gain as servers are added, and acceptable network overhead.