Managing a large-scale cluster is a complex and difficult task. In this paper, we first discuss distributed hierarchical autonomic management mechanisms, including the framework of a distributed hierarchical autonomic management system and the functions of each of its components. We then design and implement DHAView, a management system for high-performance clusters. It provides autonomic management features such as global information integration, globally unified monitoring and management, alarm correlation inference based on autonomic elements, and local event association analysis. DHAView is now successfully used to manage a real large-scale high-performance cluster.