Predicting the run time of jobs on a cluster is the foundation of efficient resource management and task scheduling. To overcome the defects and limitations of traditional methods based on historical data and analytical modeling, this paper proposes a new approach based on the Performance Skeleton. Using the MPI library's profiling interface (PMPI), we interpose wrapper functions that capture all communication traces without modifying the source code or affecting the execution of the original program. To merge these trace logs, we design a trace-log regularization and merging algorithm. Compressing cyclic traces (those generated by loops) is the central and most difficult problem. This paper converts it into a circular-substring compression problem and proposes an algorithm based on the suffix array, which outperforms existing algorithms. When automatically reconstructing the Performance Skeleton, the approach also solves the problem of scaling computation and communication time. Experimental results show that these methods accurately estimate the run time of computing jobs, with an error of less than 3% on a homogeneous cluster.
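The abstract does not spell out the suffix-array compression algorithm. The following is a minimal sketch of how a suffix array can drive loop compression. It assumes a trace is a flat list of event labels such as `"send"` and `"recv"`; this encoding is hypothetical. The code builds a suffix array and an LCP array, answers longest-common-extension (LCE) queries with a range-minimum table, and then greedily replaces each tandem repeat with a `(body, count)` pair. This is a simplified stand-in, not the authors' algorithm. It handles only single-level loops, and the names `compress`, `expand`, and `LongestCommonExtension` are illustrative.

```python
from typing import List, Sequence, Tuple


def suffix_array(seq: List[int]) -> List[int]:
    """Prefix doubling: O(n log^2 n) using Python's built-in sort."""
    n, sa, rank, k = len(seq), list(range(len(seq))), list(seq), 1
    while n:
        # Key = (rank of first k items, rank of next k items; -1 past the end).
        keys = [(rank[i], rank[i + k] if i + k < n else -1) for i in range(n)]
        sa.sort(key=keys.__getitem__)
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (keys[sa[j - 1]] < keys[sa[j]])
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: suffixes fully sorted
            break
        k *= 2
    return sa


def lcp_array(seq: List[int], sa: List[int]) -> Tuple[List[int], List[int]]:
    """Kasai's algorithm: lcp[r] = LCP of suffixes sa[r - 1] and sa[r]."""
    n = len(seq)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp, h = [0] * n, 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and seq[i + h] == seq[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1
    return rank, lcp


class LongestCommonExtension:
    """LCE(i, j) = longest common prefix of suffixes i and j, answered in
    O(1) via a sparse-table range minimum over the LCP array."""

    def __init__(self, seq: List[int]) -> None:
        self.n = len(seq)
        self.rank, lcp = lcp_array(seq, suffix_array(seq))
        self.table = [lcp]  # table[t][r] = min(lcp[r : r + 2**t])
        t = 1
        while (1 << t) <= self.n:
            prev, half = self.table[-1], 1 << (t - 1)
            self.table.append([min(prev[r], prev[r + half])
                               for r in range(self.n - (1 << t) + 1)])
            t += 1

    def query(self, i: int, j: int) -> int:
        if i == j:
            return self.n - i
        lo, hi = sorted((self.rank[i], self.rank[j]))
        lo += 1  # LCP of the two suffixes = min(lcp[lo..hi])
        t = (hi - lo + 1).bit_length() - 1
        return min(self.table[t][lo], self.table[t][hi - (1 << t) + 1])


def compress(events: Sequence[str]) -> List[Tuple[List[str], int]]:
    """Greedily replace each tandem repeat with a (loop body, count) pair."""
    ids = {e: x for x, e in enumerate(sorted(set(events)))}
    lce = LongestCommonExtension([ids[e] for e in events])
    n, i, out = len(events), 0, []
    while i < n:
        best_p, best_k = 1, 1  # default: a single event, not a loop
        for p in range(1, (n - i) // 2 + 1):
            # events[i:i+p] repeats k times iff suffixes i and i+p agree
            # on at least (k - 1) * p events.
            k = 1 + lce.query(i, i + p) // p
            if k >= 2 and k * p > best_k * best_p:
                best_p, best_k = p, k
        out.append((list(events[i:i + best_p]), best_k))
        i += best_p * best_k
    return out


def expand(compressed: List[Tuple[List[str], int]]) -> List[str]:
    """Inverse of compress: unroll every (body, count) pair."""
    return [e for body, count in compressed for _ in range(count) for e in body]
```

For the trace `init, send, recv, send, recv, send, recv, final`, `compress` returns `[(['init'], 1), (['send', 'recv'], 3), (['final'], 1)]`, and `expand` restores the original sequence. The suffix array is what makes each "how many times does this body repeat here?" check a constant-time LCE query instead of a direct comparison of the two subsequences. Nested loops could be approached by running further passes over the compressed output, which this sketch does not do.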