Experimental performance studies of computer systems, including Grids, require a deep understanding of their workload characteristics. This need arises from two important and closely related topics in performance evaluation: workload modeling and performance prediction. Both rely heavily on representative workload data and draw their tools from statistics and machine learning. Nevertheless, their goals and the nature of the research differ considerably. Workload modeling aims to build mathematical models that generate workloads for use in simulation-based performance evaluation studies. Such generated workloads should statistically resemble the original real-world data; therefore, marginal statistics and second-order properties such as autocorrelation and the power spectrum are important matching criteria. Performance prediction, on the other hand, aims to provide real-time forecasts of important performance metrics (such as application run time and queue wait time) that can support Grid scheduling decisions. From this perspective, both prediction accuracy and the computational performance of the predictor itself should be considered when evaluating candidate techniques.

My PhD research focuses primarily on these two topics in space-shared, data-intensive Grid environments. Starting from a comprehensive workload analysis with emphasis on correlation structures and scaling behavior, several basic job arrival patterns, such as pseudo-periodicity and long-range dependence, are identified. Models are then proposed to capture these arrival patterns, and a complete workload model that also covers run time is being investigated. The strong autocorrelations present in the run time and queue wait time series motivate research on performance prediction based on learning from historical data. Techniques based on an instance-based learning algorithm, together with several improvements, are proposed and empirically evaluated.
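As a rough illustration of predicting run time by learning from historical data, the following sketch uses distance-weighted k-nearest neighbours, one common instance-based learning algorithm. It is not the specific algorithm or the improvements evaluated in this research. The function name `knn_predict`, the representation of each past job as a numeric feature vector (here two hypothetical attributes, such as processors requested and input data size), the Euclidean distance, and the inverse-distance weighting are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

# A historical record: (job feature vector, observed run time in seconds).
# The features used here are hypothetical illustrations, not a prescribed set.
Record = Tuple[Sequence[float], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(history: List[Record], query: Sequence[float], k: int = 3) -> float:
    """Hypothetical helper: predict a metric (e.g. run time) for `query`
    from its k nearest historical jobs, weighting neighbours by 1/distance."""
    if not history:
        raise ValueError("history must not be empty")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Compute the distance from the query to every past job, keep the k closest.
    scored = sorted(((euclidean(f, query), v) for f, v in history), key=lambda t: t[0])
    nearest = scored[:k]
    # If identical jobs were seen before, average their observed values directly
    # (inverse-distance weights would otherwise divide by zero).
    exact = [v for d, v in nearest if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical history: ([processors requested, input size in GB], run time in s).
    history = [
        ([4.0, 1.0], 100.0),
        ([4.0, 2.0], 120.0),
        ([16.0, 1.0], 900.0),
        ([32.0, 3.0], 2000.0),
    ]
    print(knn_predict(history, [4.0, 1.5], k=2))  # prints 110.0
```

Here the query job is equally close to the two small 4-processor jobs, so their run times receive equal weight and the prediction is their mean. In practice the choice of features, distance metric, and k strongly affects accuracy, and because each prediction scans the whole history, the cost grows with the history size, which is one reason predictor performance matters alongside accuracy.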
Future research plans include using the results of workload modeling and performance prediction to evaluate scheduling strategies in data-intensive Grid environments.