This paper presents the design, implementation, and testing of a monitoring solution built for integration with a workflow execution platform. The monitoring solution continuously tracks the evolution of the system in order to facilitate performance tuning and improvement. Monitoring is performed at the application level, by tracking each job of each workflow, and at the system level, by aggregating state information from each processing node. The solution also computes aggregated statistics that enable improvements to the scheduling component of the system, with which it will interact. The performance of distributed applications is improved by using real-time information to compute runtime estimates, which in turn are used to improve scheduling. Another contribution is an automated error detection system, which can improve the robustness of the grid by enabling fault recovery mechanisms to be used. Both aspects benefit from tailoring the monitoring system to a workflow-based application: scheduling performance can be improved through better runtime estimation, and several types of errors can be detected automatically. The proposed monitoring solution could be used in the SEEGRID project as part of the satellite image processing engine that is being built.