A typical workflow for a distributed application involves a large number of resources that can fail, including network, hardware, and software components. Even when monitoring information from all these components is accessible, it is hard to determine how anomalies and failures during application execution relate to a given workflow component. Yet the ability to receive and interpret intermediate results, and to interact with running applications, plays a significant role in developing scientific experiments. Considering the complexity of implementing distributed systems and the broad range of issues a monitoring system should cover, what analysis and planning are required to implement effective monitoring of scientific grid workflows? We propose a multi-layer approach that focuses on a clear identification of the workflow-level monitoring abstractions. By clearly separating higher-level and lower-level mechanisms, this approach will allow application monitoring requirements to be specified at the workflow level and implemented on top of distinct monitoring technologies, including those supported by existing grid middleware.