An important class of parallel processing jobs on clusters today consists of workflow-based applications that process large amounts of data in parallel. Traditional cluster performance tools are designed for tightly coupled parallel jobs and are less effective for this type of application. We describe how the NetLogger Toolkit methodology is more appropriate for this class of cluster computing, and describe our new automatic workflow anomaly detection component. We also describe how this methodology is being used by the Nearby Supernova Factory (SNfactory) project at Lawrence Berkeley National Laboratory.