Monitoring and debugging parallel programs is difficult. In many situations the traditional “stop the world, I want to get off” approach to debugging is simply unsuitable. Nonintrusive monitoring of program execution is frequently more productive, both for locating sources of error and for observing “correct” programs for purposes such as performance measurement and tuning. This paper presents a number of space- and time-efficient tools and techniques to support nonintrusive, non-stop monitoring and debugging of parallel programs running on a shared-memory multiprocessor. The techniques include the use of spy tasks, circular history buffers, vectors of use bits, and data structure audits. Particular emphasis is placed on issues that pertain to parallel computing, such as dealing with concurrent execution, shared memory, and data caches.
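The abstract names circular history buffers without describing their design, so the following is only an illustrative sketch of the general idea, not the paper's implementation. It assumes a fixed-capacity buffer in which the newest event overwrites the oldest, so memory use stays bounded while the program runs without stopping. The class name `HistoryBuffer` and its methods `record` and `snapshot` are hypothetical, and the lock stands in for whatever synchronization a real shared-memory implementation would use (for example, an atomic index increment).

```python
import threading


class HistoryBuffer:
    """Fixed-size circular buffer of recent events (hypothetical sketch)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity
        self._count = 0  # total number of events ever recorded
        # Assumption: a lock is used here for simplicity; it is not
        # claimed to be the synchronization method used in the paper.
        self._lock = threading.Lock()

    def record(self, event):
        """Store an event, overwriting the oldest one once the buffer is full."""
        with self._lock:
            self._slots[self._count % len(self._slots)] = event
            self._count += 1

    def snapshot(self):
        """Return the surviving events, oldest first."""
        with self._lock:
            n = len(self._slots)
            start = max(0, self._count - n)
            return [self._slots[i % n] for i in range(start, self._count)]


if __name__ == "__main__":
    history = HistoryBuffer(3)
    for step in range(5):
        history.record(step)
    print(history.snapshot())
```

In this sketch a separate monitoring thread could call `snapshot` at any time to inspect recent activity while worker threads keep calling `record`, which is one plausible reading of how a spy task and a history buffer might cooperate.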
Published in:
Software Engineering for Parallel and Distributed Systems, 1997. Proceedings., Second International Workshop on
Date of Conference: 17-18 May 1997