View-oriented group communication is an important and widely used building block for constructing highly available, fault-tolerant systems. Unfortunately, systems based on group communication are extremely hard to test and debug: they run a number of complex, stateful algorithms in parallel, and their combination of distributed and concurrent programming paradigms amplifies the non-determinism in system behavior. In this work, we elaborate on the specific challenges we encountered while testing DCS, a group communication component of the WebSphere (WAS) architecture, and on the methodology we devised and employed to cope with these challenges. Our solution relies on a carefully compiled set of invariants that must be preserved at every execution point, on a log analyzer algorithm that performs cross-log verification for all the processes participating in the execution, and on a number of other techniques whose details are described in the paper.
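To make the idea of cross-log invariant checking concrete, the following is a minimal sketch and not the analyzer described in the paper. It assumes a hypothetical log record of the form (process, view id, members) for each view installation, with each process's records appearing in that process's own log order. It checks three properties commonly required of view-oriented group communication: self-inclusion, locally increasing view identifiers, and agreement on membership for a given view. The record format, the function name `check_view_invariants`, and the choice of invariants are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical log record: (process_id, view_id, members_of_view).
ViewEvent = Tuple[str, int, FrozenSet[str]]


def check_view_invariants(events: Iterable[ViewEvent]) -> List[str]:
    """Return human-readable descriptions of invariant violations.

    `events` is the merged content of all process logs; records of the
    same process must appear in that process's own log order.
    """
    violations: List[str] = []
    # First membership seen for each view id, and who reported it.
    first_report: Dict[int, Tuple[str, FrozenSet[str]]] = {}
    # Last view id installed by each process.
    last_view: Dict[str, int] = {}

    for process, view_id, members in events:
        # Self-inclusion: a process belongs to every view it installs.
        if process not in members:
            violations.append(
                f"{process} installed view {view_id} without being a member"
            )

        # Local monotonicity: view ids strictly increase per process.
        previous = last_view.get(process)
        if previous is not None and view_id <= previous:
            violations.append(
                f"{process} installed view {view_id} after view {previous}"
            )
        last_view[process] = view_id

        # Agreement: every process that installs a given view id
        # must report the same membership for it.
        reporter, expected = first_report.setdefault(view_id, (process, members))
        if members != expected:
            violations.append(
                f"{process} and {reporter} disagree on members of view {view_id}"
            )

    return violations


if __name__ == "__main__":
    merged_logs = [
        ("p1", 1, frozenset({"p1", "p2"})),
        ("p2", 1, frozenset({"p2"})),  # disagrees with p1 about view 1
    ]
    for problem in check_view_invariants(merged_logs):
        print(problem)
```

A single pass over the merged records is enough for these three checks because each one depends only on the current record and on state accumulated from earlier records; invariants that relate message delivery to views would need additional record types.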