A fault-tolerant computer architecture has been designed for applications that require high system availability but can tolerate a short recovery time (limited to a few minutes) when a component fails. Critical to the success of this architecture is a heartbeat protocol governing communication between two independent processor subsystems. This protocol, which ensures that the two subsystems correctly negotiate a primary/secondary relationship in the presence of any combination of component failures, has been specified as a finite-state machine. The author describes the protocol specification and its validation (for formal correctness) and verification (for functional correctness) using computerized exhaustive exploration of the global system state space.
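Exhaustive exploration of a global state space can be illustrated on a toy model. The sketch below is not the author's protocol. The role names (`init`, `primary`, `secondary`, `down`), the fixed-priority tie-break, and the assumption that each subsystem observes its peer's current role atomically (no heartbeat delay or loss) are all hypothetical simplifications. The sketch enumerates every reachable pair of roles by breadth-first search, then checks a safety property (never two primaries at once) and a progress property (whenever a subsystem is up, protocol moves alone can establish a primary).

```python
from collections import deque
from typing import Iterator

# Hypothetical roles for each subsystem; not taken from the paper.
State = tuple[str, str]  # (role of subsystem 0, role of subsystem 1)


def protocol_moves(state: State, i: int) -> Iterator[str]:
    """Role changes subsystem i may make after observing its peer's heartbeat.

    Assumes the peer's current role is observed atomically (no message delay).
    """
    me, peer = state[i], state[1 - i]
    if me == "init":
        if peer == "primary":
            yield "secondary"
        elif peer in ("down", "secondary"):
            yield "primary"
        elif peer == "init" and i == 0:
            # Hypothetical tie-break: subsystem 0 has fixed priority.
            yield "primary"
    elif me == "secondary" and peer == "down":
        yield "primary"  # take over once the primary's heartbeat stops


def fault_moves(state: State, i: int) -> Iterator[str]:
    """Environment events: a live subsystem may fail; a failed one may restart."""
    yield "init" if state[i] == "down" else "down"


def successors(state: State, include_faults: bool = True) -> Iterator[State]:
    """All global states reachable in one interleaved step."""
    for i in (0, 1):
        gens = [protocol_moves(state, i)]
        if include_faults:
            gens.append(fault_moves(state, i))
        for gen in gens:
            for new_role in gen:
                roles = list(state)
                roles[i] = new_role
                yield (roles[0], roles[1])


def bfs(start: State, include_faults: bool) -> set[State]:
    """Breadth-first enumeration of every state reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in successors(state, include_faults):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def can_reach_primary(state: State) -> bool:
    """Can protocol moves alone (no further faults) establish a primary?"""
    return any("primary" in s for s in bfs(state, include_faults=False))


reachable = bfs(("init", "init"), include_faults=True)
# Safety: the two subsystems are never primary simultaneously.
violations = [s for s in reachable if s == ("primary", "primary")]
# Progress: whenever some subsystem is up, a primary can be negotiated.
stuck = [s for s in reachable
         if s != ("down", "down") and not can_reach_primary(s)]
print(len(reachable), violations, stuck)  # prints: 14 [] []
```

Under these assumptions 14 of the 16 role pairs are reachable; the two excluded pairs are (primary, primary) and (secondary, secondary). A more faithful model would add heartbeat messages in transit, timeouts, and channel failures, which enlarges the state space considerably and is where exhaustive exploration earns its keep.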