Efficiently diagnosing hardware and software faults in parallel and distributed systems remains a challenge in today's most widespread decentralized environments. System-level fault diagnosis aims to identify all faulty components among a set of hundreds (or even thousands) of interconnected units, usually by examining the outcomes of tests carried out by the nodes under a specific test model. This task has non-polynomial complexity and can be posed as a combinatorial optimization problem. Here, we apply a binary version of the Particle Swarm Optimization meta-heuristic to the system-level fault diagnosis problem (BPSO-FD) under the invalidation and comparison diagnosis models. Our method is computationally simpler than those previously published in the literature and, according to our empirical results, BPSO-FD quickly and reliably identifies the true set of faulty units and scales well to large parallel and distributed systems.
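To make the idea of posing diagnosis as a combinatorial optimization problem concrete, the following is a minimal sketch and not the BPSO-FD implementation itself. It assumes a symmetric invalidation (PMC-style) test model in which a fault-free tester reports the true status of the tested unit and a faulty tester reports an arbitrary outcome. The ring-like test assignment, the cost function (consistency violations weighted above fault-set size), the inertia-weighted sigmoid update, and all names and parameter values (`make_syndrome`, `cost`, `bpso`, `w`, `c1`, `c2`, `vmax`) are illustrative assumptions.

```python
import math
import random


def make_syndrome(n, faulty, rng):
    """Hypothetical test assignment: unit u tests units u+1 and u+2 (mod n)."""
    tests = []
    for u in range(n):
        for step in (1, 2):
            v = (u + step) % n
            if u in faulty:
                outcome = rng.randint(0, 1)  # faulty tester: unreliable outcome
            else:
                outcome = 1 if v in faulty else 0  # fault-free tester: truthful
            tests.append((u, v, outcome))
    return tests


def cost(candidate, tests):
    """Assumed objective: violations dominate, then prefer fewer faulty units."""
    # A test contradicts the candidate if its tester is assumed fault-free
    # but the reported outcome disagrees with the assumed status of the testee.
    violations = sum(
        1 for u, v, o in tests if candidate[u] == 0 and o != candidate[v]
    )
    return (len(candidate) + 1) * violations + sum(candidate)


def bpso(tests, n, particles=30, iters=200, w=0.7, c1=1.5, c2=1.5,
         vmax=4.0, seed=0):
    """Sigmoid-based binary PSO; bit d of a position is 1 if unit d is faulty."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(particles)]
    vel = [[0.0] * n for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p, tests) for p in pos]
    g = min(range(particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iters):
        for i in range(particles):
            for d in range(n):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                v = max(-vmax, min(vmax, v))  # clamp to keep bits from freezing
                vel[i][d] = v
                # The velocity sets the probability that the bit becomes 1.
                prob_one = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            c = cost(pos[i], tests)
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost


if __name__ == "__main__":
    rng = random.Random(1)
    syndrome = make_syndrome(8, {2, 5}, rng)
    best, best_cost = bpso(syndrome, 8)
    print("candidate fault vector:", best, "cost:", best_cost)
```

The size penalty in `cost` is needed in this sketch because, under the assumed model, declaring every unit faulty is trivially consistent with any syndrome; weighting each violation by `n + 1` ensures that any consistent candidate scores better than any inconsistent one.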