Skip to Main Content
In this paper, we study the problem of reliable collective communication (broadcast or gossip) with the objective of maximizing the reliability of the collective communication. The need for collective communication arises in many problems of parallel and distributed computing, including Grid or cloud computing and database management. We describe the network model, formulate the reliable collective communication problem, prove that the maximum reliable collective communication problem is NP-hard, and provide an integer linear program (ILP) formulation for the problem. We then provide a greedy approximation algorithm to construct collective communication (through a spanning tree) that achieves an approximation ratio of 1 + ln(|V|+α|E|-1) , where α is the average number of shared link risk groups (SRLGs) along links, and |V| and |E| are the total number of vertices and edges of the network, respectively. Simulations demonstrate that our approximation algorithm achieves good performance in both small and large networks and that, in almost 95% of total cases, our algorithm outperforms the modified minimum spanning tree algorithms.