Skip to Main Content
This paper presents the design of a highly available distributed call processing system and its implementation on a local area network of commercial, off-the-shelf workstations. A major challenge of using off-the-shelf components is meeting the strict performance and availability requirements in place for existing public telecommunications systems in a cost-effective manner. Traditional checkpointing and message logging schemes for general distributed applications are not directly applicable since call processing applications built using these schemes suffer from high failure-free overhead and long recovery delays. We propose an application-level fault-tolerance scheme that takes advantage of general properties of distributed call processing systems to avoid message logging and to limit checkpointing overhead. The proposed scheme, applied to a call processing system for wireless networks, shows average call setup latencies of 180 ms, failover times of less than three seconds, and recovery times of less than seventeen seconds. System availability is estimated to be 0.99995. The results indicate that using our proposed scheme meets the above challenge.