An increasingly diverse set of applications, such as Internet games, streaming video, e-commerce, online banking, and even mission-critical emergency call services, all rely on IP networks. In such an environment, best-effort service is no longer acceptable. Meeting this expectation requires a transformation in network management, from detecting and replacing individual faulty network elements to managing end-to-end service quality as a whole. In this paper, we describe the design and development of a Generic Root Cause Analysis platform (G-RCA) for service quality management (SQM) in large IP networks. G-RCA contains a comprehensive service dependency model that incorporates topological and cross-layer relationships, protocol interactions, and control plane dependencies. G-RCA abstracts the root cause analysis process into three parts: signature identification for symptom and diagnostic events, temporal and spatial event correlation, and reasoning and inference logic. G-RCA provides a flexible rule specification language that allows operators to quickly customize the platform and build new root cause analysis tools as new problems need to be investigated. G-RCA is also integrated with data trending, manual data exploration, and statistical correlation mining capabilities. G-RCA has proven to be a highly effective SQM platform in several different applications, and we present results for BGP flaps, PIM flaps in Multicast VPN service, and end-to-end throughput degradation in content delivery network (CDN) service.
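To make the three-part abstraction concrete, the following is a minimal illustrative sketch, not G-RCA's actual implementation or rule language. It assumes a simplified event record, a fixed timing margin for temporal correlation, a hand-written dependency map for spatial correlation, and a numeric priority table standing in for the reasoning logic. All names (`Event`, `temporally_joined`, `spatially_joined`, `diagnose`, the event names and locations) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A symptom or diagnostic event (hypothetical, simplified record)."""
    name: str        # event signature, e.g. "bgp_flap" or "link_down"
    location: str    # network element the event was observed on
    start: float     # start time in seconds
    end: float       # end time in seconds


def temporally_joined(symptom, diagnostic, margin=5.0):
    """True if the two time windows overlap once the symptom window
    is expanded by `margin` seconds on both sides (assumed policy)."""
    return (diagnostic.start <= symptom.end + margin
            and diagnostic.end >= symptom.start - margin)


def spatially_joined(symptom, diagnostic, depends_on):
    """True if the diagnostic event sits on the symptom's own location
    or on an element the symptom's location depends on."""
    related = depends_on.get(symptom.location, set()) | {symptom.location}
    return diagnostic.location in related


def diagnose(symptom, diagnostics, depends_on, priority, margin=5.0):
    """Return the highest-priority diagnostic event that is correlated
    with the symptom in both time and space, or None if there is none."""
    candidates = [
        d for d in diagnostics
        if temporally_joined(symptom, d, margin)
        and spatially_joined(symptom, d, depends_on)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda d: priority.get(d.name, 0))


# Hypothetical dependency model: a BGP session depends on its two
# endpoint routers and on the link between them.
depends_on = {"session:r1-r2": {"router:r1", "router:r2", "link:r1-r2"}}

# Hypothetical reasoning rule: higher number means more likely root cause.
priority = {"router_reboot": 30, "link_down": 20, "cpu_high": 10}

symptom = Event("bgp_flap", "session:r1-r2", 100.0, 110.0)
diagnostics = [
    Event("link_down", "link:r1-r2", 98.0, 105.0),    # related in time and space
    Event("cpu_high", "router:r1", 99.0, 120.0),      # related, lower priority
    Event("link_down", "link:r3-r4", 99.0, 104.0),    # unrelated location
    Event("router_reboot", "router:r2", 10.0, 20.0),  # too early to be related
]

root_cause = diagnose(symptom, diagnostics, depends_on, priority)
```

In this toy run the link failure on `link:r1-r2` is selected: the reboot is filtered out by the temporal check, the failure on `link:r3-r4` by the spatial check, and the CPU event loses on priority. A real deployment would derive the dependency map from topology and cross-layer data rather than hard-coding it.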