Abstract:
Debugging performance anomalies in databases is challenging. Causal inference techniques enable qualitative and quantitative root cause analysis of performance downgrades...Show MoreMetadata
Abstract:
Debugging performance anomalies in databases is challenging. Causal inference techniques enable qualitative and quantitative root cause analysis of performance downgrades. Nevertheless, causality analysis is challenging in practice, particularly due to limited observability. Recently, chaos engineering (CE) has been applied to test complex software systems. CE frameworks mutate chaos variables to inject catastrophic events (e.g., network slowdowns) to stress-test these software systems. The systems under chaos stress are then tested (e.g., via differential testing) to check if they retain normal functionality, such as returning correct SQL query outputs even under stress. To date, CE is mainly employed to aid software testing. This paper identifies the novel usage of CE in diagnosing performance anomalies in databases. Our framework, PERFCE, has two phases - offline and online. The offline phase learns statistical models of a database using both passive observations and proactive chaos experiments. The online phase diagnoses the root cause of performance anomalies from both qualitative and quantitative aspects on-the-fly. In evaluation, Perfce outperformed previous works on synthetic datasets and is highly accurate and moderately expensive when analyzing real-world (distributed) databases like MySQL and TiDB.
Date of Conference: 11-15 September 2023
Date Added to IEEE Xplore: 08 November 2023
ISBN Information: