Skip to Main Content
In this paper we present an approach to fault-tolerant stream processing. In contrast to previous techniques that handle node failures, our approach also tolerates network failures and network partitions. The approach is based on a principled trade-off between consistency and availability in the face of failure, that (1) ensures that all data on an input stream is processed within a specified time threshold, but (2) reduces the impact of failures by limiting if possible the number of results produced based on partially available input data, and (3) corrects these results when failures heal. Our approach is well-suited for applications such as environment monitoring, where high availability and "real-time" response is preferable to perfect answers. Our approach uses replication and guarantees that all processing replicas achieve state consistency, both in the absence of failures and after a failure heals. We achieve consistency in the former case by defining a data-serializing operator that ensures that the order of tuples to a downstream operator is the same at all the replicas. To achieve consistency after a failure heals, we develop approaches based on checkpoint/redo and undo/redo techniques.