Cost pressure is driving vendors of safety-critical systems to integrate previously distributed systems. One natural approach, which we have previously introduced, is On-Demand Redundancy (ODR), which allows safety-critical and non-critical tasks, traditionally isolated to limit interference, to execute on shared resources. Our prior work has shown that relaxed dedication (RD), an ODR strategy that allows non-critical tasks (NCTs) to execute on idle critical task resources (CTRs), significantly increases NCT throughput. Unfortunately, there are circumstances under which, in spite of this opportunity, it is difficult to schedule NCTs effectively. In this paper, we introduce distributed temporal redundancy (DTR), which allows critical tasks, which traditionally execute in lockstep, to execute asynchronously. In doing so, DTR increases scheduling flexibility, resulting in systems that come much closer to the optimal NCT throughput than with relaxed dedication alone; in one set of experiments, DTR schedules no less than 93% of the theoretical NCT cycles across a variety of synthetic benchmarks, outperforming RD by over 11% on average. Furthermore, distributing all redundant tasks across different resources makes triple-modular redundancy, and therefore fault localization, achievable. We demonstrate that this can be accomplished with little additional cost and complexity: in practice, relatively few DTR tasks are in flight simultaneously, limiting the additional buffering needed to support DTR.
Date of Conference: 9-14 Oct. 2011