In queue-based scheduling systems, jobs are executed according to a predefined sequential plan. During execution, faults may occur that cause jobs to re-execute, delaying the whole schedule. It is therefore important to determine in real time whether a given set of pre-ordered jobs is fault-tolerant, that is, whether all jobs will always meet their deadlines. This makes it possible, for instance, to decide online whether to admit a new urgent job into the queue while still guaranteeing that the whole schedule remains fault-tolerant. Our goal in this work is to design efficient algorithms for testing the fault tolerance of sequenced jobs in the presence of transient faults. We consider different fault models that specify which fault patterns are allowed to occur and how soon failed jobs can be restarted. For each fault model, we provide efficient algorithms that determine the feasibility of all jobs in the schedule. Our algorithms are exact and run in time linear in the number of jobs (deterministically, or with very high probability, depending on the fault model), and can thus be used to make real-time decisions.
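To make the problem statement concrete, the following is a minimal sketch of a linear-time fault-tolerance test under one simple fault model. The model is an assumption chosen for illustration and is not taken from the paper: at most `k` faults occur in total, a fault forces the running job to restart immediately, and each fault wastes at most one full execution of that job. Under these assumptions, the worst case for any job arises when all `k` faults hit the longest job scheduled so far. The function name `is_fault_tolerant` and the `(processing_time, deadline)` job representation are hypothetical.

```python
from typing import Sequence, Tuple


def is_fault_tolerant(jobs: Sequence[Tuple[float, float]], k: int) -> bool:
    """Check whether every job meets its deadline under at most k faults.

    Assumed model (illustrative only): jobs run in the given order, a fault
    restarts the running job immediately, and each fault costs at most one
    full execution of the job it hits.
    """
    elapsed = 0.0   # fault-free completion time of the current job
    longest = 0.0   # longest processing time among jobs seen so far
    for processing_time, deadline in jobs:
        elapsed += processing_time
        longest = max(longest, processing_time)
        # Worst case: all k faults hit the longest job scheduled so far.
        if elapsed + k * longest > deadline:
            return False
    return True


# Each job is (processing_time, deadline), listed in queue order.
queue = [(2, 10), (3, 12), (1, 15)]
tolerates_two_faults = is_fault_tolerant(queue, k=2)
tolerates_three_faults = is_fault_tolerant(queue, k=3)
```

The test keeps only a running sum and a running maximum, so it makes a single pass over the queue. An online admission decision could, under the same assumptions, insert the candidate job into a copy of the queue and rerun the check.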