Skip to Main Content
This paper describes a framework for achieving node-level fault tolerance (NLFT) in distributed real-time systems. The objective of NLFT is to mask errors at the node level in order to reduce the probability of node failures and thereby improve system dependability. We describe an approach called lightweight NLFT where transient faults are masked locally in the nodes by time-redundant execution of application tasks. The advantages of light-weight NLFT is demonstrated by a reliability analysis of an example brake-by-wire architecture. The results show that the use of light-weight NLFT may provide 55% higher reliability after one year and almost 60% higher MTTF, compared to using fail-silent nodes.