Skip to Main Content
Development of the REE Commercial-Off-The-Shelf (COTS) based space-borne supercomputer requires a detailed knowledge of system behavior in the presence of Single Event Upset (SEU) induced faults. When combined with a hardware radiation fault model and mission environment data in a medium grained system model, experimentally obtained fault behavior data can be used to: predict system reliability, availability and performance; determine optimal fault detection methods and boundaries; and define high ROI fault tolerance strategies. The REE project has developed a fault injection suite of tools and a methodology for experimentally determining system behavior statistics in the presence of application level SEU induced transient faults. Initial characterization of science data application code for an autonomous Mars Rover geology application indicates that this code is relatively insensitive to SEUs and thus can be made highly immune to application level faults with relatively low overhead strategies.