Summary form only given. Since the 1980s, the object of design for dependability has been to avoid, detect or tolerate system faults so that they do not result in failures that are detectable outside the system. Whilst this is potentially achievable in medium-sized systems that are controlled by a single organisation, it is now practically impossible to achieve in large-scale systems of systems, where different parts of the system are owned and controlled by different organisations. We must therefore accept the inevitability of failure and re-orient our system design strategies to recover from those failures as quickly as possible and at minimal cost. This talk will discuss why such recovery strategies cannot be purely technical but must be socio-technical in nature, and will argue that design for recovery will require a better understanding of how people recover from failure and of the information they need during that recovery process. I will argue that supporting recovery should be a fundamental design objective of systems and explore what this means for current approaches to large-scale systems design.