Skip to Main Content
During the last decades, the wide advance in the networking technologies has allowed the development of distributed monitoring and control systems. These systems show advantages compared with centralized solutions: heterogeneous nodes can be easily integrated, new nodes can be easily added to the system, and no single point of failure. For these reasons, distributed systems have been adopted in different fields, such as industrial automation and telecommunication systems. Recently, due to technology improvements, distributed systems are also adopted in the control of power-grid and transport systems, i.e., the so-called large-scale complex critical infrastructures. Given the strict safety, security, reliability, and real-time requirements, using distributed systems for controlling such critical infrastructure demands that adequate mechanisms have to be established to share the same notion of time among the nodes. For this class of systems, a synchronization protocol, such as the IEEE 1588 standard, can be adopted. This type of synchronization protocol was designed to achieve very precise clock synchronization, but it may not be sufficient to ensure safety of the entire system. For example, instability of the local oscillator of a reference node, due to a failure of the node itself or to malicious attacks, could influence the quality of synchronization of all nodes. In recent years, a new software clock, the reliable and self-aware clock (R&SAClock), which is designed to estimate the quality of synchronization through statistical analysis, was developed and tested. This statistical instrument can be used to identify any anomalous conditions with respect to normal behavior. A careful analysis and classification of the main points of failure of IEEE 1588 standard suggests that the reference node, which is called master, is the weak point of the system. For this reason, this paper deals with the detection of faults of the reference node(s) of an of IEEE 1588 setup- This paper describes and evaluates the design of a protocol for timing failure detection for internal synchronization based on a revised version of the R&SAClock software suitably modified to cross-exploit the information on the quality of synchronization among all the nodes of the system. The experimental evaluation of this approach confirms the capability of the synchronization uncertainty, which is provided by R&SAClock, to reveal the anomalous behaviors either of the local node or of the reference node. In fact, it is shown that, through a proper configuration of the parameters of the protocol, the system is able to detect all the failures injected on the master in different experimental conditions and to correctly identify failures on slaves with a probability of 87%.