Digital Twins for Anomaly Detection in the Industrial Internet of Things: Conceptual Architecture and Proof-of-Concept

Alessandra De Benedictis, Francesco Flammini, Senior Member, IEEE, Nicola Mazzocca, Alessandra Somma, and Francesco Vitale

Abstract: Modern cyber-physical systems based on the industrial Internet of Things (IIoT) can be highly distributed and heterogeneous, which increases the risk of failures due to misbehavior of interconnected components or other interaction anomalies. In this article, we introduce a conceptual architecture for IIoT anomaly detection based on the paradigms of digital twins (DT) and autonomic computing (AC), and we test it through a proof-of-concept of industrial relevance. The architecture is derived from the current state of the art in DT research and leverages the MAPE-K feedback loop of AC to monitor, analyze, plan, and execute appropriate reconfiguration or mitigation strategies based on detected deviations from the prescriptive behavior stored as shared knowledge. We demonstrate the approach and discuss results using a reference operational scenario of adequate complexity and criticality within the European Railway Traffic Management System.

Index Terms: Anomaly detection, autonomic computing (AC), cyber-physical systems, digital twins (DTs), industrial Internet of Things (IIoT), process mining (PM).

I. INTRODUCTION
The application of Internet of Things (IoT) technology to the industrial domain and the rapid rise of the industrial Internet of Things (IIoT) and cyber-physical systems (CPSs) paradigms have dramatically improved the efficiency of industrial processes through the creation of new services (Ss) for customers and new revenue models. With the growth in complexity, distribution, and heterogeneity of modern computer systems in industrial applications, developing fault-tolerant architectures for such systems has become increasingly challenging. At the same time, disruptive IT technologies such as artificial intelligence have paved the way for the development of self-adaptive systems, which are able to modify their configuration and behavior in response to their perception of the environment, thus achieving resilience in addition to dependability [1].
In order to improve system resilience through self-adaptation, any deviation from nominal behavior, also known as an anomaly [2], should be monitored, detected, and analyzed, with the goal of identifying and implementing appropriate mitigations. Adequate reasoning and prediction functionalities are needed in order to identify anomalies and plan appropriate countermeasures, which requires a tight connection with the physical system to retrieve run-time data and perform recovery actions [3].
The development and adoption of successful anomaly detection techniques require the following: modeling the nominal behavior of target systems, collecting data to evaluate compliance to behavioral prescriptions, and classifying the resulting diagnoses. Both nominal CPS behavior modeling and the classification of unknown behavior can be performed through data-driven approaches based on supervised or unsupervised machine learning (ML) [4], [5]. Recently, these tasks have also been successfully accomplished through process mining (PM) [6], a research area concerned with extracting, checking, and enhancing behavioral models through event data collected from the monitored system [7]. A PM technique that is particularly relevant to anomaly detection is conformance checking (CC), which evaluates compliance to reference (i.e., normative) process models through fitness metrics and specific parameters, referred to as the CC diagnoses.
Several approaches relying upon digital twin (DT) technologies for anomaly detection have recently been proposed, as DTs can be designed for: modeling nominal behavior; generating synthetic datasets through simulation to inspect system behavior under different conditions; and deploying data-driven techniques to classify nominal and anomalous behavior when actual measurements are sampled, possibly exploiting information gathered through previous simulations [8], [9], [10]. From an autonomic computing (AC) perspective, DT architectures can manage a bidirectional flow of run-time data and commands from/to the real system to implement proper reconfiguration and dynamically adapt to changes [11]. Moreover, the DT may evolve, as its internal models may be enhanced based on gathered data and past anomaly detections.
In light of the above, the contributions of this article are as follows: 1) we introduce a conceptual DT architecture for CC/ML-based IIoT anomaly detection, inspired by the MAPE-K feedback loop of AC, and hence following the principles of self-adaptation; 2) we demonstrate the approach through a proof-of-concept (PoC) of a reference real-world DT application in the railway domain, namely a relevant IIoT operational scenario within the European Railway Traffic Management System (ERTMS, https://www.era.europa.eu/activities/european-rail-traffic-management-system-ertms). Furthermore, we show how core DT functionalities, such as behavior modeling and real-time data monitoring, support CC and supervised ML through CC diagnoses. We develop a PoC addressing anomaly detection of cyber data linked to ERTMS control flow, and we evaluate and discuss results in comparison with other approaches.
The rest of this article is organized as follows. Section II introduces some fundamental DT concepts and provides the technical background on CC. Section III describes the conceptual DT architecture for IIoT anomaly detection inspired by the MAPE-K feedback loop of AC. Section IV presents the railway case study addressed in our PoC. Finally, Section V concludes this article.

II. BACKGROUND

A. Digital Twins
A DT is the virtual representation of the real system within CPSs, enabled by seamless two-way communication between the cyber and the physical spaces (PSs) for real-time data exchange. Although the concept of DT has been around for nearly 20 years, industrial and academic interest in this field has only recently developed, and little effort has been devoted to the identification of a commonly accepted definition, as well as an architectural and functional characterization of DTs [12], [13]. One of the most common DT definitions used by both industry and academia is the quintuple proposed by Tao et al. [14], M_DT = (PS, VS, Ss, DD, CN), where: 1) the PS consists of objects, systems, and/or processes, and their internal and external interactions; 2) the virtual space (VS) contains the faithful digital replicas, fed with real-time data; 3) the DT data (DD) are obtained from the PS and domain experts, generated through virtual models, and derived from DT-based Ss; 4) the Ss are offered through the DT technology (e.g., real-time monitoring); 5) the connectivity (CN) enables the cooperation between the four parties. Since DTs are the combination of a set of virtual models of real systems, data, and Ss, a DT can be defined as a software system that actively represents an observed real system, and controls, optimizes, and/or predicts its behavior [15].
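For illustration only, the quintuple can be sketched as a simple data structure; the class and field names below are our own and not part of Tao's definition.

```python
from dataclasses import dataclass


@dataclass
class DigitalTwinModel:
    """Illustrative container for the quintuple M_DT = (PS, VS, Ss, DD, CN).

    Field types are deliberately loose placeholders: in a real DT, each
    dimension would be a full subsystem, not a plain collection.
    """
    physical_space: list   # PS: objects, systems, processes, interactions
    virtual_space: list    # VS: faithful digital replicas, fed real-time data
    services: dict         # Ss: DT-based services (e.g., real-time monitoring)
    data: dict             # DD: data from the PS, experts, models, services
    connectivity: dict     # CN: links enabling cooperation among the parties
```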

B. Conformance Checking
In this work, we use CC algorithms for anomaly detection, which require event logs and normative process models. An event log EL = {σ_1, σ_2, ..., σ_n} is a collection of uniquely identified traces, where each trace σ_i is an ordered collection of events related to a specific instance of an application scenario. A normative process model is a model linked to a scenario S, collecting its set of possible control flows C(S). We use Petri nets (PNs) as the reference formalism adopted to apply CC. A PN N is defined as a quadruplet (P, T, A, M), where: P and T are two sets of nodes called places and transitions, respectively; A ⊆ (P × T) ∪ (T × P) is the set of arcs; and M is the marking, i.e., a collection of tokens whose configuration within the places of the network outlines the state of N. For the sake of understandability, processes are often described using notations such as the event-driven process chain (EPC) and the Business Process Model and Notation (BPMN) [16]. Therefore, the application of CC using PNs requires the translation of models described using other notations, e.g., EPC or BPMN. The translated model must be a trace-equivalent PN, i.e., a PN with the same control flows C(S) as the starting model. This also requires that a trace linked to a non-PN model be translated when projected on its trace-equivalent PN, as the transformation rules involve the introduction of other control-flow elements [7].
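As a concrete illustration of this translation step, the following sketch uses the open-source pm4py library, which is not the toolchain used in our PoC; the BPMN file name is hypothetical.

```python
import pm4py

# Read a process model captured in BPMN (the file name is hypothetical).
bpmn_model = pm4py.read_bpmn("rbc_handover.bpmn")

# Translate it to a trace-equivalent Petri net (P, T, A, M):
# pm4py returns the net plus its initial and final markings.
net, initial_marking, final_marking = pm4py.convert_to_petri_net(bpmn_model)

print(f"places={len(net.places)}, transitions={len(net.transitions)}, "
      f"arcs={len(net.arcs)}")
```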
Our reference CC algorithm, an open-source implementation of the token replay algorithm defined in [7], leverages the token-based semantics of PNs. The algorithm triggers transitions t_i ∈ T of a reference PN N according to the events contained in the replayed traces σ_i of an event log EL. Four quantities are traced throughout σ_i replays, namely the consumed (c), produced (p), missing (m), and remaining (r) tokens. The c and p counters are updated upon t_i triggering: their increment depends on the network structure, i.e., the sets P, T, and A of N. The m counter is updated whenever M does not allow t_i triggering, counting the number of tokens required to trigger t_i. The r counter is updated once all events are replayed, counting how many residual tokens are left as the last transition of σ_i is triggered. Based on these four counters, a fitness measure F(σ, N) ∈ [0, 1] is computed as follows [7]:

F(σ, N) = 1/2 (1 − m/c) + 1/2 (1 − r/p)

where σ is an input trace and N is the input PN. Considering that an event log EL can have many traces, the previous formula can be extended to take into account the multiple contributions due to multiple traces being replayed over N. The closer F is to 1, the more compliant σ (EL) is to N. CC can also include other fine-grained diagnoses. Indeed, untriggered transitions due to missing tokens can be traced throughout token replay executions.
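The fitness computation can be transcribed directly from the four counters; a minimal sketch:

```python
def token_replay_fitness(c: int, p: int, m: int, r: int) -> float:
    """Fitness F(sigma, N) from the consumed, produced, missing, and
    remaining token counters: F = 1/2 (1 - m/c) + 1/2 (1 - r/p), as in [7]."""
    if c == 0 or p == 0:
        raise ValueError("replay must consume and produce at least one token")
    return 0.5 * (1.0 - m / c) + 0.5 * (1.0 - r / p)


# Example: 12 consumed, 12 produced, 1 missing, 1 remaining token.
print(token_replay_fitness(c=12, p=12, m=1, r=1))  # 0.9166...
```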
CC diagnoses can be collected as shown in the example in Table I. Each row of the table encodes token replay results for each EL replayed over N. In particular, for each EL, the table reports the anomalous statistics linked to the transitions t_i ∈ T triggered by traces in EL (e.g., the number of times M did not allow t_i triggering), the F value, and, if available, a label D classifying the diagnoses, opening the opportunity to apply supervised ML.
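A minimal sketch of how such diagnoses could be collected into a Table-I-style dataset, here using pm4py's token-based replay rather than the Java implementation adopted in our PoC; the file names are hypothetical and the result dictionary keys follow pm4py's documented output, so treat them as assumptions:

```python
import pandas as pd
import pm4py

# Model and log file names are hypothetical.
bpmn_model = pm4py.read_bpmn("rbc_handover.bpmn")
net, im, fm = pm4py.convert_to_petri_net(bpmn_model)
log = pm4py.read_xes("event_log.xes")

# One result dictionary per replayed trace.
replay = pm4py.conformance_diagnostics_token_based_replay(log, net, im, fm)

# Table-I-style rows: token counters, fitness F, and a diagnosis label D
# (to be filled when known), enabling supervised ML on CC diagnoses.
diagnoses = pd.DataFrame(
    {
        "consumed": d["consumed_tokens"],
        "produced": d["produced_tokens"],
        "missing": d["missing_tokens"],
        "remaining": d["remaining_tokens"],
        "fitness": d["trace_fitness"],
        "label_D": None,
    }
    for d in replay
)
print(diagnoses.head())
```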

III. CONCEPTUAL DT ARCHITECTURE FOR IIOT ANOMALY DETECTION
Software architectures are fundamental in the engineering process of software-intensive systems, such as DTs. Nevertheless, there is limited discussion in the related literature about formalized DT architectures [17], [18], [19]. Furthermore, the majority of DT studies propose domain-specific architectural solutions [17] (e.g., Barenji et al. [20] presented a DT-based approach for smart manufacturing focusing on energy consumption). This has to do with the complexity of identifying the different components involved in DT operation and their functionalities. In turn, this leads to a general ambiguity and vagueness around the DT concept [21].
As recognized by some recent research studies [11], [22], DTs can be successfully used to fulfill self-adaptation requirements and improve system resilience. Hence, in Section III-A, we show how the MAPE-K loop for self-adaptive systems can be used to define DT core functionalities. In Section III-B, after a brief overview of existing approaches, we present our DT conceptual architecture. Finally, in Section III-C, we introduce the design of a novel online anomaly detection service based on the proposed architecture, and we illustrate the offline and online processes envisioned for its setup and operation.

A. Integration of AC in DTs
An effective approach to deal with the highly changing operational conditions of complex and dynamic software systems is the MAPE-K feedback loop of AC, which has recently been associated with DTs [11], [23], [24]. It includes four computation stages, namely monitor, analyze, plan, and execute over a shared knowledge, which enable self-adaptation to run-time conditions. A specific aspect of self-adaptation is self-healing, where faults and behavioral anomalies are detected and possibly fixed before they can generate failures. Fig. 1 depicts the architecture of a self-adaptive software system. It consists of two layers: the managed subsystem, comprising the application logic, and the managing subsystem (i.e., the autonomic manager), comprising the adaptation logic that implements the MAPE-K feedback loop. The monitor stage gathers and preprocesses data collected from the managed subsystem. The analyze stage handles the aggregated data for the extraction of relevant information from the knowledge base. During this step, the managing subsystem recognizes whether the managed subsystem works as expected or whether there is an anomaly [24]. The plan stage determines how to operate in response to anomalies. Finally, the execute stage performs the activities planned in the previous stage without any human intervention.
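A skeletal rendering of the loop helps fix terminology; the class below is purely illustrative, with stub method bodies and assumed names, not our actual implementation:

```python
class AutonomicManager:
    """Minimal MAPE-K skeleton: stage names follow the loop, bodies are stubs."""

    def __init__(self, knowledge: dict):
        self.knowledge = knowledge  # shared K: models, policies, past diagnoses

    def monitor(self, managed_subsystem) -> dict:
        # Gather and preprocess raw data from the managed subsystem.
        return managed_subsystem.sample()

    def analyze(self, data: dict) -> bool:
        # Decide whether behavior deviates from the knowledge base.
        return data != self.knowledge.get("nominal_behavior")

    def plan(self, anomaly: bool) -> list:
        # Choose reconfiguration/mitigation actions for detected anomalies.
        return self.knowledge.get("mitigations", []) if anomaly else []

    def execute(self, managed_subsystem, actions: list) -> None:
        # Apply the planned actions without human intervention.
        for action in actions:
            managed_subsystem.apply(action)

    def step(self, managed_subsystem) -> None:
        data = self.monitor(managed_subsystem)
        anomaly = self.analyze(data)
        self.execute(managed_subsystem, self.plan(anomaly))
```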
In order to comply with the AC principles and reference architectural framework, the conceptual DT architecture we aim to introduce must provide: 1) the capability to build and manage a knowledge base of data and system states; 2) the capability to collect and aggregate data from the underlying managed subsystem; 3) the capability to analyze monitored data and other available information to check the system status and verify whether an adaptation is required; 4) the capability to identify the workflow of actions necessary to achieve the system's goals; 5) the capability to carry out these actions through appropriate actuators over the managed subsystem.

B. Conceptual DT Architecture
Before presenting our conceptual DT architecture and showing how it fulfills the AC principles discussed in the previous section, we provide a brief overview of the state of the art of DT architectural solutions, since it helped us shape our proposal and identify a set of additional core DT functionalities not directly related to the AC paradigm.
According to a recent snapshot of existing software architecture proposals for DTs [17], the most recurring pattern is the layered one, in which each level has a specific role and responsibility. This reduces dependency and increases flexibility and modularity, so that complexity can be addressed more effectively. Many layered DT architectures have been proposed in the literature in different contexts [17]. The most recurring patterns are based on the following.
1) Three layers (physical, digital, and connectivity): Most early works on DTs did not discuss any specific architectural proposal. Indeed, DTs were simply seen as made of high-level physical and digital layers, interconnected via a seamless communication level for data exchange (e.g., [20]). The modeling functionalities inside the digital layer were considered predominant, while other important dimensions, such as data and Ss, were not taken into account.

2) Four layers (physical, digital, connectivity, and application): This pattern explicitly considers an application layer, which typically leverages advanced technologies (e.g., AI and data analytics) to extract knowledge from real-time data and underlying models in order to build value-added Ss. An example of this pattern can be found in [25], where Aheleroff et al. distinguish between a digital layer devoted to the creation and management of the CAD-CAM models of physical objects and a cyber layer (acting as an application layer in the sense mentioned before) devoted to the construction of dynamic data models to enable digital functionalities at scale.

3) Five/six layers: The majority of DT architectural proposals are characterized by five or six layers, where the additional layers typically specialize some of the functionalities of the physical, digital, or service layers previously discussed. For instance, Redelinghuys et al. [26] split the physical layer into a level of perception and a level of control and actuation. Similarly, Lee et al. [27] proposed a DT-based CPS architecture where the physical system is composed of two distinct layers, responsible, respectively, for data collection (the smart connection layer) and for data aggregation/preprocessing and actuation (the information conversion layer). On top of that, the authors explicitly introduce a data analytics layer for data storage and elaboration. Going upward, a virtual system layer is responsible for model creation, update, and management. Finally, a service layer builds on top of the underlying functionalities to offer intelligence, control, visualization, optimization, prognostic, and health management functionalities. According to a different approach, additional layers are introduced to cope with cross-cutting issues that impact all the other layers, such as security or privacy [28].

4) Seven layers: There is no evidence of architectural proposals encompassing seven architectural layers, except for the work by Singh et al. [29] in which, besides the five layers (i.e., physical, virtual, storage, communication, and service), the authors added not only a security layer but also an access layer in order to manage the interfacing between humans and DTs.

The conceptual architecture sketched in Fig. 2 leverages a layered pattern and includes the core elements and functionalities identified in the DT and AC literature. In accordance with the two dimensions of self-adaptation depicted in Fig. 1, the managing subsystem includes the AC elements identified in Section III-A, which are provided through a layered architectural organization. In particular, relevant DT functionalities are grouped into four horizontal layers, i.e., the PS layer, the Data (DD) layer, the VS layer, and the Ss layer. Connectivity (CN) among layers is represented by the black arrows. The functionalities included in those layers are inspired by the DT dimensions identified by Tao et al.
[14] and mentioned in Section II-A. Although we keep Tao's reference to the PS dimension, the sensing and actuating functionalities in the PS layer can operate at both the cyber and physical levels over the real CPS. In fact, according to the current literature addressing CPS architectures based on the IIoT [30], CPSs are characterized by sensing, actuating, and intelligent decision-making, control, and configuration functionalities [30], [31]. In the autonomic DT perspective, the sensing and actuating functionalities in the PS layer correspond to the monitor and execute stages of the MAPE-K loop, respectively. We assume the fundamental intelligent decision-making, control, and configuration capabilities to be embedded in the managed subsystem rather than in its DT. The DT is meant to extend those functionalities with additional ones by focusing on specific aspects, such as anomaly detection and self-healing. Therefore, the managed subsystem will perform sensing and actuation over the environment where it operates, according to its specification.
The data generated by the PS layer, those obtained from virtual models through the Ss carried out by DT functionalities, and the knowledge provided by domain experts represent the "digital twin fuel" [14], [32]. As CPSs are characterized by several distributed and interconnected entities that exchange and elaborate a massive amount of data, we must take into account an ad hoc layer, i.e., the DD layer, devoted to storing and managing all relevant data and information. In fact, persistence, elaboration, and presentation functionalities are required in a DD layer: data storage, data preprocessing, and data filtering are needed due to the multi-temporal, multi-dimensional, multi-source, and heterogeneous nature of the data [14], while data visualization provides aggregated views for end users and managers [32]. The persistence functionality in the DD layer enables building and managing the shared knowledge base of the MAPE-K feedback loop, while the elaboration functionality supports the monitor stage. As the presentation functionality is responsible for human interfacing with the DD layer, it is not mapped to any MAPE-K stage.
The VS layer hosts the virtual replica of the real system. According to Schroeder et al. [32], the strictly necessary DT components are: 1) the models that digitally represent the CPS, whose roles can be descriptive, predictive, and prescriptive (normative) [33]; 2) an event source that generates information and/or commands to the physical system; 3) a set of AI algorithms that aim to extract useful information to feed the DT models and the event generation block. Hence, the VS layer comprises modeling functionalities, reasoning on data for knowledge extraction, and feedback generation capabilities. The reasoning functionality is mapped to the MAPE-K analyze stage, whereas the feedback functionality supports both the plan and execute stages.
An integrated software platform can be built upon the DT, including all the sub-Ss providing solutions to specific requests from the PS and VS layers [17]. Therefore, the Ss layer includes the set of functionalities needed for offering the Ss that leverage the DT technology, e.g., simulation, real-time monitoring, and prediction [13]. Since this work focuses on anomaly detection in the IIoT, the Ss layer depicted in Fig. 2 includes the set of sub-Ss required to perform anomaly detection activities, i.e., inferential engine, simulation, and CC. The inferential engine functionality supports the plan stage, whereas simulation and CC support the analyze stage.
From a software architecture pattern perspective, the architecture is open, with direct communication allowed between two nonadjacent layers. In particular, the communication from the VS layer to the PS layer is strictly needed to enable the DT to control its physical counterpart through the feedback functionality and, thus, to implement the closed-loop connection that characterizes the DT concept. Moreover, the Ss layer can directly use the functionalities offered by the DD layer to build value-added Ss.
Unlike the majority of existing DT models [17], the conceptual DT architecture described in this article is domain-independent and supports several different implementations. Each of its functionalities can be implemented with one or more components; for instance, the MAPE-K stages may be performed by multiple components cooperating in a decentralized manner [34].

C. Anomaly Detection Service
A recent literature review conducted by Huang et al. on DT-based anomaly detection strategies [8] has identified the following three classes of methods: 1) model-based methods, in which the detection is achieved by comparing the observed behavior of the real system against the predicted behavior generated by a model; 2) knowledge-based methods, which are appropriate when a detailed mathematical model is not available but a large dataset of known faults is; 3) data-driven methods, which can be split into two categories, i.e., statistical (e.g., principal component analysis) and nonstatistical methods (e.g., ML algorithms). We adopt an anomaly detection technique deployed as a DT service that leverages the key DT functionalities mentioned in Section I, namely modeling, simulation, and the use of data-driven techniques. Specifically, our service is based on the following: 1) describing high-level models and translating them to PNs; 2) simulating behavior under faulty conditions to collect synthetic datasets; 3) applying CC to simulated and PS data to extract CC diagnoses; 4) using supervised ML algorithms to classify synthetic and PS data based on CC diagnoses. Since our technique is based both on reference managed subsystem behavioral models and on data-driven techniques, it can be classified as a hybrid method.
Our proposal is implemented through two separate processes, referred to as offline simulation (OS) and online monitoring (OM), respectively, which are depicted in the Unified Modeling Language (UML) activity diagram shown in Fig. 3. Please note that dashed lines represent data artifacts flowing from one activity to another; each activity requires all its incoming data artifacts to be available in order to be carried out. Also, dashed boxes group activities linked to specific functionalities of the managing subsystem in Fig. 2. The goal of OS is to build an anomaly detector able to discriminate normal and anomalous behavior when the managed subsystem is exercised, whereas OM is concerned with collecting and applying CC to run-time PS data, classifying the resulting CC diagnoses using the anomaly detector. Inference results can be reused later on to build a more accurate anomaly detector, taking into account behavior that simulation could not highlight, hence characterizing a self-adaptive approach.
The OS process envisions activities for the modeling, simulation, CC, and inference functionalities. As part of the modeling functionality, a normative model N_hl prescribing the correct behavior of the system is specified. As discussed in Section II-B, the token replay algorithm used for CC requires processes captured as PNs. However, other high-level formalisms, such as the BPMN, are very often used for process modeling. In this case, as anticipated in Section II-B, process models must first be translated to trace-equivalent PNs. Once a trace-equivalent PN N_ll is in place, faulty behavior is simulated to generate synthetic datasets through the simulation functionality. Thus, a set of m normal event logs EL^sim_N = {σ^sim_N,1, ..., σ^sim_N,m} is generated, where each element σ^sim_N,i ∈ EL^sim_N is a simulated set of n normal traces. Each trace σ ∈ σ^sim_N,i is built with reference to an admissible control-flow sequence of transitions c ∈ C(S) that the normative model N_hl allows. During simulation, each σ ∈ σ^sim_N,i is injected with control-flow anomalies and, finally, translated using the translation rules obtained when converting N_hl to N_ll, leading to the set of faulty event logs EL^sim_A. The CC functionality then applies the token replay algorithm to each σ^sim_A,i ∈ EL^sim_A; each execution provides CC diagnoses (structured as in Table I). Finally, through the inference functionality, the CC diagnoses are fed to a supervised ML algorithm to train the anomaly detector used to classify anomalies out of field-collected data. In this case, column D labels the anomaly injected during behavior simulation.
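The OS process can be summarized in a short sketch, assuming helper functions for log simulation, anomaly injection, and diagnoses extraction (all hypothetical names); the kNN settings follow Section IV:

```python
from sklearn.neighbors import KNeighborsClassifier

RESOURCES = ["ARBC", "HRBC", "EVC", "RTM"]


def offline_simulation(simulate_normal_log, inject, replay_diagnoses, m):
    """Sketch of the OS process. The three callables are assumed helpers:
    `simulate_normal_log` simulates a normal event log from N_hl (e.g., via
    PLG2), `inject` applies resource-wise control-flow anomalies, and
    `replay_diagnoses` runs token replay against N_ll and returns a numeric
    feature vector of CC diagnoses (one Table-I-style row)."""
    X, y = [], []
    for _ in range(m):                       # m normal event logs in EL^sim_N
        normal_log = simulate_normal_log()
        for res in RESOURCES:                # four anomalous logs per normal log
            faulty_log = [inject(trace, res) for trace in normal_log]
            X.append(replay_diagnoses(faulty_log))
            y.append(res)                    # label D = injected anomaly type
    # Train the anomaly detector on the CC diagnoses (kNN as in Section IV).
    detector = KNeighborsClassifier(n_neighbors=5, metric="minkowski")
    detector.fit(X, y)
    return detector
```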
The OM process envisions activities for the sensing and elaboration, CC, and inference functionalities. OM collects the managed subsystem data by exploiting the sensing and elaboration functionalities, whose responsibilities also include trace translation, reusing the rules obtained during OS. When a sufficient amount of event data has been collected, token replay is applied through the CC functionality, using the N_ll model translated during OS. The resulting CC diagnoses are used by the inference functionality, which provides the classification of the trace using the anomaly detector built during OS. Finally, by exploiting the classification results, new knowledge about the real process can be mined, adapting the anomaly detector to behavior that was not discovered through simulation. This is an example of self-adaptation of the managing subsystem to new managed subsystem dynamics.
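Analogously, a minimal sketch of the OM process, again with assumed helper names mirroring the activities in Fig. 3:

```python
def online_monitoring(detector, collect_traces, translate, replay_diagnoses):
    """Sketch of the OM process; all callables are assumed helpers
    (sensing/elaboration, translation, and CC, respectively)."""
    raw_traces = collect_traces()                  # run-time PS data
    log = [translate(t) for t in raw_traces]       # reuse OS translation rules
    diagnoses = replay_diagnoses(log)              # token replay over N_ll
    prediction = detector.predict([diagnoses])[0]  # most misbehaving resource
    # Classified diagnoses can later be added to the training set, adapting
    # the detector to behavior not discovered through simulation.
    return prediction
```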

IV. RAILWAY IIOT POC
As a PoC of the approach described in the previous sections, in this section we provide a case study that is relevant for IIoT-connected railways, which use smart devices to deliver Ss to passengers and train drivers, perform automatic train control (ATC) and maintenance, and control energy consumption [35]. Modern railways can highly benefit from DT technology, which allows for what-if analyses of critical situations and proactive decision-making [36].
In the PoC, we instantiate the DT architecture to detect anomalous interactions among software components as they communicate and elaborate data within ERTMS, which is a standard for railway interoperability. ERTMS-compliant systems need to be extensively tested using simulation before their final deployment in order to ensure thorough verification of the complex interactions among software modules. However, since these modules are possibly developed by different companies (e.g., one company develops the onboard control system, while another develops the trackside control system), their integration might show edge-case issues in rare untested scenarios due to slightly different interpretations of the interoperability requirements, which, in the worst case, might lead to hazards. In light of this, continuous OM at run-time might allow detecting and managing those anomalies. Furthermore, it is theoretically possible to predict, by online model analysis and/or accelerated simulations, the consequences of detected anomalies and choose the most appropriate reaction accordingly (e.g., send emergency stop messages). This drives the use of our DT architecture, as it provides the potential for reusing at run-time the same design-time models and federated simulators, with the same control software used in laboratories and installed in the real systems.
Although a fully fledged implementation of a railway DT can be very complex and comprehensive as a fully mirrored system, in this article we only focus on the anomaly detection phase by using abstract models in a selected ERTMS scenario, with the aim of illustrating the reference technique. Please note that we limit the analysis to cyber anomalies, which, in this case, are software anomalies due to control-flow errors that faulty ERTMS components may cause.

A. European Railway Traffic Management System
ERTMS is part of a European standard specification including an ATC system for improving performance, safety, reliability, and interoperability among trans-European railway connections. It involves the digital elaboration of on-track and on-board data through heterogeneous and distributed nodes. Therefore, ERTMS implementations represent a class of IIoT-connected railways, as they are increasingly adopting new computing paradigms and technologies.
An ERTMS implementation is a distributed system connecting several subsystems, including the on-board (i.e., mobile) subsystems and the on-track (i.e., fixed) ones. Essential on-board components are the European vital computer (EVC) and the radio transmission module (RTM): the former implements the logic needed for safely managing the data flow within the on-board subsystem and between on-track and on-board communications, and the latter handles all the needed train-to-infrastructure communication logic. On-track, the radio block center (RBC) supervises the trains in its area and handles several functionalities that ensure efficient and safe train operation.
Normative ERTMS behavior is prescribed by the official system requirements specification (https://www.era.europa.eu/content/set-specifications-3-etcs-b3-r2-gsm-r-b1_en), capturing the behavior of components in all reference operational scenarios. Among them, the RBC/RBC handover scenario has been selected for our PoC of DT-based anomaly detection through CC. The RBC/RBC handover scenario describes the procedure to follow when the train is crossing areas controlled by two different RBCs, i.e., the handing over RBC (HRBC) and the accepting RBC (ARBC). Fig. 4 shows part of the BPMN model we have designed out of the requirements specified for the RBC/RBC handover scenario, focusing on an AND split-and-join synchronizing branch, which also encloses a XOR split-and-join exclusive branch. Fig. 5 shows the UML component diagram of the system architecture designed according to the layered description in Fig. 2; rounded boxes within components stereotyped as functionality represent the functionalities that the components implement. We split the ERTMS managing subsystem into two further subsystems: the ERTMS Ss,VS,DD subsystem and the ERTMS PS subsystem. The former includes components implementing functionalities in the Ss, VS, and DD layers, whereas the latter characterizes the PS layer, grouping components and interconnections that represent virtual replicas of the ones the managed subsystem physically instantiates. As such, EVC, RTM, ARBC, and HRBC implement the sensing functionality to monitor events throughout the RBC/RBC handover scenario. TokenReplay, anomaly detector, and simulator implement Ss functionalities, whereas models and events handler implements VS and DD functionalities. These components implement most of the OS and OM processes depicted in Fig. 3 (e.g., anomaly detector performs the anomaly detector training and classification activities, and models and events handler performs the BPMN model specification, event logs generation, and event logs collection activities). Please note that these components do not implement functionalities linked to the feedback and execute stages of the MAPE-K feedback loop, as our PoC scope is limited to anomaly detection.
In the following sections, we describe our implementation of the ERTMS Ss,VS,DD subsystem according to the UML design we presented, and we carry out experiments simulating two sets of faulty event logs using two different fault injection schemes: one for generating the synthetic dataset used to train the anomaly detector during OS, and the other for simulating the ERTMS managed subsystem behavior. We compare the results obtained with and without the application of the CC stage, i.e., without exploiting CC diagnoses. In the latter case, the data handled by ML are different: there are several ways to build the input data without CC, such as counting the number of occurrences of a given event or the frequency with which a given component triggered events.
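For the CC-less baseline, a minimal sketch of one such feature extraction; the (activity, resource) event format is an assumption carried over from the earlier sketches, not the paper's exact data layout:

```python
from collections import Counter


def event_count_features(log, activities):
    """Plain ML baseline (no CC): per-activity occurrence counts aggregated
    over all traces of an event log."""
    counts = Counter(activity for trace in log for (activity, _res) in trace)
    return [counts.get(a, 0) for a in activities]
```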

B. Experimental Testbed
This section presents: the architecture of the experimental testbed we have designed for the PoC; the fault injection schemes used for generating the faulty ERTMS managed subsystem behavior and the ERTMS managing subsystem simulated dataset; and the factors and response variables used for evaluating anomaly detection results.
Our testbed architecture is designed according to the ERTMS Ss,VS,DD subsystem shown in Fig. 5. We note here that our PoC stubs the ERTMS managed subsystem by simulating its behavior as well. In order to generate different behavior out of the same RBC/RBC handover model, two different fault injection schemes are used, labeled FS_OS and FS_OM, where the former is used during OS and the latter during OM. Please note that, since the ERTMS managed subsystem is stubbed, the implementation of ERTMS PS is not needed.
During OS, our testbed implements the event logs generation and anomalies injection activities by simulating behavior through the open-source PLG2 BPMN simulator and injecting anomalies into the resulting event logs. Specifically, for each σ^sim,OS_N,i ∈ EL^sim,OS_N, control-flow anomalies (missed, duplicated, and/or wrongly ordered activities [37]) are injected into each trace σ ∈ σ^sim,OS_N,i according to FS_OS. This scheme involves the injection of control-flow anomalies depending on the component (ARBC, HRBC, EVC, RTM) that may be causing the anomalous behavior. Thus, FS_OS injects component-wise control-flow anomalies A_res, where res represents one of the system components, referred to as resources; hence, res ∈ RES = {ARBC, HRBC, EVC, RTM}. The anomalous transformation f_A_res is such that f_A_res(σ) = σ_A_res, where σ_A_res is trace σ injected with resource-wise control-flow anomalies, i.e., control-flow anomalies injected only into activities executed by res. In order to train an anomaly detector able to classify each of the anomaly types, f_A_res is applied to each σ ∈ σ^sim,OS_N,i for all elements of RES, so that each EL^sim,OS_N,i is injected four times with all four different kinds of anomalies, generating, for each EL^sim,OS_N,i, four anomalous event logs EL^sim,OS_A,i,res. This process is repeated several times until a sufficiently big set of faulty event logs EL^sim,OS_A is obtained.

During OM, the event logs collection activity is implemented through simulation, generating a faulty event log EL^sim,OM_A with FS_OM. This scheme also injects control-flow anomalies but, instead of injecting anomalies linked to one specific resource, traces σ ∈ EL^sim,OM_N are injected with control-flow anomalies linked to more than one resource. Specifically, an injection probability P_res is assigned to each resource, so that traces may highlight anomalies linked to different resources. Thus, FS_OM injects component-wise control-flow anomalies A_P probabilistically. The probabilistic injection of component-wise control-flow anomalies is driven by the generation of an injection probability vector P = (P_ARBC, P_HRBC, P_EVC, P_RTM). Therefore, for each σ ∈ EL^sim,OM_N, the anomalous transformation f_A_P is such that f_A_P(σ) = σ_A_P, where σ_A_P is a trace with resource-wise control-flow anomalies probabilistically injected. Please note that the probability vector P is generated by means of a probability distribution. In our experiments, we have constrained P with values such that Σ_res P_res = 1, regardless of the probability distribution used to generate the probabilities.
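The following sketch illustrates one plausible reading of the two schemes; the event format, the choice among the three anomaly kinds, and the per-resource use of P are our assumptions, not the exact PoC implementation:

```python
import random

RESOURCES = ["ARBC", "HRBC", "EVC", "RTM"]


def inject(trace, res, rng=random):
    """Inject one control-flow anomaly (missed, duplicated, or wrongly
    ordered activity) into the events of `trace` executed by resource `res`.
    Events are assumed to be (activity, resource) pairs."""
    idx = [i for i, (_, r) in enumerate(trace) if r == res]
    if not idx:
        return list(trace)
    out, i = list(trace), rng.choice(idx)
    kind = rng.choice(["miss", "duplicate", "swap"])
    if kind == "miss":
        del out[i]
    elif kind == "duplicate":
        out.insert(i, out[i])
    elif i + 1 < len(out):                       # wrongly ordered activities
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


def f_A_res(trace, res):
    """FS_OS: resource-wise anomalies for a single, known resource."""
    return inject(trace, res)


def make_P(rng=random):
    """Draw one value per resource and normalize so that sum(P) == 1."""
    draws = {res: rng.random() for res in RESOURCES}
    total = sum(draws.values())
    return {res: d / total for res, d in draws.items()}


def f_A_P(trace, P, rng=random):
    """FS_OM: each resource res is targeted with probability P[res]."""
    out = list(trace)
    for res, p_res in P.items():
        if rng.random() < p_res:
            out = inject(out, res)
    return out
```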
The factors we are going to consider in our evaluations are the number of traces each simulated event log (both during OS and OM) may have (NT) and the fault injection distribution used for FS_OM applied during OM (ID). The response variables are accuracy, recall, and precision, computed from the following quantities: 1) N is the number of test event logs (|EL^sim,OM_A|); 2) p^HLP_i is a binary variable linked to the ith event log that is equal to 1 when the faulty resource with the highest injection probability is correctly predicted, and 0 otherwise; 3) HLP_i is the highest injection probability value linked to the ith event log; 4) p_i is a binary variable linked to the ith event log that is equal to 1 when a faulty resource with an injection probability lower than the highest one is predicted, and 0 otherwise; 5) LP_i is the injection probability value linked to the ith event log and the faulty resource predicted by the detector. In these formulas, Σ_{i=1..N} p^HLP_i · HLP_i represents the true positive classifications, Σ_{i=1..N} HLP_i represents all true positives and true negatives, and Σ_{i=1..N} p_i · LP_i represents the false positives. These formulas take into account the use of the FS_OM injection scheme; the anomaly detector classification is correct whenever p^HLP_i = 1, i.e., the predicted label is associated with the resource with the highest probability of being faulty. Please note that in our experiments we have not considered classifying normal behavior, meaning false negatives cannot be wrongly predicted and, therefore, are not taken into account during evaluation.
Our testbed setup is available online and has been implemented through the mentioned PLG2 simulator, Python scripts handling the activities shown in Fig. 3, and an open-source Java implementation of the token replay algorithm.

C. Anomaly Detection Evaluation
We present two experiments, whose goals are: 1) determining which pair of NT and ID factor levels provides the best detection performance; and 2) comparing the PM-based approach presented in Fig. 3 with the plain ML-based approach that does not use CC diagnoses.

Experiment 1) is carried out through a two-factor full factorial scheme characterized as follows: 1) two factors, namely: a) ID, with three levels: uniform, normal, and lognormal; and b) the number of traces per event log (both EL^sim,OS_N and EL^sim,OM_N) (NT), with three levels: 20, 40, and 60; 2) five experiment repetitions for each possible (ID, NT) pair, with a total of 45 experiment repetitions; 3) for each experiment run, FS_OM builds P by independently drawing probabilities from the chosen distribution for each of the resources; randomization is also applied to the order by which each drawn probability is assigned to each resource; 4) two-way analysis of variance (ANOVA) [38] is applied to outline whether sample groups highlight statistically significant differences; 5) use of the k-nearest neighbors (kNN) algorithm, setting k = 5 and using the Minkowski distance metric.

Table II shows all sample means and variances linked to each 5-D sample group corresponding to all possible (ID, NT) pairs and the three traced response variables; for each cell, three pairs (M, V) are recorded, which correspond to the sample means and variances of the accuracy, precision, and recall performance metrics (A, P, and R, respectively). Two-way ANOVA results are collected in Table III for each response variable. Please note that in all cases the Friedman test [39] has been applied due to nonnormality and/or heteroscedasticity of the residuals.
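The residual check and the Friedman fallback can be sketched with SciPy on placeholder data (the actual samples are those summarized in Table II):

```python
import numpy as np
from scipy.stats import friedmanchisquare, shapiro

# Placeholder accuracy samples: 5 repetitions per (ID, NT) pair.
rng = np.random.default_rng(0)
groups = {(dist, nt): rng.uniform(0.6, 0.9, size=5)
          for dist in ("uniform", "normal", "lognormal")
          for nt in (20, 40, 60)}

# Check normality of residuals; nonnormality (or heteroscedasticity)
# motivates falling back from two-way ANOVA to the Friedman test.
residuals = np.concatenate([g - g.mean() for g in groups.values()])
print("Shapiro-Wilk p-value:", shapiro(residuals).pvalue)

# Friedman test across the three ID levels (15 paired measurements each).
uniform_s = np.concatenate([groups[("uniform", nt)] for nt in (20, 40, 60)])
normal_s = np.concatenate([groups[("normal", nt)] for nt in (20, 40, 60)])
lognorm_s = np.concatenate([groups[("lognormal", nt)] for nt in (20, 40, 60)])
print(friedmanchisquare(uniform_s, normal_s, lognorm_s))
```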
Experiment 2) is carried out through a comparison scheme characterized as follows: 1) one factor (PM_USE), with two levels: no and yes; 2) 15 experiment repetitions; 3) the training process uses the same dataset produced during OS for both factor levels and the same ML algorithm (kNN, with k = 5 and the Minkowski distance metric); 4) for each experiment run, the dataset used for inference during OM is the same for both approaches; 5) use of the uniform injection distribution with 20 traces per event log; 6) a paired t-test is applied to compare accuracy results. The results of the experiment are shown in Table IV, where SM is the sample mean, SV is the sample variance, 95% CI is the interval where the sample mean of each sample group lies with 95% confidence, PV is the p-value resulting from the application of the paired t-test, and SSD states whether there is a statistically significant difference (with at least 95% confidence).

D. Results and Discussion
Experiment 1) has shown that the offline anomaly detector, which was trained with anomalous traces generated with FS_OS, predicted the most misbehaving resource with good performance, reaching accuracy as high as 83.0% (NT = 20, ID = uniform) and as low as 65.8% (NT = 60, ID = normal). Considering that the new traces the anomaly detector classified were generated with a different scheme, namely FS_OM, this means that although the anomaly detector was not trained during OS for detecting the exact same anomalies we introduced in traces during OM, it could still provide insightful information about the most misbehaving resource. The best performance (accuracy = 83.0%, precision = 92.9%, and recall = 88.4%) was obtained when training the anomaly detector by injecting resource-wise anomalies in event logs made of 20 traces with FS_OS, and using a uniform injection distribution for generating the probability vector P with FS_OM. This was mainly due to probability vectors generated with the uniform injection distribution being more polarized toward control-flow anomalies linked to one specific resource, rather than generating vectors with injection probability values close to each other, which made correct inference harder to achieve. Experiment 2) has shown that the use of CC diagnoses provides statistically better results than a plain ML approach, with average accuracy being ≈1.1 times better when using CC diagnoses. Provided the FS_OM scheme injects faults that may happen in a real ERTMS managed subsystem, the results discussed address external validity threats, because control-flow errors that may actually happen would be covered by FS_OM as well. As a final remark, we also note that the use of kNN in both experiments is motivated by its simplicity, which improves ML result explainability and timing performance in online settings.

V. CONCLUSION
In this article, we have presented a conceptual DT architecture inspired by AC principles and based on four logical layers (i.e., physical, data, virtual, and service) to support a novel online anomaly detection service. Such a service has been designed by leveraging the combination of CC and ML techniques. In order to instantiate the DT architecture and demonstrate it in a real-world application through a PoC, we provided an exemplary case study of high criticality, namely an industrial scenario in the railway domain.
Starting from the results presented in this article, several research directions can be planned, including the following: 1) architectural blueprint refinements considering the current DT state of the art in multiple domains [40]; 2) extended laboratory experimentation with industrial European Railway Traffic Management System simulators and real-world testbeds [41]; 3) timing performance anomaly detection, in addition to control-flow anomaly detection [7]; 4) introduction of process discovery to automatically generate run-time models [7]; 5) tackling of trustworthiness and explainability challenges within PM, ML, and DTs [42].

Fig. 2. Layered conceptual DT architecture supporting anomaly detection through CC, and its mapping to the MAPE-K elements.


TABLE II: Means and variances for each group of experiments.

TABLE IV: Accuracy comparison results of the second experiment.