A security-aware framework for designing industrial engineering processes

Modern critical infrastructures are complex Cyber-Physical Systems (CPS) that tightly integrate physical processes with Information and Communication Technology components. Numerous safety mishaps and security attacks in such systems have demonstrated the need to ensure their safety and security from the early design stages. Research on CPS has mostly focused on securing existing, implemented industrial systems, while safety and security considerations during the design stages of modern industrial infrastructures have received little attention. In this paper, we present a framework that extends previous, preliminary work on the integration of security into industrial engineering design practices, and provide an algorithmic approach that effectively reduces risk during industrial system design lifecycles. We achieve this by analyzing flows of materials and information related to physical processes in three steps: (1) identifying critical components and flows, (2) prioritizing flows based on their ties to high risk and their importance in terms of dependencies, and (3) classifying system components based on their influence on the overall industrial system. To do so, we utilize (i) material flow networks (MFN) for modelling/designing the physical system, (ii) dependency risk graphs for analyzing network dependencies and assessing the system in terms of risk, (iii) graph minimum spanning trees, and (iv) network centrality metrics. To evaluate our approach, we model and assess the production chain corresponding to an oil refinery plant's Liquefied Petroleum Gas (LPG) purification process. Preliminary findings demonstrate the complex dependencies between cybersecurity vulnerabilities and system safety.


INTRODUCTION
Critical Infrastructures (CI) are Cyber-Physical Systems (CPS) consisting of physical processes, equipment, and other components connected over Information and Communication Technologies (ICT). ICT allows for system control and monitoring, providing functionality and process optimization across a wide range of industrial applications and improving CI operation and the provided services [1]. ICT systems depend on physical components, while Industrial Control Systems (ICS), such as Supervisory Control and Data Acquisition (SCADA) systems, require data processing functionality to manage control and monitoring operations [2].
Continuous advances in ICT have led to novel CPS implementations that integrate sophisticated and complex ICT systems in CI, resulting in tighter integration between physical processes and the cyber domain. On the downside, this allows new attack vectors to emerge, as seen in multiple security incidents such as Stuxnet and the Maroochy Water Breach [3], [4]. Such security threats can impact the system's safety and reliability; therefore, security and safety must be considered as dependent properties during the design phase of an ICS. However, safety and security are often treated separately, as the complexity of modern CPS makes it challenging to model and assess their interconnected nature [5], [6]. To overcome this inconsistent practice, engineers and security experts must focus their modelling efforts on ensuring that both security and safety requirements are considered during the design stage of a CPS [7].
Authorities worldwide address the safety and security of CI as a high-priority issue [8]. The EU supports this through the NIS Directive [9], while the US has published a specific directive solely to protect CI [10]. Even though authorities have long identified the risks behind cyberattacks on CI, and despite the numerous advances in CPS protection, to our knowledge there is little work on how to consider both safety and security principles in industrial critical infrastructure engineering design. Investing in system architecture and integrating security and safety requirements early in the design stage is far more efficient and cost-effective than funding the protection of vulnerable and insecure systems [11], [12].

A. Contribution
We propose a system design framework for CPS that integrates both security and safety. Critical industrial infrastructures are production systems with complex production chains. In our work, these production chains are represented as networks of productive activities characterized by flows of resources, i.e., physical flows of materials and energy and flows of monitoring and control information. Physical flows are subject to availability requirements and constraints on the output capacity of the production system. Similarly, monitoring-information availability and integrity are required to ensure the system's output. To that end, we employ risk analysis and dependency analysis to assess a critical industrial infrastructure production chain. This paper significantly extends the work presented in [13] concerning security integration in industrial engineering design practices. The presented method builds on previous work that models CI production chains as material flow networks. Material flow networks (MFN) are directed graphs consisting of vertices that represent the locations of material and energy transformations (processes) or storage in a production chain, and of edges that signify material and energy flows between them [14], [15] (see the discussion in Section IV.A for more). Engineers primarily utilize MFN to analyze system flows for optimization purposes based on multiple criteria (e.g., cost, environmental and social impact) [16]-[19]. The authors in [13] model and analyse material flow networks to detect and identify high-risk channels (paths) and critical components (flow network nodes) based on their overall effect on the system.
The presented approach utilizes this technique to (i) model processes and flows in production chains of CI as MFN, (ii) build an efficient flow model able to distinguish and map only the required flows and processes (essential in terms of risk) of an MFN into a dependency graph, simplifying calculations and improving applicability, and (iii) calculate the likelihood value for each flow network node of the mapped dependency graph, considering both its failure rate and the lack of a required resource. In addition, it extends the previous work by introducing minimum spanning trees (MST) and centrality metrics to efficiently identify and prioritize high-risk flows and flow network nodes for risk mitigation. More importantly, it presents a clear roadmap/workflow to guide the risk mitigation efforts of engineers and security experts during the design stage, acknowledging specific risk goals and system requirements.
To evaluate our approach, we assess a part of the production chain corresponding to the Liquefied Petroleum Gas (LPG) purification process of the TUPRAS oil refinery plant [20]. The material flow network data for the TUPRAS use case was provided by the EU-funded SPIRE-2019 FACTLOG project [21]. In summary, our paper contributes the following:
1. An improved modelling approach that maps and converts the assets and the interdependencies of a material flow network into a risk dependency graph based on the existing production chain topology of a CI.
2. An improved risk calculation methodology that depicts a threat's probability of disrupting a CI asset based on a noisy-OR model.
3. High-risk channel (path) identification and prioritization utilizing dependency risk analysis.
4. Critical flow identification and prioritization utilizing a Minimum Spanning Tree (MST) algorithm.
5. Critical flow network node identification and prioritization utilizing network centrality metrics.
6. A framework that provides a clear roadmap to guide and assist engineers' and security experts' risk reduction efforts during the design stage of a CPS.

B. Structure
The rest of the paper is organized as follows: Section II discusses related work and compares CPS protection methods. Section III describes the proposed framework. Section IV describes the fundamental building blocks of the framework. Section V discusses the implementation of the methodology in a real-world example and presents our findings to validate it. Finally, Section VI concludes the paper and discusses potential future research.

RELATED WORK
Various methodologies are used to analyse the security or safety of the CPS of CI and to evaluate the different dimensions involved in the factors that affect CI operation and provided services [22]. The main goal of such high-level methodologies is to analyse and evaluate threats and the multi-dimensional impacts of disruptive incidents involving CI in multiple sectors [23], [24]. Traditional risk assessment methodologies usually focus on vulnerabilities in the IT systems of CI [25]-[27]. These assessments are performed on already established and functioning systems and primarily result in added layers of cybersecurity on top of existing systems. However, the ICS sector is not supported by security standards to the same extent as the IT domain. The ISA/IEC 62443 family of standards targets ICS and Industrial Automation and Control Systems (IACS) security [28]. The concept of performing security risk assessment and management is similar to what is outlined in the ISO 27000 family [29]. To that end, traditional security technologies of IT systems are adapted to protect CPS; however, threats from the CPS cyberspace are largely unpredictable and untenable; thus, traditional reliability theory and fault-tolerant technology cannot completely prevent a system from failures [22].
Other approaches that focus on CI primarily delve into dynamically assessing industry IT and ICT networks by evaluating cascading failures over time between assets involved in and among different business processes [30]. Most utilize graphical models of the system architecture and perform risk analyses to understand the weaknesses of ICS (i.e., PLC, RTU, SCADA) in industry [31], [32]. Others perform targeted, technical attacks on individual ICS components, e.g., binary manipulation of ladder logic in PLCs, attacking actuator software, etc. [33], [34]. Roy et al. [35] utilize attack and counterattack trees to perform qualitative and probabilistic analysis of the security status of CPS. Other approaches utilize the concept of security-by-design to provide more flexible and effective ways to secure ICT/OS solutions during software development [36], [37]. However, while addressing the risk of attacks on a CI, these approaches focus mainly on cybersecurity, ignoring the various threats and vulnerabilities of the physical processes, components, and machines involved in the production chain of a CI.
Several simulation-based approaches have been developed to analyse and assess threats in order to improve the security, and thus the reliability and resilience, of CI under attack scenarios [38], [39]. The main problem with statistical model checking is that probability estimation becomes unfeasible for rare events. Ferrario et al. [40] proposed a method that utilizes a Monte Carlo simulation and a Hierarchical Graph to model ICS dependencies and evaluate CI robustness. Their modelling approach presents several similarities with our methodology, but they focus on high-level dependencies and supply chain conflicts, overlooking dependencies between physical processes in complex production chains. For instance, modern production systems apply various waste and energy recovery systems that create circular dependencies in the production chain [41], [42].
Many methodologies for safety risk assessment have been developed for CPS. Risk and hazard analysis techniques can be categorized into two groups: (i) failure-based hazard analysis and (ii) systems-based hazard analysis techniques. Failure-based techniques include Fault Tree Analysis (FTA) [43], [44] and Failure Modes & Effects Analysis (FMEA) [45], [46]. Failure-based methods focus on identifying the effects and probabilities of single component failures, omitting failures rooted in the interaction of components. System-based analysis techniques include the Hazard and Operability Analysis (HAZOP) [47], [48] and the System-Theoretic Process Analysis (STPA) [49], [50]. System-based approaches do not focus on cybersecurity; thus, it is difficult to quantify risk in modern systems that rely increasingly on IT and ICT to control physical processes. Also, both software vulnerabilities and the effects of non-technical influences on the system over time are very hard to measure at design time [51].
Several approaches attempt to safeguard both safety and security in ICS, as addressing both is paramount to the smooth and trustworthy operation of modern CPS [52]. Others adopt the concept of system-of-systems to address security, reliability, and robustness in CI [40], [53], [54]. However, although system-of-systems analysis provides a comprehensive top-down overview of the environment in which a CPS operates and of how risk propagates to and from the system, it is fraught with uncertainty about how constituent CI systems operate and function. Furthermore, the challenge with most of these methods is that they focus only on the physical system from a safety perspective and not on the complete communication system [55].
System optimization is commonly used during the design stage of a production system [56]. To that end, traditional risk and safety assessments are used to optimize and create a secure and safe system at the early stages of the system lifecycle [57]. The main issue with these approaches is that they focus explicitly on cybersecurity or safety threats, neglecting the relationship between security and safety. From an engineering perspective, the concept of optimization in system design is not new, as material flow analysis (MFA) and material flow networks (MFN) are utilized during the design stage to optimize the modelled system based on multiple criteria (e.g., cost, environmental and social impact) [16]-[19]. However, most of these approaches focus on cost-effectiveness and do not consider the security or safety perspective of CPS in critical industrial infrastructures [58], [59]. Other techniques implement security-by-design during the implementation stage by selecting certified components based on specific cybersecurity standards [60]. Others implement safety-by-design by selecting components considering various safety factors [61]. To that end, security- and safety-by-design should go beyond selecting individual system components based on how secure or safe they are by design.
Similar to the work in [13], the proposed approach focuses on individual critical industrial infrastructures (like energy corridors for oil and gas supply, water, and waste treatment plants). In [13], the authors embed security-by-design concepts in industrial critical infrastructure engineering by integrating a security risk assessment process into engineering design practices. In our approach, we build on this previous work by introducing an algorithm for detecting high-risk flow network nodes and flows into this risk assessment process. These high-risk nodes and flows are then used as input to a risk mitigation algorithm that extends the previous risk assessment. We incorporate this entire process in a novel framework that provides an algorithmic roadmap for introducing risk mitigation decisions to industrial engineers during the design phase of CPS. Similar to the work in [13], we utilize MFN [17]-[19] to model the underlying system. We employ traditional risk assessment methodologies to evaluate the ICT processes [25]-[27] and hazard analysis techniques, such as FMEA, FTA, HAZOP, and STPA, to assess the physical processes of the modelled system [43]-[50]. At the same time, we utilize a methodology similar to [30], [62], [63] to model dependencies between the different components of the modelled CPS and assess the cascade risk due to possible disruptions, considering an all-hazards approach.
Previous risk analysis approaches focus mainly on dependencies and study cascading failures between individual CI or between discrete components of corporate IT networks. However, these approaches do not yet address cascading failures between cyber and physical components inside individual infrastructures, nor are they able to analyse their subliminal attack paths. Such attacks can cause internal cascading malfunctions of equipment, which may lead to a chain reaction affecting other components due to erroneous data reports, injection or corruption of sensor information and actuator orders in communication channels, or unavailability of service. We tackle this gap by presenting a risk management framework that can be embedded inside industrial design processes. To achieve this, we analyse the interconnected components of a CI, considering conditional probabilities of component failure and supply disruption due to threat manifestation, in order to calculate the cascade risk of attack paths.
Our modelling approach resembles solutions that use a top-down method to analyze complex interdependencies in individual CI, similar to the work in [40], [64]. In [40], the authors consider engineered, physically networked (energy, transportation, information, and telecommunication) CI and their interconnected components to quantitatively evaluate CI robustness. However, their approach neither provides metrics for cascade risk evaluation nor suggests mitigation controls to improve CI robustness. In our approach, we suggest risk mitigation controls and guide expert efforts to reduce the risk of the individual CI by utilizing MST and centrality metrics, following the techniques proposed in [65] for automated IT network risk reduction.

FRAMEWORK
The proposed framework follows the standard system development lifecycle, incorporating security and safety criteria into the design phase. It spans all system processes, as security risks need to be identified as early as the design phase and addressed accordingly.
The framework aims to assist designers in understanding the potential impact of compromised components and identifying and prioritizing weak points for risk mitigation. Fig. 1 summarises the steps and workflow of the proposed roadmap.
1. First, following the standard development lifecycle, engineers should discover system requirements and create a system model for evaluation during the design phase.
2. Next, system engineers together with security experts should evaluate the modelled system in terms of risk:
a. if the risk level is acceptable and the system requirements are satisfied, the development lifecycle can continue; however,
b. if the risk levels are above a threshold value, the system must be analysed to prioritize flows and processes for risk mitigation.
The threshold parameter is subjective, as in real use-cases of setting up industrial processes; a decision-maker can define the parameter based on the specific characteristics of the critical industrial infrastructure under design. Risk mitigation measures, in our case, include the addition or removal of network nodes and flows, or the replacement of a node with one of similar functionality but a lower failure rate and impact value.
Each step of our methodology utilizes a set of mapping procedures and algorithms, where each one provides insight into the critical industrial infrastructure production chain under analysis and outputs information to be used as input by the following step. Below we present the fundamental steps of our framework:
I. Process dependency mapping: We identify, input, and map an industry's cyber and physical processes and process flows (MFN) into a dependency graph.
II. Process dependency risk analysis: We assign failure probabilities and impact values for each node. The algorithm pre-computes all n-order dependencies using the process dependency graph. Then, for each dependency chain, it outputs the cumulative dependency risk of each attack path. Finally, the algorithm calculates the overall risk for the mapped MFN.
III. Critical flow analysis: The tool produces alternative graphs with minimum risk (MST), maintaining process connectivity, using the process dependency graph, and computes the removal rates for the removed high-risk dependencies, thus identifying and prioritizing critical flows for risk mitigation.
IV. Process influence analysis: The tool pre-computes the centrality metric values for each node using the process dependency graph and highlights the maximum values, thus identifying critical processes for risk mitigation.
V. Steps beyond the scope of this work: The development of mitigating measures and the improvement of the system are not covered in this paper. However, it is intended that the results of this work be used as input for those steps. Also, in this work, we do not address the initial design of the system (in terms of defining the architecture, the processes, system parameters, etc.) and its requirements, both in terms of quality and quantity. However, strictly defined system requirements are crucial to ensure that the selected mitigation measures do not affect the industry's qualitative and quantitative objectives.
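As a rough illustration of the workflow above (steps I-IV feeding a mitigation loop), the following Python sketch iterates an assess-then-mitigate cycle until the overall risk falls below a designer-chosen threshold. The risk aggregation, the `node_risk`/`mitigations` structures, and all numeric values are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of the framework's assess-then-mitigate loop:
# evaluate overall risk, and while it exceeds the threshold, apply the
# mitigation that removes the most risk (e.g., replacing a node with a
# lower-failure-rate alternative). All names and values are hypothetical.

def overall_risk(node_risk):
    """Aggregate per-node risk; a simple sum, for illustration only."""
    return sum(node_risk.values())

def design_loop(node_risk, mitigations, threshold):
    """Apply mitigations in order of decreasing risk reduction until the
    overall risk is at or below the acceptable threshold."""
    applied = []
    for node, reduced in sorted(mitigations.items(),
                                key=lambda kv: node_risk[kv[0]] - kv[1],
                                reverse=True):
        if overall_risk(node_risk) <= threshold:
            break
        node_risk[node] = reduced          # replace the risky component
        applied.append(node)
    return overall_risk(node_risk), applied

risk = {"pump": 0.30, "plc": 0.25, "valve": 0.10}
fixes = {"pump": 0.05, "plc": 0.10}        # lower-failure-rate alternatives
final, applied = design_loop(risk, fixes, threshold=0.45)
print(round(final, 2), applied)            # 0.4 ['pump']
```

Replacing only the pump already brings the toy system under the threshold, so the loop stops early, mirroring step 2b of the roadmap.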
In the following section, we discuss in detail the building blocks that compose our framework.

BUILDING BLOCKS
This engineering design methodology uses the following six building blocks:
1. A Material Flow Network (MFN) method for modelling material, energy, and informational flows in production chains of critical industrial infrastructures based on MFN principles.
2. A modelling method that maps and converts the flow network nodes and flows of a material flow network into a risk dependency graph based on the existing production chain topology of a CI.
3. A risk calculation methodology to estimate the likelihood of a threat disrupting the operation of system components.
4. A multi-risk dependency analysis methodology for assessing the risk of the graph dependency paths and the overall risk of the graph.
5. A minimum spanning tree (MST) algorithm for critical flow identification and prioritization.
6. Network centrality metrics to identify influential critical flow network nodes.
Each building block is briefly presented below.
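Building blocks 5 and 6 can be sketched with standard graph routines. The self-contained Python example below runs Kruskal's algorithm over an undirected view of a small dependency graph, treating edge weights as risk, and ranks nodes by degree centrality; node names and weights are made up for illustration and are not the paper's algorithm or data.

```python
# Sketch of blocks 5 and 6: a Kruskal minimum spanning tree (keeping
# connectivity at minimum cumulative risk) and a degree-centrality ranking
# of flow network nodes. Edges and weights are illustrative assumptions.

def kruskal_mst(nodes, edges):
    """edges: list of (risk_weight, u, v) tuples; returns the MST edges."""
    parent = {n: n for n in nodes}
    def find(x):                       # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    mst = []
    for w, u, v in sorted(edges):      # cheapest (lowest-risk) edges first
        ru, rv = find(u), find(v)
        if ru != rv:                   # joining two components avoids cycles
            parent[ru] = rv
            mst.append((w, u, v))
    return mst

def degree_centrality(nodes, edges):
    """Fraction of other nodes each node is directly connected to."""
    deg = {n: 0 for n in nodes}
    for _, u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return {n: deg[n] / (len(nodes) - 1) for n in nodes}

nodes = ["feed", "pump", "plc", "tank"]
edges = [(0.6, "feed", "pump"), (0.2, "pump", "plc"),
         (0.9, "plc", "tank"), (0.3, "pump", "tank")]
mst = kruskal_mst(nodes, edges)
removed = [e for e in edges if e not in mst]   # high-risk flows to mitigate
print(removed)                                 # [(0.9, 'plc', 'tank')]
cent = degree_centrality(nodes, edges)
print(max(cent, key=cent.get))                 # pump
```

The edges pruned by the MST are the candidate high-risk flows for mitigation, while the highest-centrality node ("pump" here) is the most influential component.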

C. Material Flow Network Modelling
Material Flow Analysis (MFA) is a systematic and analytical method for mapping the flows and stocks of materials within a system defined in space and time. It aims to connect the sources, the pathways, and the sinks (targets) of materials in the system [66]. In MFA, Material Flow Networks (MFN) serve as reference models (templates) for developing more refined/optimized models in the same domain. MFN are based on the popular Petri net methodology for specifying concurrent systems in Computer Science and have been transferred into the Environmental Sciences [15]. The use of MFN allows the representation of material flow systems as directed graphs, where vertices represent single manufacturing steps or places where materials and energy are processed/transformed or stored. Graph vertices are linked by edges that correspond to the material flows within the system [14], [15]. Engineers primarily utilize MFA and MFN for process optimization and eco-balancing (the efficient utilization of material resources and energy and the balancing of environmental impacts) [16]-[18].
In our approach, we follow the principles of MFA and utilize MFN to model processes and material and energy flows in production chains, similar to the work in [16]. We model flow networks as graphs with four types of nodes: (i) processes, (ii) junctions, (iii) input, and (iv) output nodes. These are connected via links (Fig. 2). Single activities in which resources (material, energy, and information) are processed and converted/transformed into other resources are referred to as Processes. Input Nodes are the initial sources of resources flowing towards processes and represent different external resource suppliers (e.g., industries, CI). Output Nodes are the final recipients of resources flowing from processes, and they represent various external resource receivers (e.g., the environment) or consumers (e.g., industries, households, CI). Finally, Junctions serve as storage nodes for network resources, connecting processes by serving as output nodes for some processes and input nodes for others. For all intents and purposes, processes and junctions represent the assets of a modelled infrastructure. Resources flow between nodes via Links, each constituting a mode of transport (e.g., pipes, cables, roads, ships). Such flows between nodes describe the rate at which resources are consumed (input flows) and produced (output flows).
For modelling purposes, we characterized input flows as regular or backup, determining the consumption lifecycle (i.e., a regular flow transfers resources continuously from parent to target node). Also, input or output flows are assigned to the same link if they share the same transport modes. Fig. 2 showcases a demo MFN demonstrating the various types of nodes and the flow of resources between them.
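A minimal data model for the node and flow types described above might look as follows; the class and field names are our own illustrative assumptions, not the paper's tooling.

```python
# Hypothetical sketch of the four MFN node types (process, junction,
# input, output) and typed links with regular/backup input flows.

from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    kind: str             # "process" | "junction" | "input" | "output"

@dataclass
class Flow:
    source: str
    target: str
    resource: str         # e.g., "water", "LPG", "control data"
    backup: bool = False  # regular flows transfer resources continuously

@dataclass
class MFN:
    nodes: dict = field(default_factory=dict)
    flows: list = field(default_factory=list)

    def add_node(self, name, kind):
        self.nodes[name] = Node(name, kind)

    def add_flow(self, source, target, resource, backup=False):
        self.flows.append(Flow(source, target, resource, backup))

    def inputs_of(self, name):
        """Resources a node consumes (its input flows)."""
        return [f for f in self.flows if f.target == name]

mfn = MFN()
mfn.add_node("supplier", "input")
mfn.add_node("purifier", "process")
mfn.add_node("buffer", "junction")
mfn.add_flow("supplier", "purifier", "raw LPG")
mfn.add_flow("buffer", "purifier", "steam", backup=True)
print([f.resource for f in mfn.inputs_of("purifier")])  # ['raw LPG', 'steam']
```

The `backup` flag captures the regular/backup distinction above; downstream risk calculations can then treat backup flows differently from regular ones.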

1) Modelling Process Specification
A critical step in the modelling process is process specification. This includes the definition of input and output resources and of the relations between input and output flows and process parameters (e.g., environment temperature). Physical process nodes are the principal entities and comprise the majority of flow networks. They depict activities that begin with a set of resources (input) and end with new or processed resources (output). Thus, modelling the system's physical processes helps researchers investigate and analyse various dangers posed by human actors (outsiders, insiders), natural disasters, and hazardous situations.
In our approach, as an exception to the preceding concept, processes can also describe industrial automation control and monitoring activities of the system that supply and receive monitoring and control data from and to connected processes or junctions. From a hierarchical perspective, we should note that automation control and monitoring activities are essentially subprocesses, at a different level from the physical processes that operate on top of them, providing crucial functionality. Modelling the automation control and monitoring activities as a part of the overall system allows us to study and analyse various types of cyber threats that leverage vulnerabilities of monitor and control information systems and the impact they pose to the physical components of the modelled system.
Using the methods provided, we can create a model that represents an actual CI production system. Only if all processes, and more importantly their input and output flows, are accurately specified can we have a holistic view of the system and an understanding of the various complex interactions/dependencies between the physical and cyber elements of the CI production chain, so that we can evaluate the risk level of the CI.

D. Modelling Dependency Graphs
To analyse and assess the flow network nodes' risk and evaluate the infrastructure's overall risk, we must first pre-process the flow network and map MFN nodes and flows into a risk dependency graph.
In the pre-processing stage, we mark modelled junctions used to collect multiple flows into one as ignorable, because they are used solely for mass-balance calculation purposes and do not reflect physical objects in the real world. Also, for each process, we select the required resources (primary flows) for its operation. Based on the system specification requirements, engineers may also model secondary flows for simulation purposes. For instance, a water treatment process requires water and chemicals for purification, so an engineer will model both the water and the possible impurities it carries as input flows to the process.
After removing unnecessary junctions and secondary flows, we map all material flow network nodes (input, output, junction, and process nodes) as potential failure nodes, together with all flows between them, into a risk dependency graph.
Dependencies are modelled as directed, weighted graphs G = (V, E), where the nodes V represent the possible failure nodes of the system (i.e., processes, junctions, input/output nodes) and the edges E represent the dependencies between them. The weight of each edge (A, B) quantifies the estimated dependency risk of flow network node B on resources provided by flow network node A. This weight derives from the dependency between the flow network nodes; its calculation is presented below in Section IV.C. The resulting graph depicts the movement of resources from one possible failure node to another as input and output dependencies.
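The pre-processing and mapping step described above can be sketched as follows; the inputs (`ignorable`, `primary`) and the simple filtering logic are assumptions for illustration, since the paper's tool may handle junction removal and flow selection differently.

```python
# Hypothetical sketch of the MFN -> dependency-graph mapping: drop junctions
# marked ignorable (mass-balance artifacts) and flows not selected as
# primary (required) resources, then emit a directed dependency edge list.

def to_dependency_graph(nodes, flows, ignorable, primary):
    """nodes: iterable of node names; flows: (source, target) pairs;
    ignorable: junctions with no real-world counterpart;
    primary: the flows required for each process's operation."""
    kept = [n for n in nodes if n not in ignorable]
    edges = []
    for src, dst in flows:
        if (src, dst) not in primary:
            continue               # skip secondary (simulation-only) flows
        if src in ignorable or dst in ignorable:
            continue               # ignorable junctions carry no dependency
        edges.append((src, dst))   # dst depends on resources from src
    return kept, edges

nodes = ["supplier", "J1", "purifier", "consumer"]
flows = [("supplier", "purifier"), ("purifier", "J1"),
         ("J1", "consumer"), ("purifier", "consumer")]
primary = {("supplier", "purifier"), ("purifier", "consumer")}
kept, deps = to_dependency_graph(nodes, flows, {"J1"}, primary)
print(deps)   # [('supplier', 'purifier'), ('purifier', 'consumer')]
```

The resulting edge list is the input to the risk analysis of Section IV.C, with each edge read as "the target depends on resources from the source".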

E. Risk Analysis
In this section, we look at how to measure risk and at which factors contribute to risk in the cyber-physical model (including those unique to critical industrial infrastructures). Risk factors and likelihood evaluation are discussed in the sequel.

1) Risk Model Factors
Risk is the degree of possible failure that may occur in an established process, and risk assessment is one of the critical activities in the risk management process. The standard reference for risk as a cybersecurity assessment metric is: Risk = Likelihood * Impact. Assessing risk means identifying the threats and then determining the likelihood and impact [25], [26], [60], [67]. We focus on external/internal threats to input resource supply, along with process availability risks. Such threats arise from malicious, natural, or accidental events. These high-impact events are unexpected, can cause a severe dysfunction of an internal process or of the supply of a resource, and, more importantly, can propagate down the production chain. Similarly to Adenso-Diaz et al. [68], we do not differentiate between disruption types but rather consider disruptions in general and their effect on the production line. Thus, each node in the flow network is a potential failure node that is either entirely disrupted or fully operational. This binary approach is a typical way to model disruption of resource supply [69], and it can be used to simulate disruptions in the field of CIP [70]. Due to the cyber and physical aspects and threats of the CPS, as indicated in Fig. 3, we utilize traditional risk and hazard analysis methods, such as ISO/IEC 27001, HAZOP, and FMEA, to estimate the impact of cyber and physical threats on each flow network node in the cyber-physical system. Impact as a metric depicts the magnitude of harm due to the loss of availability or integrity of a flow network node (i.e., process). For example, the loss of a CI process due to a realized threat affects all dependent CI processes in the production chain, and thus the system's availability and integrity. In many cases, a compromised CI process could result in significant loss of life, casualties, material harm, environmental damage, and public service disruption.
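The Risk = Likelihood * Impact metric above can be applied per node to rank components for attention; the likelihoods and impact ratings below are illustrative values, not from the TUPRAS case study.

```python
# The standard risk metric cited above, applied per flow network node.
# Likelihoods (0-1) and impact ratings (1-5) are illustrative assumptions.

def risk(likelihood, impact):
    return likelihood * impact

nodes = {"pump": (0.2, 4), "plc": (0.5, 3), "sensor": (0.7, 1)}
ranked = sorted(nodes, key=lambda n: risk(*nodes[n]), reverse=True)
print(ranked)   # ['plc', 'pump', 'sensor']
```

Note how the PLC outranks the sensor despite a lower likelihood: the metric deliberately weighs consequence as much as probability.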

2) Likelihood Calculation
In this stage, we calculate and assign a likelihood value to each node of the mapped risk dependency graph based on the flows of the initial flow network model. This value indicates how probable it is for a threat to disrupt the operation of a flow network node (i.e., a process, junction, or input/output node) by impeding its activity or disrupting the supply of one or more required resources. An individual flow network node (a piece of equipment) can have two states: either failed (X) or functional (X̄); similarly, an input resource R is either unavailable (R) or available (R̄). We should note here that the operation of a flow network node depends on its own ability to function and on the availability of its required resources. As such, the probability of the event (Y) that the operation of a single node with n required resources is disrupted, given that the node and the required resources are in the state u, is P(Y | u). The probabilities P(Y | u) are called risk parameters, and with binary states, there are 2^(n+1) parameters to be defined for a flow network node with n input resources.
For a flow network graph with $N$ nodes and $E$ edges/transfer links, the likelihood of disruption for the operation of a node $i \in N$ is calculated using (1):

$$L_i = \sum_{u} P(Y_i \mid u)\, P(u) \qquad (1)$$

where $P(Y_i \mid u)$ denotes the conditional probability of the event $Y_i$ (the operation of node $i$ being disrupted) given that the node and its required input resources are in joint state $u$.
To evaluate the relationship of these disruptions between network flow nodes and input resources, we utilize a noisy-OR model [71], similar to [72], [73]. The noisy-OR model assumes the independence of causal influences among a flow network node and its required input resources [71]. This assumption reduces the number of required parameters from exponential to linear: for a flow network node with $n$ input resources, there are $n + 1$ independent parameters instead of $2^{n+1}$. By minimizing the number of network parameters, we improve the implementation process for real-world applications. This way, we reduce the computational and modelling challenges that MFN models introduce.
To this end, the state of node $i$ (i.e., failed $X_i$ or functional $\bar{X}_i$) depends on its failure probability $f_i$. The failure probability $f_i$, $\forall i \in N$, estimates how likely it is for node $i$ (a piece of equipment) to fail individually at some point in the future; as such, the probability for node $i$ to be operational is $(1 - f_i)$. Similarly, the availability state of a required resource $j$ (i.e., available $\bar{R}_j$ or unavailable $R_j$) depends on its disruption probability $d_j$. The required input resource disruption probability $d_j$ depicts the probability of a required input resource $j$ of node $i$ being unavailable; as such, the probability for the required resource to be available is $(1 - d_j)$. Table 1 demonstrates the likelihood calculation for a node with one required resource based on the different states $u$. We should note here that the likelihood is calculated recursively, as a flow network node may have more than one required resource. Also, for nodes without any input resources, we have $L_i = f_i$.
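To make the noisy-OR combination concrete, the following sketch computes a node's disruption likelihood under the simplifying assumption that the node is disrupted whenever it fails itself or any required resource is unavailable, with independent causes; function and variable names are ours, not the tool's:

```python
def disruption_likelihood(node_failure_p, resource_disruption_ps):
    """Noisy-OR with deterministic causes:
    P(disrupted) = 1 - (1 - f) * product over resources of (1 - d_j)."""
    survive = 1.0 - node_failure_p
    for d in resource_disruption_ps:
        survive *= 1.0 - d
    return 1.0 - survive

# Failure probability 0.1 and one resource with disruption probability 0.2:
# 1 - 0.9 * 0.8 = 0.28
print(disruption_likelihood(0.1, [0.2]))
# A node with no input resources reduces to its own failure probability.
print(disruption_likelihood(0.3, []))
```

Note that the general noisy-OR model assigns a separate parameter to each cause; the deterministic version above is the special case where every such parameter equals 1.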

The availability of an input resource depends on its regular and backup flows from supplier flow network nodes. To that end, the disruption probability $d_j$ of a required resource is calculated from the disruption probability due to regular flows and the number of backup flows, utilizing (2). We should note here that we assume the backup flows will always be available on demand.
The regular-flow disruption probability reflects the probability that an input resource is unavailable from the supplier flow network nodes that share the resource demand through regular flows. To that end, this disruption probability is calculated from the failure probabilities of the resource's supplier flow network nodes, utilizing (3). If a required input resource is supplied by only one flow network node, then its disruption probability equals that supplier's failure probability. Here, $P(Z_j \mid v)$ denotes the conditional probability of the event $Z_j$ (the availability of required input resource $j$ being disrupted) given that the resource's supplier flow network nodes are in joint state $v$.
To this end, the state of a supplier node $s$ (i.e., failed $X_s$ or functional $\bar{X}_s$) depends on its failure probability $f_s$, so the probability for a supplier node to be operational is $(1 - f_s)$. The calculation of the disruption probability for a required input resource $j$ of a process node supplied by two flow network nodes $\{s_1, s_2\}$ with regular flows is demonstrated in Table 2. We should note that the disruption probability is calculated recursively, as a required input resource may have more than two supplier nodes.
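A hedged sketch of the regular-flow part of this calculation follows; since equations (2) and (3) are not reproduced in the text, we assume one plausible reading in which suppliers sharing the demand act as independent causes, so the resource is disrupted if any regular supplier fails, and we omit the backup-flow adjustment of (2):

```python
def resource_disruption_regular(supplier_failure_ps):
    """P(resource disrupted) = 1 - prod(1 - f_s) over regular-flow suppliers
    (an assumed reading of eq. (3); backup flows of eq. (2) not modelled)."""
    available = 1.0
    for f in supplier_failure_ps:
        available *= 1.0 - f
    return 1.0 - available

# A single supplier: disruption equals that supplier's failure probability.
print(resource_disruption_regular([0.05]))
# Two suppliers sharing the demand: 1 - 0.95 * 0.9 = 0.145
print(resource_disruption_regular([0.05, 0.1]))
```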
The disruption (failure) probability for a flow network node (system component) can be estimated based on the reliability of the component. The reliability of a system component (machine) is its probability of performing its function within a defined period under certain conditions and restrictions. The following databases are a subset of the best-known data sources for reliability and failure-rate information for machines and processes: OREDA, GIDEP, the TUD database, SRDF, the European Industry Reliability Databank, CORDS, and CCPS [74].
An alternative data source is failure analysis methods such as FTA or FMEA performed on similar systems. These methods are commonly used in safety and reliability analysis to understand how systems can fail and to determine (or estimate) the event rates of safety accidents or particular system-level (functional) failures.

F. Dependency Risk Analysis
After having calculated the likelihood of disruption for each node, the methodology moves on to assess the risk of first-order dependencies. For each graph node $A$ with risk $R_A$, all output edges to receiver nodes carry a cascading risk. A potential disruption to a component of a CI is transferred from one connection to the next, where the disturbance of a required input resource, regardless of the cause, may propagate to the dependent components in the production chain. To calculate and assess the nth-order cascading risks propagated in a series of components, we use the following method, which utilizes a recursive algorithm based on [62], [63]. Given that $n_1 \rightarrow n_2 \rightarrow \cdots \rightarrow n_n$ is an nth-order dependency between $n$ networked components, with weights $w_{i,i+1} = L_{i,i+1} \cdot I_{i+1}$ corresponding to each first-order dependency of the path, the cascading risk exhibited by this component dependency path is computed utilizing (4):

$$R_{1,\dots,n} = \left( \prod_{i=1}^{n} L_i \right) I_n \qquad (4)$$
where $L_i$ is the disruption likelihood of a flow network node, as calculated using (1). The cumulative dependency risk is the overall risk exhibited by all the components in the sub-chains of the nth-order dependency. If $n_1 \rightarrow n_2 \rightarrow \cdots \rightarrow n_n$ is a chain of asset dependencies of length $n$, then the cumulative dependency risk, denoted as $CR_{1,\dots,n}$, is defined as the overall risk produced by the nth-order dependency, i.e., the sum of the cascading risks of all its sub-chains:

$$CR_{1,\dots,n} = \sum_{i=2}^{n} R_{1,\dots,i} \qquad (5)$$

Finally, using the total number of all asset sub-chains (possible asset dependency paths) and their cumulative dependency risks, the methodology calculates the graph's overall dependency risk as the sum of the cumulative dependency risks of all nth-order dependencies in the graph:

$$GR = \sum_{p \,\in\, \text{paths}} CR_p \qquad (6)$$
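The recursive computation can be sketched as follows; the exact combination rule of (4)-(6) is reconstructed from the surrounding text (likelihoods multiplied along the path, terminal impact applied, sub-chain risks summed), so treat the code as an assumption-laden illustration rather than the tool's implementation:

```python
def cascading_risk(likelihoods, impact):
    """Risk of an nth-order chain: product of the disruption likelihoods
    along the path, times the impact of the terminal node."""
    r = impact
    for l in likelihoods:
        r *= l
    return r

def cumulative_risk(likelihoods, impacts):
    """Sum the cascading risks of every prefix sub-chain n1 -> ... -> nk."""
    return sum(cascading_risk(likelihoods[:k], impacts[k - 1])
               for k in range(1, len(likelihoods) + 1))

# Chain A -> B -> C, hop likelihoods 0.5 and 0.4, impacts 0.8 (B) and 0.9 (C):
# risk(A->B) = 0.40, risk(A->B->C) = 0.18, cumulative = 0.58
print(cumulative_risk([0.5, 0.4], [0.8, 0.9]))
```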

G. Minimum Spanning Tree Algorithm
The algorithm then utilizes minimum spanning trees (MSTs). MSTs are commonly used to find approximate solutions for complex network problems such as the Traveling Salesman Problem [75], [76]. A spanning tree $T$ of a weighted undirected graph $G$ is a connected subgraph of $G$ such that (i) $T$ contains every node of graph $G$, and (ii) $T$ does not contain any cycle. A cycle is a graph path in which the first node corresponds to the last. A Minimum Spanning Tree (MST) is a spanning tree with the minimum total weight [77]. An MST of a weighted undirected graph can be found by greedy algorithms, such as those described in [76], [78].
We create all possible MSTs of the dependency graph. This way, we can compute and output the removal rate of each dependency (the Removed Dependencies Report). If an edge is consistently removed across many MSTs, it has significant ties to high risk and high importance in terms of dependencies and must be considered for mitigation measures/actions. To that end, we utilize MSTs to detect and prioritize high-risk material and informational flows in the MFN for mitigation measures.
The MST calculations are performed on the dependency graph using an implementation of Prim's algorithm [78]. Starting from a critical asset node (e.g., a server), the adjacent node connected by the smallest-weight edge is selected and added to the tree. This process is repeated, always choosing the minimal-weight edge that joins any node not already in the tree. When there are no more nodes to add, the tree is a minimum spanning tree. The resulting graph $T = (N, E')$ is undirected and acyclic, since it is based on Prim's algorithm, where $E' \subseteq E$.
If graph cycles exist and contain multiple dependencies of equal weight, many alternative MSTs can be produced. Note that each MST of the graph contains the same number of dependencies, and this number is guaranteed to be the smallest possible that retains the graph's connectivity. However, since the produced MST is an undirected graph, there is no guarantee that it also contains the minimum number of directed paths that can retain the flow network nodes' connectivity once the directions of the necessary edges are applied. Also, the dependencies removed during MST production represent flows of material and information required for nominal system operation. As a result, the MSTs are not applicable redesigns and cannot be utilized as MFN restructurings for risk mitigation purposes.
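A minimal pure-Python sketch of Prim's algorithm and of the removed-dependencies idea follows; the toy graph, weights, and names are illustrative, and the real tool operates on the (directed) risk dependency graph:

```python
import heapq

def prim_mst(adj, start):
    """Prim's algorithm. adj: {node: [(weight, neighbor), ...]} (undirected).
    Returns the set of MST edges as frozensets of node pairs."""
    visited = {start}
    edges = set()
    heap = [(w, start, v) for w, v in adj[start]]
    heapq.heapify(heap)
    while heap:
        w, u, v = heapq.heappop(heap)   # always the cheapest crossing edge
        if v in visited:
            continue
        visited.add(v)
        edges.add(frozenset((u, v)))
        for w2, nxt in adj[v]:
            if nxt not in visited:
                heapq.heappush(heap, (w2, v, nxt))
    return edges

# Toy triangle graph: the heavy edge A-C is left out of the tree.
adj = {
    'A': [(1, 'B'), (4, 'C')],
    'B': [(1, 'A'), (2, 'C')],
    'C': [(4, 'A'), (2, 'B')],
}
mst = prim_mst(adj, 'A')
all_edges = {frozenset(('A', 'B')), frozenset(('A', 'C')), frozenset(('B', 'C'))}
removed = all_edges - mst   # high-weight (high-risk) edges: report candidates
print(sorted(sorted(e) for e in removed))  # [['A', 'C']]
```

Counting, per edge, how many of the alternative MSTs exclude it yields the removal rates that the Removed Dependencies Report scores.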

H. Centralities Metrics
At this stage of our analysis, the presented method also utilizes centrality metrics on the produced risk dependency graph. Centrality metrics are widely used in network analysis and flow management [79]–[81]. In graph theory and network analysis, centrality metrics attempt to quantify the position of a node in relation to other nodes and to estimate the relative importance of a node within a graph.
In a risk dependency graph, centrality metrics can be used as additional criteria to identify the flow network nodes that significantly affect the critical risk paths of the graph. Such nodes are suitable candidates when prioritizing mitigation controls: if appropriate mitigation controls are applied to them, multiple cumulative dependency risk chains can be reduced, thus lowering the overall graph's risk [82], [83].
We note that different centrality metrics capture different aspects of network topology and thus describe different types of node influence. Previous comparative research on centrality metrics for dependency graphs allowed us to opt for two centrality metrics to identify the most influential nodes [82].

1) Bonacich Centrality
The Bonacich (eigenvector) metric [84] measures the centrality of a node in a network. It is calculated using the following equation:

$$c(\alpha, \beta) = \alpha (I - \beta A)^{-1} A \mathbf{1}$$

where $\alpha$ is a scaling factor, $\beta$ reflects the extent to which centrality is weighted, $A$ is the node adjacency matrix, $I$ is the identity matrix, and $\mathbf{1}$ is the vector of ones. An adjacency matrix is an $n \times n$ matrix with each element assigned a value of 1 if an edge exists between the corresponding nodes and 0 otherwise.
A flow network node with high Bonacich centrality is adjacent to flow network nodes with very high (or very low) influence, depending on whether the parameter $\beta$ is greater or less than 0 [84]. In a risk dependency graph, nodes with high eigenvector centrality (when $\beta \leq 0$) are of particular interest because they are connected to other important nodes with high connectivity. Their influence is proportional to the total risk of the first-order dependencies that affect them.
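The metric can be approximated numerically through its series expansion, $c(\alpha, \beta) = \alpha \sum_{k \geq 0} \beta^k A^{k+1} \mathbf{1}$, which converges to $\alpha (I - \beta A)^{-1} A \mathbf{1}$ when $|\beta|$ is smaller than the reciprocal of the largest eigenvalue of $A$. The sketch below uses an illustrative path graph and parameter values of our choosing:

```python
def matvec(A, x):
    """Multiply adjacency matrix A (list of rows) by vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def bonacich(A, alpha=1.0, beta=0.2, iters=200):
    """Bonacich centrality via the truncated series
    alpha * sum_k beta^k * A^(k+1) * 1 (converges for small |beta|)."""
    n = len(A)
    term = matvec(A, [1.0] * n)                      # A * 1
    c = [alpha * t for t in term]
    for _ in range(iters):
        term = [beta * t for t in matvec(A, term)]   # next series term
        c = [ci + alpha * t for ci, t in zip(c, term)]
    return c

# Path graph a - b - c: the middle node is adjacent to both others,
# so for beta > 0 it receives the highest centrality.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
c = bonacich(A)
print(c[1] > c[0] and c[1] > c[2])  # True
```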

2) Closeness Centrality
This metric quantifies the central or peripheral placement of a node (asset) in a graph based on geodesic distances. It is defined as:

$$C(i) = \frac{1}{\sum_{j \neq i} d(i, j)}$$

where $d(i, j)$ is the shortest-path distance between node $i$ and any other node $j$ in the graph [85]. Closeness centrality captures the average distance between a node and every other node in the graph and assumes that nodes can only pass influence through their existing edges. The normalized form of closeness centrality represents the average length of the shortest paths instead of their sum and can be defined as:

$$C'(i) = \frac{n - 1}{\sum_{j \neq i} d(i, j)}$$

where $n$ is the number of nodes and $d(i, j)$ is the distance between nodes $i$ and $j$ [86]. A flow network node with high closeness centrality has short average distances to most other flow network nodes in the graph. Also, if a flow network node has high closeness centrality, it is in a position to propagate disturbances quickly.
In a dependency risk graph, nodes with high closeness centrality tend to be part of many dependency chains. In most cases, these nodes initiate fast cascading effects throughout a network [82] since cascading effects tend to affect relatively short chains [87]. The closer a node is to the initiator of a cascading event, the greater its effect is on the cumulative dependency risk because the likelihood of its outgoing dependency would affect all the partial risk values of subsequent dependencies (edges).
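Normalized closeness can be computed with a breadth-first search per node; the sketch below uses a small illustrative star graph (unweighted, undirected; names are ours):

```python
from collections import deque

def closeness(adj, node):
    """Normalized closeness: (n - 1) / sum of shortest-path distances."""
    dist = {node: 0}
    q = deque([node])
    while q:                        # breadth-first search from `node`
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    total = sum(d for n2, d in dist.items() if n2 != node)
    return (len(adj) - 1) / total if total else 0.0

# Star graph: the hub is one hop from everyone; a leaf needs two hops
# to reach the other leaves.
adj = {'hub': ['a', 'b', 'c'], 'a': ['hub'], 'b': ['hub'], 'c': ['hub']}
print(closeness(adj, 'hub'))  # 3 / (1+1+1) = 1.0
print(closeness(adj, 'a'))    # 3 / (1+2+2) = 0.6
```

A node with closeness 1.0 (the maximum) reaches every other node directly, matching the intuition above that such nodes can propagate disturbances fastest.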

I. Analysis Output Information
The proposed methodology's overall output comprises:
• metrics that assess the performance of the flow network (i.e., the flow network graph's overall dependency risk, the top and average cumulative dependency risk, and the number of attack paths),
• an identification of the most critical (in terms of risk) dependencies (flows) and paths between flow network nodes, and
• an identification of the most critical flow network nodes based on their influence and presence in dependencies and paths.
The performance metrics and the identified high-risk flow network nodes and flows are then used as input to the risk mitigation algorithm described in Section III. The risk mitigation algorithm runs iteratively, producing alternative redesigns of the original flow network until the risk level of the system is deemed acceptable. The outputs mentioned above are recalculated on each iteration, helping the experts decide on mitigation actions in order to redesign the system. The final output is an optimized system design model based on the parameters and the decisions made by the experts during the analysis.

EVALUATION
J. Tool Implementation
The methodology was developed as a distributed application (called the Process Simulation Modelling tool, PSM), comprising a desktop application and a web application. The desktop application's front-end and back-end are developed and implemented in the .NET Framework using C#. The main application handles the modelling functionalities and the preliminary risk analysis. The web application back-end is developed in Java Spring using the Neo4j graph database [88] and handles the risk dependency analysis. The desktop application front-end communicates and interacts with the web application back-end through an application programming interface (API). All the experiments were performed using a computer with an Intel Core i7 4.6 GHz processor with eight cores and 16 GB RAM.

K. Critical Infrastructure Case Study
The critical infrastructure under study corresponds to the TÜPRAŞ case study, examined in the context of the EU-funded research project FACTLOG. The TÜPRAŞ refinery, located in Izmit, Turkey, produces various petroleum products such as LPG (Liquefied Petroleum Gas), gasoline, diesel, and naphtha. The refinery is composed of multiple units, each serving a specific role in the production process (e.g., production of LPG, production of gasoline, production of diesel, purification of products). The FACTLOG project focuses on the LPG purification unit, i.e., on the various processes that must be applied to turn the LPG production streams into refined LPG streams that meet specific quality specifications.
LPG, a mixture of liquefied hydrocarbon gases C3–C4 (propane and butane), is a valuable energy carrier with numerous industrial and transportation uses. It is a by-product of many refinery processes, such as Crude Distillation (CDU), Hydrocracking (HYC), Fluid Catalytic Cracking (FCC), and the Platformer. After the production process, some impurities remain in the LPG and need to be removed (purification). The main purification processes correspond to the removal of: (i) naphtha (C5 and above) in debutanizer columns, (ii) ethane in de-ethanizer columns, and (iii) sulphur compounds, like hydrogen sulphide (H2S) and mercaptan (CH4S), in Amine Absorber Units (AAU). The flow network model provided for this study is representative of any typical LPG purification unit encountered in oil-refinery industries [89]. Due to permission issues, only the MFN of the purification unit has been provided. Failure probabilities and impact values are assigned based on literature research on similar systems and processes. Utilizing the provided MFN (see Fig. 4), we identified 22 internal processes, 4 internal junctions, 11 internal inputs, 1 external input, and 1 output node for the part of the production line under study. Moreover, the MFN under study includes 208 flows of materials and information.

1) Process Dependency Mapping
To map the MFN into a dependency graph, we pre-process nodes and flows based on the methodology proposed in Section IV.B. As such, we marked the model's junctions as ignorable. In addition, for each node, we select the required (primary) resources for nominal operation: for example, for the CDU-1 debutanizer, the MFN models C5, S, LPG, C2, Monitor & Control Data, and steam resources as input flows, but in reality the debutanizer column requires only LPG gas, steam, and Monitor & Control Data to operate.
Following the pre-processing stage, we map 22 internal processes, four internal inputs, one external input, and one output for the part of the production line under study. Also, by marking the required resources for each input, we reduce the total number of flows for mapping to 90. In Table 3, we list the flow network nodes. The flow network nodes are depicted using generic terms and IDs in the examples described below. Also, in Table 7 (in the Appendix), we list the characterized flows (regular or backup) along with their respective resources, as described in Section IV.B. The tool automatically maps the material flow network into a risk dependency graph (Fig. 5). Each material flow network node and its respective input and output flows are used to model the asset dependency graph. We should note here that in this specific model, each resource flow corresponds to one dependency.

2) Process Dependency Risk Analysis
To assess the flow network model, we assign failure probabilities and impact values for each node based on a relevant literature review [90]–[101]. To determine the severity of the consequences of dangerous situations and potential accidents for the physical processes, we searched for previous assessments such as HAZOP, FMEA, and PHA analyses on oil refineries and, in particular, on gas sweetening processes and relevant equipment. Similarly, to identify the probability of a failure mode for a specific process and related equipment, we examined previous FMEA and FTA analyses on oil refineries and gas treatment plants.
A special mention must be made of the monitoring and control process. For this specific process, and without having additional information, we consider a typical SCADA network including a control and monitoring server placed at the control center, communication links in a corporate network with internet access, and one or more topographically dispersed field sites in the industrial plant containing field devices (i.e., sensors, PLCs, Remote Terminal Units (RTU), or Intelligent Electronic Devices (IED)). We estimate the failure probability and the impact for the monitoring and control process based on median values taken from historical data, considering various types of attacks such as (i) replay attacks, (ii) spoofing, (iii) denial of service, (iv) control message modification, (v) unauthorized writes to the MTU or RTU, and (vi) RTU response alteration, based on a literature review [102]–[105]. Table 3 lists the suggested failure probability and impact values assigned to each node. The proposed risk analysis methodology allows engineers and security experts to modify or update probabilities and impact values based on historical data and after consultations with plant operators, engineers, supervisors, and security experts. Specifically, if an organization installs a new security system or changes an industrial machine, experts can decide which attacks or hazards are less likely to cause a particular type of damage by updating the probability values presented in Table 3. Based on the assigned failure probabilities and the mapped flows, the tool calculates the likelihood of disruption and, by utilizing the assigned impact values, calculates the risk value of each node in the dependency graph (see Table 4) based on the methods proposed in Section IV.C.1.
For example, GAS DEA (P19), which models the gas treatment process that removes H2S or other sulfur compounds from the supplied gas using amines (DEA), introduces the highest risk with a value of 1.1225 to the dependency graph. On the other hand, as to be expected, the electricity grid (I12) introduces the minimum risk with a value of 8.00E-05.
Finally, the tool computed the complete set of risk paths on the risk dependency graph. Paths have an order not greater than 6 (Table 5). In this case, the depicted paths correspond to flows between different processes inside the industry. The list below gives an overview of the computed risk dependency paths.
Thirty-five network flow nodes produced 4033 dependency chains with orders ranging from three to six and potential risk values between 0.01 and 1.39. The tool also computed the graph's overall dependency risk with a value of 3637.08 and the average cumulative dependency risk with a value of 0.91. As a result, system engineers and security experts can use this step's output to identify dependency paths with risk values above a threshold value. Also, they can utilize the number of produced dependency chains, the number of node appearances in paths with risk above a threshold, and the graph's overall dependency risk as metrics to assess and compare the initial system with future redesigns. In our case, path P3 → P19 → P21 → P12 → P21 is the worst dependency, with a risk value of 1.39. Moreover, the tool produced 123 paths similar to the worst dependency, with slight variations in sequence and node appearance and, more importantly, the same risk. Hence, based on our analysis, nodes P21, P19, and P22 are the most critical. In particular, nodes P21 and P19 have the highest frequency of occurrence on paths with risk values from 1.36 to 1.39, and node P22 follows nodes P19 and P21 in frequency of occurrence on dependency paths with risk values from 1.32 to 1.35. Therefore, amine (DEA) regeneration (P21) is deemed the most critical process, followed by the gas treatment process (P19) and the operation & monitoring process (P22). That is to be expected, as the amine (DEA) regeneration process supplies multiple processes with DEA and at the same time receives DEA from multiple processes for treatment, creating multiple cycles of dependencies and thus increasing the risk of disruption due to a realized attack or hazard.

3) Critical Flow Analysis
The target is to identify and prioritize critical flows for risk mitigation. To this end, we produce all alternative graphs with minimum risk (MSTs) based on the method discussed in Section IV.E, using the process dependency graph, and we compute the removal rates of the removed high-risk dependencies. The tool produced 60 alternative MSTs with cumulative risk values between 5.8008 and 5.8015, each with a total of 34 dependencies. For comparison, the initial dependency graph has a total risk value of 57.90 and 89 dependencies. We should note here that the produced MSTs are not applicable redesigns or mitigation measures. Compared to the initial dependency graph, the produced MSTs have minimized risk, but the removed dependencies represent material or information flows required for nominal system operation. Nevertheless, the removed dependencies produced during MST production can indicate where engineers and security experts should focus their efforts to minimize the overall system risk. Table 6 lists a set of high-risk dependencies scored by how many times they have been removed across the total number of produced MSTs. The tool outputs a total of 109 high-risk dependencies with scores from 40 to 60 (Table 6). System engineers and security experts can use this step's output (the Removed Dependencies Report) to identify dependencies, and thus system flows, with scores and frequency of appearance above a threshold value. Our analysis of the removed dependencies report shows that dependencies from node P22 appear with higher frequency (21 times) than those from all other source nodes (2 times). That dictates the importance of the information flows from the operation and monitoring process (P22). This is to be expected based on the high risk the specific process introduces to the system, and it is in line with the results of our dependency analysis.

4) Process Influence Analysis
To identify and prioritize critical flow network nodes based on their influence on the modelled dependency graph, we precompute the centrality metric values for each node using the process dependency graph, as described in Section IV.F, and highlight the maximum values. The results of both the closeness and the eigenvector centrality metrics are presented in Table 7. Based on the results, node P21 presents the highest closeness centrality while node I4 presents the minimum. In contrast, node P22 presents the highest eigenvector centrality while node I4 presents the minimum. The amine (DEA) regeneration process (P21) is characterized as a critical node, as it is in a position, through its relationships (output flows), to spread risk quickly and to a large portion of the system. The operation & monitoring process is deemed the most influential because it connects to more influential nodes. Therefore, a direct attack on the operation & monitoring process (P22) can disrupt nodes critical (strongly influential) to the system operation. The hydrocracking process is the least critical node (low influence), considering both the closeness and the eigenvector centrality metrics. System engineers and security experts can use this step's output to identify a set of critical system components with centrality values above a threshold value. Consequently, based on our centrality analysis, the amine (DEA) regeneration process (P21) and the operation & monitoring process (P22) are considered critical nodes with high priority for mitigation measures, contrary to the hydrocracking process.

L. Discussion of Results
Based on our overall analysis, amine (DEA) regeneration (P21) is deemed the most critical process, followed by the gas treatment process (P19) and the operation & monitoring process (P22). Additionally, the removed dependencies report indicates a set of flows for risk mitigation. To that end, we find that the information flows from the operation and monitoring process (P22), while not having the highest scores, have a higher frequency of removal, thus inducing high risk to the system. That is confirmed by our centrality analysis, where the operation and monitoring process is deemed the most influential on the premise that it connects to more influential nodes. Similarly, the amine (DEA) regeneration (P21) process is a high-influence node able to spread risk quickly and to a large portion of the system. The amine (DEA) regeneration process supplies multiple processes (absorbers) with DEA and at the same time receives DEA from them for treatment, creating multiple cycles of dependencies. From an engineering perspective, the amine regeneration units are highly important in the chemical absorption process [108]. The gas treatment process (P19) represents an absorber, which is undoubtedly the single most crucial operation of gas purification processes [109]. Furthermore, in our case, the gas treatment process receives flows from multiple processes and is involved in a circular relation with the amine (DEA) regeneration process, which underlines its criticality.
From a cybersecurity perspective, the operation and control processes and systems are crucial, as they are more open and more vulnerable to cyber-attacks due to existing vulnerabilities [32], [110]. We observed that the monitoring and control process, while not part of the highest-risk dependency path, has been identified as a critical node by our tool. That is to be expected and considered valid, because we are interested not only in cybersecurity threats and vulnerabilities but also in hazards and safety issues that originate from physical processes.

CONCLUSIONS
The proposed framework, following the standard system development lifecycle, incorporates security and safety criteria into the design phase. To achieve that, it provides a clear roadmap for engineers and security experts to identify security and safety risks early in the design phase and address them accordingly, considering the system's quantitative and qualitative requirements.
The proposed methodology can model the underlying components of a CI production process and the cyber and physical interactions between them as a material flow network, providing a holistic view of the system and a better understanding of the dependencies between the production chain's cyber-physical elements.
Our methodology and the developed tool can assess the risk of disruptions due to accidental or intentional events and produce weighted risk dependency graphs, presenting how a disruption in one component may affect other dependent components. Producing the MSTs of a dependency graph depicts the potential for reducing the network's risk. To that end, by utilizing the removed dependencies report produced during MST production, we can identify and prioritize critical flows. Moreover, by utilizing centrality metrics, we can identify critical flow network components prioritized based on their influence. This prioritization assists engineers and security experts in deciding where they should focus their efforts in order to minimize the overall network risk during the design stage.
The evaluation results from the pilot study in a part of the production line of an existing critical industrial infrastructure show that the presented approach is effective and trustworthy. To that end, our approach supports the proactive study of critical industrial infrastructures with large-scale production chain dependency scenarios advancing the concept of security and safety by design in critical infrastructure protection.

M. Limitations and Future Work
The presented approach has certain limitations. Like other empirical risk approaches that analyse dependencies, it relies on previous security and safety assessments of related industries and physical components to evaluate impact and estimate failure probabilities. Our analysis highly depends on the level of detail and quality of the modelled system. By utilizing an MFN, we create and assess a model of reality, and as such, results may miss actual risks. Our framework, especially during mitigation control selection, depends on the subjective opinion of the decision-maker, as the tool cannot "guess" erroneous or leftover risks in a design.
Also, in our approach, we consider all the required input resources of the modelled flow network nodes as equally important, although in reality some resources may be more important than others. For modelling the disruption of flows, we utilize a binary approach to address availability, but that is not always the case for CI in the industrial sector. Attacks may modify the input quantity of a resource and alter the quality of the output product. These kinds of attacks are not immediately noticed and threaten the integrity of the provided services. Moreover, process mapping is essentially a baseline that cannot analyse and describe cross-process risks, such as a 3rd-party data monitoring company suffering a denial of service that in turn affects data aggregation from the historian, thus losing data and potentially harming a process in the mid-term future when trying to adjust/optimize it. Future work should concentrate on overcoming the limitations mentioned above.

ACKNOWLEDGMENTS
This work was partially supported by the EU-funded SPIRE-2019 FACTLOG project (FACTLOG: Energy Aware Factory Analytics for Process Industries, GA 869951). The results presented in this paper reflect only the authors' views and the EU is not liable for any use that may be made of the information contained therein.