SAM-SoS: A Stochastic Software Architecture Modeling and Verification Approach for Complex System-of-Systems

A System-of-Systems (SoS) is a complex, dynamic system whose Constituent Systems (CSs) are not known precisely at design time and whose operating environment is uncertain. SoS behavior is unpredictable due to underlying architectural characteristics such as autonomy and independence. Although the stochastic composition of CSs is vital to achieving SoS missions, their unknown behaviors and impact on system properties are unavoidable. Moreover, unknown conditions and volatility have significant effects on crucial Quality Attributes (QAs) such as performance, reliability, and security. Hence, the structure and behavior of an SoS must be modeled and validated quantitatively to foresee any potential impact on the properties critical for achieving the missions. Current modeling approaches lack the essential syntax and semantics required to model and verify SoS behaviors at design time and cannot offer alternative design choices for better design decisions. Therefore, the majority of existing techniques fail to provide qualitative and quantitative verification of SoS architecture models. Consequently, we propose an approach to model and verify Non-Deterministic (ND) SoS in advance by extending the current algebraic notations for formal models into a hybrid stochastic formalism that specifies and reasons about architectural elements with the required semantics. A formal stochastic model is developed using a hybrid approach for architectural descriptions of SoS with behavioral constraints. Through a model-driven approach, stochastic models are then translated into PRISM using formal verification rules. The effectiveness of the approach has been tested with an end-to-end case study design of an emergency response SoS for dealing with a fire situation. Architectural analysis is conducted on the stochastic model, using various qualitative and quantitative measures for SoS missions. Experimental results reveal critical aspects of the SoS architecture model that facilitate better achievement of missions and QAs with improved design, using the proposed approach.


I. INTRODUCTION
A System-of-Systems (SoS) is a complex system that behaves in a stochastic manner resulting from the collaboration among various heterogeneous sub-systems known as Constituent Systems (CSs). The CSs are often distributed, exhibiting operational and managerial independence, but work together to achieve the SoS mission with the help of emergent behaviors which an individual CS alone could not accomplish [1], [2]. The emergent behavior, which is dynamic and results from unknown CS interactions at runtime, makes the overall SoS behavior stochastic [3]-[5]. Due to the independence and autonomy of the CSs, the administrators of the SoS have only loose control over them, making it difficult to ensure the correct architectural design of the SoS [1], [6], [7]. Modern mission-driven critical infrastructures designed as SoS provide services that are essential in daily life, including health, transportation, energy, emergency, and rescue services. If not designed properly, an SoS could fail, leading to the loss of human lives, disruption of core businesses, and damage to economic growth [8]-[10]. Unlike traditional single systems whose components, structures, and behaviors are well known, an SoS architecture is challenging to design and implement because it is stochastic due to its unknown CSs, unpredictable behaviors, and continuous evolution [4], [11]. Therefore, the main focus of this study is to devise a unique formal modeling and verification approach for SoS architectures.
The Software Architecture (SA) modeling of complex software-intensive systems involves describing the functional features and performing structural analysis to determine potential defects and SA design issues. The SA design issues have a detrimental effect on Quality Attributes (QAs) such as performance, reliability, and security [12], [13]. A correctly designed SA for describing structure and behaviors, coupled with constraints specification, is crucial to software modeling and verification.
Although SA provides specific modeling abstractions for the early prediction of defects and design issues, the current notations available for SoS modeling and verification lack the essential reasoning capabilities required to deal with the Non-Deterministic (ND) architecture. An SoS architecture is ND primarily because it is not known in advance whether the potential coalition of SoS (consisting of CSs that are autonomous and fully independent) can conform to the functionalities and QAs. Consequently, SoS structures, behaviors, and related QAs are not easy to predict and measure [11], [14]. Therefore, SA modeling and reasoning for such systems are challenging tasks that require strong mathematical foundations to specify stochastic behaviors and ND events in an unpredictable environment [15]. In this context, formal modeling and reasoning enable systems designers to specify and analyze SA models using a robust mathematical foundation.
Among various SA modeling tools, formal Architecture Description Languages (ADLs) are strong candidates for representing software systems architecture in the form of components (CSs), connectors (Mediators), and resulting configurations/coalitions [16]-[18]. The majority of these ADLs are based on core process algebras originating mainly from the Calculus of Communicating Systems (CCS) [19] and Communicating Sequential Processes (CSP) [20] to model the SoS architecture [21]-[24]. However, they fail to deal with the stochastic behavior and dynamic nature of SoS [15]. On the other hand, some approaches try to model similar complex systems using Stochastic Process Algebras (SPA) [25]. Still, these formal ADLs, individually based on process algebraic notations, i.e. CCS, CSP, SPA, and related approaches [26], have certain limitations when it comes to modeling SoS [4], [11]. These limitations include: (a) insufficient vocabulary and reasoning capabilities to manage the architectural characteristics of SoS, (b) lack of support for automated model verification [4] concerning missions and QAs, and (c) inability to specify and reason about dynamic stochastic behaviors and uncertainty of the SoS at the architectural level.
VOLUME 8, 2020
Verification of a complex system can be performed with statistical model checking, which enables various architectural analyses of system properties [27]. However, most current modeling approaches suffer from a semantic mismatch that makes automated quantitative verification, analysis, and reasoning difficult. Models must therefore retain semantic consistency and completeness during the transformation process, which needs to be addressed.
In this research work, we overcome these limitations with a unique approach based on Model-Driven Engineering (MDE) [28] that supports the stochastic architecture modeling of the SoS by using Markovian process algebra. This work makes several key contributions, which fall broadly into three aspects: (i) The syntax and semantics of SPA are extended with concurrent and stochastic composition operators into a Hybrid Stochastic Formalism (HSF), our proposed formalism. HSF brings features such as probabilistic choices, non-determinism, and Stochastic Concurrent Constraints Programming (SCCP) constructs [29] to describe SoS architecture models. (ii) The resulting stochastic model specified using HSF is a Markovian model that supports Stochastic Model Checking (SMC). Formally founded mapping rules from the proposed HSF to PRISM [30] are defined using formal semantics to perform automated verification analysis of the SoS model. (iii) Various system behavioral reachability analyses and quantitative analyses of dynamic properties are performed with Continuous Stochastic Logic (CSL) on the transformed SoS model for the first time in PRISM, using known and unknown bounds.
The proposed approach is validated using a case study of a Cyber-Physical System (CPS)-based SoS (CPSoS) for Real-Time Fire Monitoring and Emergency Response. The system has been modeled with the proposed HSF taking into account ND behavior and concurrent compositions.
The probabilistic behavior has been tested for reachability employing approximate SMC. Steady-state and transient analysis are applied to predict QAs such as performance and reliability, using multiple scenarios to assess mission accomplishment qualitatively and quantitatively.
The rest of the paper is organized as follows. Section II describes the work related to our approach, and Section III establishes the background knowledge. The proposed approach is elaborated in Section IV. The syntax and semantics of the proposed formalism are extended in Section V. Section VI provides an SoS architectural design with the extended formalism. Section VII presents mapping rules from the extended formalism into PRISM. Validation of the approach is provided in Section VIII with a case study implementation, including a discussion of preliminary results. A comparison of the proposed approach with existing works is performed in Section IX. Finally, conclusions are drawn and future work is discussed in Section X.

II. RELATED WORK
Work related to our proposed approach can be categorized into two broad bodies of knowledge: (i) formal representation of complex systems architecture, especially formal syntax and semantics, and (ii) qualitative and quantitative analysis of system architecture through model verification.
Over the past decade, there has been an emphasis on formal modeling of complex distributed systems to acquire insights into a system in terms of its architectural design and behavior, and how it evolves over a period of time [31]. The most common and widely used formalisms for architecture description are categorized into: (i) Petri-nets, (ii) Queuing Networks, (iii) Z Notations, (iv) Bi-graphs, and (v) Process Algebras [32], [33]. Process algebra-based ADLs have increased in popularity and have established a place in industry and academic research [16], [34], [35]. The Architecture Description Interchange Language (ACME), enhanced with multiple formal representations, allows us to specify certain Non-Functional Requirements (NFRs) with architecture structure and behavior using Wright and Rapide annotated properties [36]; it has been extended for product-line software systems with aspect-oriented semantics. The Architecture Analysis and Design Language (AADL) is a semi-formal notation for modeling system structures and behaviors along with system properties [37]. Its syntax and semantics have been extended for the modeling of safety and hazard scenarios and for detailed analysis using the QaSten [38] approach for error modeling and verification.
For some time now, Electronics Architecture and Software Technology-ADL (EAST-ADL) has been used to model complex autonomous systems. Its application has been extended to modeling stochastic behaviors, as it improves the time-constrained semantics of the Clock Constraint Specification Language (CCSL) for probabilistic analysis [39]. ACME models are transformed into ALLOY for performing verification, by integrating the formal modeling notation of Wright into the Failures-Divergences Refinement (FDR) model checker; however, this approach has certain limitations in terms of QA verification [40]. Cavalcante et al. [41] devised an approach for the verification of dynamic SA specified in π-ADL using the Plasma-Lab SMC tool. For the verification of dynamic properties, DynBLTL was used, which is an extension of Bounded Linear Temporal Logic (BLTL) [42]. However, these model-checking processes cannot verify stochastic models and face the problem of state-space exploration [43].
The authors of [44] extended the Behavior Interaction Priority (BIP) [45] formalism to the stochastic formalism of Stochastic BIP (SBIP) [46], based on timed automata, for enriched compositional modeling. They used Probabilistic BLTL (PBLTL)-based SMC algorithms to verify the properties of SBIP models, focusing particularly on performance evaluation. However, there was no explicit description of the system threshold and the non-functional properties of the system. Song et al. [47] extended the Monterey Phoenix (MP) formal modeling language with probabilistic automata for modeling software systems, using a model-checking tool based on the Process Analysis Toolkit (PAT) to verify system behavior and perform quantitative evaluation. This approach used deadlock checking and reachability analysis with algorithms based on Event-trace Linear Temporal Logic (LTL).
The aforementioned formalisms embedded into ADLs cannot be used to model and analyze SoS architectures as, predominantly, the formalism and vocabulary used in these approaches deal only with deterministic systems whose components, behaviors, and operating environments are already known to the system designers [3].
In their research work on SoS, Arnold et al. used UPDM/SysML profiles for modeling CSs as functional mock-up units, using a Contract Specification Language to define the constraints on input and output [48]. The Plasma-Lab model-checking tool was used to perform SMC on stated execution traces of the SoS. However, the underlying formalism used there is unable to reason about ND behaviors. Bozzano et al. [49] employed probabilistic model checkers with the Compass Modeling Language (CML), a semi-formal SoS modeling language, to perform safety verification. Sosadl [21] is a formal modeling language that specifies the deterministic structure and behaviors of SoS architecture. It is able to describe static architecture at abstract levels with intentional compositions. However, neither Sosadl nor CML supports rigorous stochastic reasoning or qualitative and quantitative verification of behaviors and associated QAs. Moreover, they are unable to reason about uncertainty concerning unknown CSs and their interactions.
Our approach is unique as it creates a modeling specification to describe SoS stochastic behaviors and dynamic structures at runtime using rich syntax and semantics. The formally founded stochastic SoS models enable the formal verification and validation of system properties with steady-state and transient analysis. We enhance the exogenous contractual behaviors of SoS CSs by integrating CCS and CSP into the SPA as Markov models. Similarly, we use a state-of-the-art SMC tool with unique stochastic model-checking algorithms for verifying dynamic emergent behavior. The proposed modeling approach enables SoS designers to verify missions and goals qualitatively and quantitatively from runtime perspectives at the design stage, taking into account the core architectural aspects of the SoS.

A. THE COMPLEX SYSTEM ORGANIZATION
An SoS is a special type of complex system with increased complexity when implemented on a larger scale [50], [51], accompanying physical components and information system capabilities ranging from cybernetic multi-agent systems to computational biological systems [52]. Considering the types of SoS, i.e. collaborative and virtual SoS, CSs are fully independent, geographically dispersed, and become part of the SoS with partial contracts to achieve global missions by forming stochastic emergent behaviors dynamically [53]-[55]. On the other hand, a centrally operated and managed SoS has non-stochastic behavior.
The architectural characteristics and types of SoS, designed based on the level of autonomy, play a significant role in determining dynamic stochastic behaviors. This has been detailed in our research on SoS dynamic architecture modeling [7], [15]. Figure 1 depicts the complex nature of an SoS, evolving over a period of time $T$ as coalitions of independent and autonomous CSs, collaborating to achieve a global mission in an uncertain environment. The uncertainty and the continuous evolution of an SoS increase its complexity, impacting QAs such as performance and reliability that are critical to the fulfillment of the mission. A concrete example of such an SoS is an Emergency Response system comprising many heterogeneous and independent Internet of Things (IoT) systems (fire monitoring sensors, drones, police and ambulance services) as CSs, collaborating to achieve specific missions in the event of natural disasters or calamities (floods, hurricanes, and wild/bush fire incidents). However, the success of the mission and the conformance of QAs in resulting coalitions are uncertain, since unknown CSs may fail or the mission may be compromised due to the unpredictability of CS collaborations. Therefore, an SoS must be designed carefully early in its life cycle so that it can deal with the underlying SoS architectural complexity and minimize design bottlenecks.
• ... that it provides to the external environment and interacting CSs for collaboration.
• $I = (I_j)$ where $j = 1, \dots, m$, and $I$ is a set of interfaces for interactive transition relations $IR = S_i \times S_j$ among independent CSs.
• $C_H : IO_{CH} \to I, O$ is the mediation for $IR$ as input and output channels, with ordered-sequence or parallel communication.
• $G_B$: global behaviors are formed by the interaction of the local behaviors $L_B$ of independent CSs, as a result of $S_i \times S_j$ interactions.
• $R_{IB}$: all the $E_I$ interactions and $L_B$ are essentially random, generating $G_B$ stochastically.
In this definition, an SoS $M$ with architectural elements is a non-linear system, integrated with CSs such as IoT and CPS in socio-technical and scientific contexts [10], [56], [57]. The architectural design of such complex systems needs to be described stochastically, with underlying reasoning capabilities, to avoid failures.

B. STOCHASTIC SYSTEMS
An SoS is a stochastic system that exhibits random concurrent actions, where the CS interactions lead to probability distributions. The CS interactions are uncertain with respect to future state reachability and primarily exhibit the properties of a Markov process. A stochastic process is a collection of random variables over time $T$, with each $t$ as $(X_1, X_2, \dots, X_n)$ over the time points $t_0 < t_1 < t_2 < \dots < t_n < t_{n+1}$, in the form $\{X_1(t_1), X_2(t_2), \dots, X_n(t_n), \dots, X(t_{n+1})\}$, and collectively represented as $\{X(t) : t \in T\}$. A stochastic process is discrete if $\{X(t) : t \in T\}$ is observable at distinct time points $t \in \mathbb{N}^{+}$, and it is continuous if $t \in [0, \infty)$. In this paper, we select the continuous-time Markov process for modeling stochastic SoS.
Definition (Markovian Process): A system acts as a Markovian process with a series of random variables if the system states have a probability distribution of the form $P(X(t_1), X(t_2), \dots, X(t_n)) = P(X(t_{n+1}) \mid X(t_i))$, $t > 0$. In a stochastic process, the next state of a system can be described from the current state of the system going from time $t$ to time $t + 1$; with the time-homogeneity property, the transition from $i$ to $j$ gives $P_{i,j} = P(X(t+1) = j \mid X(t) = i)$.
Definition (Memory-less Property): At a given time $t_n$, the state $x_n$ of the system with probability $P$ is independent of all previous states and depends only on the most recent one, i.e. $x_{n-1}$ at time $t_{n-1}$. This leads to a stochastic process exhibiting the Markov property of being memory-less. Formally, we can define this as $P(X(t_n) = x_n \mid X(t_{n-1}) = x_{n-1}, \dots, X(t_0) = x_0) = P(X(t_n) = x_n \mid X(t_{n-1}) = x_{n-1})$, where:
• $S$ is a finite set of states, with discrete time or continuous time.
• $P$ is the probability of moving from state $s$ to $s'$.
With its memory-less property, the system's behavior can be predicted from the current state alone, excluding the past states.
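The memory-less property can be illustrated with a minimal sketch of a discrete-time chain in which the next state is sampled from the current state only. The state names (`idle`, `alert`), the probabilities, and the helper names `step` and `simulate` are hypothetical and chosen purely for illustration; they are not taken from the case study.

```python
import random

# Hypothetical two-state chain; the probabilities are illustrative only.
P = {
    "idle":  {"idle": 0.7, "alert": 0.3},
    "alert": {"idle": 0.4, "alert": 0.6},
}

def step(P, state, rng):
    """Sample the next state using only the current state (memory-less)."""
    r = rng.random()
    acc = 0.0
    for nxt, p in P[state].items():
        acc += p
        if r < acc:
            return nxt
    return nxt  # guards against floating-point round-off in the row sum

def simulate(P, start, n, rng):
    """Return a path with n transitions starting from `start`."""
    path = [start]
    for _ in range(n):
        # Only path[-1] is consulted: earlier states never influence the draw.
        path.append(step(P, path[-1], rng))
    return path

if __name__ == "__main__":
    print(simulate(P, "idle", 10, random.Random(1)))
```

Because `step` receives only the current state, the history of the path cannot affect the next transition, which is the property the definition above states.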

1) STOCHASTIC INTERACTIVE FORMALISM
For modeling stochastic systems, Stochastic Interactive Formalism (SIF), originating from SPAs, is the most suitable formalism for modeling and reasoning about probabilistic behavior and non-determinism [25], [58]. A SIF-based formalism leads to the formation of Markov labeled actions. These labeled actions, with transition probabilities, are represented as a set of actions $\alpha \in ACT$, comprising observable actions and unobservable actions. Observable actions are external actions of CSs through which a CS interacts with other CSs to achieve its objectives, for example, using public interfaces to send and receive messages. On the other hand, unobservable actions are internal control events of a CS through which core actions are managed. Examples are a CS reading and writing data internally; such actions are usually private and concealed from other CSs. With a finite distribution of states $S$, we obtain a distribution function over transitions into $S$, with $\lambda$ being a general probabilistic random rate and $P$ the probability that the transition of states will occur.

2) LABELED TRANSITION SYSTEMS FOR CONCURRENT PROCESSES
A stochastic system is essentially a set of concurrent processes $P_1 \parallel P_2 \parallel \dots \parallel P_n$ that form a Labeled Transition System (LTS), the behavior of which depends on the interactions of the processes and the environment as $s \xrightarrow{(\alpha,\lambda)} s'$, which forms a transition relation $T_r$. When the transition relation is generalized with $P(A)$, $P(A)$ represents the rate of action $\lambda$ for every $a \subseteq A$, giving the transition probability from $s$ to $s'$, and $T_r$ is the transition relation over states.

States Transitions and Paths:
The state transitions of components performing actions with certain timed rates can be traced on a particular path. Based on the execution traces of states, an infinite path is an infinite set of traces $Path(T_r) = \{(s_0, a_0, t_0), (s_1, a_1, t_1), \dots\}$, so we obtain $s_0 \xrightarrow{(a_0, t_0)} s_1 \xrightarrow{(a_1, t_1)} \cdots \longrightarrow s_n$, where $t \in T_r > 0$ and $a \in ACT$, and the path is $\pi$ such that $\forall i > 0$, $R(s_i, s_{i+1}) > 0$. A finite path is a sequence of traces from $s_0 \to s_n$ with finite traces of execution and absorbing states. $s_n$ is the absorbing state for the system such that $\forall i$ and $T_r(s_n, s_{n+1}) > 0$. A particular path, depending on the next state, can be finite or infinite with the traces in a state space.

3) MARKOV CHAINS FOR CONCURRENT SYSTEMS
A stochastic system that has continuous or discrete state transitions in real time is termed Markov Chains for Concurrent Systems (MCCS). Every transition in an MCCS is associated with a rate or probability that gives the time it takes, or the probability, of moving from state $s$ to the next state $s_i$, leading to exponential distributions over the state space for system behavior.
Definition: At a given time $t$, an MCCS is a tuple of the form $N = \langle S, s_{init}, \lambda, R, ACT, P_{i,j}, \pi \rangle$ where:
• $S$ is a state space and $s_{init}$ is the initial state.
• $\lambda$ is the action rate or probability value for interaction among stochastic processes.
• $R : S \times S \to \mathbb{R}_{>0}$ is a transition rate matrix.
• $ACT$ is a set of actions as defined above.
• $P_{i,j}$ is the probability $P(s, s')$ of outgoing transitions from $s$ to $s'$.
• $\pi$ is the path for exponential state transitions of the system.
A path $\pi$ of an MCCS is finite or infinite, consisting of states $\pi(S_n, S_{n+1})$ for all $n \geq 0$. From an MCCS we can derive Continuous-Time Markov Chain (CTMC) and Discrete-Time Markov Chain (DTMC) models by substituting $\lambda$ with random rates $r$ or with probability values, respectively. However, when these Markov models, designed with process algebraic capabilities, are coupled with LTS, more meaningful architectural descriptions for SoS can be specified.
A system has reachable states if there is a finite path from $s$ to $s'$. Figure 2 shows CTMCs with 2 states in 2(a), 4 states in 2(b), and 3 states in 2(c), with their respective paths. Here, states shift from the current state to the next state with actions $ACT = \{a, b, c, d\}$, and the action rate is $\lambda$. The path for a CTMC exhibits the race condition between the processes with origin state $s$ and successor state $s'$, and the rate is $R(s, s') > 0$. The probability of moving from state $s$ to $s'$ within time $t$ is defined as $1 - e^{-R(s,s') \cdot t}$ for the time spent in each state, and the rate of movement out of $s$ to any $s'$ in a single transition is called the exit rate $E(s) = \sum_{s' \in S} R(s, s')$.
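The exit rate and the time-bounded transition probability defined above can be computed directly from a rate matrix. The following is a minimal sketch on a hypothetical three-state CTMC; the rates and the function names are assumptions made for illustration. The `race_probability` helper uses the standard CTMC result that the successor $s'$ wins the race with probability $R(s,s')/E(s)$, which is not stated explicitly in the text.

```python
import math

# Hypothetical 3-state CTMC; the rates are illustrative, not taken from the paper.
RATES = {
    ("s0", "s1"): 2.0,
    ("s0", "s2"): 1.0,
    ("s1", "s2"): 0.5,
}

def exit_rate(rates, s):
    """E(s): sum of R(s, s') over all successor states s'."""
    return sum(r for (src, _), r in rates.items() if src == s)

def prob_within(rates, s, s_next, t):
    """Probability that the s -> s_next transition fires within time t."""
    return 1.0 - math.exp(-rates.get((s, s_next), 0.0) * t)

def race_probability(rates, s, s_next):
    """Probability that s_next wins the race among the successors of s."""
    e = exit_rate(rates, s)
    return rates.get((s, s_next), 0.0) / e if e > 0 else 0.0

if __name__ == "__main__":
    print("E(s0) =", exit_rate(RATES, "s0"))
    print("P(s0 -> s1 within t=0.5) =", prob_within(RATES, "s0", "s1", 0.5))
    print("P(s1 wins the race from s0) =", race_probability(RATES, "s0", "s1"))
```

A state with exit rate zero, such as `s2` here, has no outgoing transitions and therefore acts as an absorbing state.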
Throughout this paper, we use Markovian process formalism but extend and constrain it so that SoS can have specification reasoning capabilities for architectural modeling.

C. TEMPORAL LOGIC AND VERIFICATION
A stochastic model M with the characteristics of a Markov model can be verified against certain properties with temporal logic based on assumptions (systems behavior) and guarantees (properties of the system behavior) to be conformed [59], [60].

Definition: Temporal logic is defined on the LTS with the tuple $(M, AP, L(f))$, where:
• $M$ is the stochastic model to be verified.
• $AP$ is a proposition alphabet or a combination of atomic propositions (APs).
• $L(f) : S \to 2^{AP}$ is a labeling function that attaches labels to the states.
Here $(AP, L(f))$ are used for specifying properties and testing whether $M$ satisfies certain properties, written $M \models \Phi$, with $\models$ being the satisfaction relation over the logical proposition $\Phi$. Various Boolean logic operators (∧, ∨, ¬, →, ↔) are used for constructing propositional logic formulae. Furthermore, these can be used to check the states and paths of the stochastic model.

2) BRANCHING TIME LOGIC
A Computation Tree Logic (CTL) formula uses Branching Time Logic (BTL) and can be represented with state formulas and path formulas. Here $\exists\varphi$ (there exists) represents a path of state(s) that fulfills $\varphi$, and $\forall\varphi$ (all paths) requires that all paths satisfy $\varphi$. The operator ¬ means that a formula does not hold, and ∨ means that either one of the operands satisfies the relation.
For path formulas we have $\varphi ::= X\,\Phi \mid \Phi \cup \Psi$, where $X$ ensures that the next state satisfies the state property. Here ∪ stands for until, indicating that $\Phi$ is true until $\Psi$ is true. The probability of property $\varphi$ and the reachability of state $s$ from $s_0$ along a satisfying path $\pi$ can also be specified. Using these base logic propositions, various steady-state and transient analyses can be performed on stochastic models based on BTL formulas as PCTL and CSL specifications [61].
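The next and until path operators can be made concrete with a small sketch that evaluates them on a single finite path, using a labeling function from states to sets of atomic propositions. The state names, the labels (`monitoring`, `fire_detected`, `responding`), and the function names are hypothetical. A model checker quantifies over all paths or over path probabilities; this sketch only checks one given path prefix.

```python
# Hypothetical labeling function L: state -> set of atomic propositions.
LABELS = {
    "s0": {"monitoring"},
    "s1": {"monitoring", "fire_detected"},
    "s2": {"responding"},
}

def holds_next(path, labels, phi):
    """X phi: the state following the first state of the path satisfies phi."""
    return len(path) > 1 and phi in labels[path[1]]

def holds_until(path, labels, phi, psi):
    """phi U psi on a finite path: psi eventually holds, phi holds before that."""
    for s in path:
        if psi in labels[s]:
            return True      # psi reached, and phi held at every earlier state
        if phi not in labels[s]:
            return False     # phi was violated before psi became true
    return False             # psi never became true on this finite prefix

if __name__ == "__main__":
    path = ["s0", "s1", "s2"]
    print(holds_next(path, LABELS, "fire_detected"))
    print(holds_until(path, LABELS, "monitoring", "responding"))
```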
PRISM is a formal language integrated with a model checker, simulator, and system analysis sub-components. It supports symbolic state-based model verification for various stochastic models, including CTMCs/DTMCs, Markov Decision Processes (MDPs), and Probabilistic Timed Automata (PTAs) [43]. These models are analyzed with property specification languages such as CSL/PCTL, LTL, and CTL for predicting system properties [62]. In our case, we use PRISM for verification of the stochastic model generated from our proposed formalism HSF. The PRISM modeling constructs consist of modules and variables, equivalent to CSs as components and their behavior transitions from state to state, as concurrent compositions in the form $CS_1 \parallel CS_2 \parallel CS_3 \parallel \dots \parallel CS_n$.

2) FORMAL DEFINITION OF PRISM MODEL ELEMENTS
The mathematical foundation of PRISM is based on Alur's Reactive Modules formalism [63], [64]. The semantics are defined as the compositional arrangement of modules in algebraic form for the interaction process.
Definition: The core elements of PRISM are stochastic concurrent processes, which can be defined as a tuple of the form $W = \langle V, V_{init}, G_p, T_R, C, M \rangle$ where:
• $V$ is a collection of local variables ($LV$) and global variables ($GV$).
• $V_{init}$ represents the initial variables.
• $G_p$ is a set of guard predicates applied to guards; transitions occur if the predicates are met.
• $T_R$ is a transition rate matrix $R : V \times V \to \mathbb{R}_{\geq 0}$ that results from updates to the variables.
• $C$ represents commands $[\,]\ guards \to S_v$ resulting from $G_p$, $T_R$, and the corresponding updates.
• $M$ represents the set of modules $m$ interacting stochastically in a concurrent manner. A module consists of local variables and commands.
The general syntax of the PRISM language centers on guarded commands. The guard commands determine the system behavior through state transitions of local variables. When a command starts with an action, represented by $ACT$, as a parallel composition of concurrent modules, the transition from the state is recorded as an update if the guard predicate is true for the local variables. The stochastic information is given by the stochastic value $S_v$, which is either a probability value $P$ if the model is a DTMC/MDP, or a random action rate $\lambda$ if it is a CTMC.
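As a rough illustration of how the elements of $W$ fit together, the sketch below renders a guarded command and a module as PRISM-style text, using the command form `[action] guard -> rate : update;`. The module name `Sensor`, the variable `s`, the action names, and the rates are hypothetical, and the helper functions are illustrative only; they are not the mapping rules defined in this paper.

```python
def prism_command(action, guard, rate, update):
    """Render one guarded command: [action] guard -> rate : update;"""
    return f"[{action}] {guard} -> {rate} : {update};"

def prism_module(name, var, low, high, init, commands):
    """Render a module with one bounded local variable and its commands."""
    lines = [f"module {name}", f"  {var} : [{low}..{high}] init {init};"]
    lines += [f"  {c}" for c in commands]
    lines.append("endmodule")
    return "\n".join(lines)

def prism_ctmc(modules):
    """Prefix the rendered modules with the CTMC model-type keyword."""
    return "ctmc\n\n" + "\n\n".join(modules)

if __name__ == "__main__":
    # Hypothetical fire sensor: 0 = idle, 1 = fire detected.
    cmds = [
        prism_command("detect", "s=0", 1.5, "(s'=1)"),
        prism_command("reset", "s=1", 0.2, "(s'=0)"),
    ]
    print(prism_ctmc([prism_module("Sensor", "s", 0, 1, 0, cmds)]))
```

In this rendering the guard plays the role of $G_p$, the rate plays the role of $S_v$ for a CTMC, and the primed assignment is the update on the local variable.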

IV. STOCHASTIC ARCHITECTURE MODELING AND VERIFICATION APPROACH
This section presents an overview of the proposed approach for the modeling and verification of complex SoS architectures, as described in Section IIIA. Figure 3 depicts the proposed approach consisting of four core stages, starting from stochastic formalism specifications, leading to stochastic model development, and then transformations and model verification through model-checking. Each step involved in the proposed approach is described briefly below.
Hybrid Stochastic Formalism: At first, we integrate current process algebra with randomness, concurrency, synchronization, and concurrent constraints operators as part of our proposed formalism, HSF. To provide syntax and semantics, we extend SPA with specific CCS, CSP, and CCP [65] operators into our proposed HSF, providing a compositional vocabulary for SoS architecture reasoning. HSF syntax and semantics are defined to establish formal foundations for modeling and analysis, inspired by CML and Sosadl [21], [49]. This enables us to specify SoS architecture as a stochastic model to express concurrent compositions with probabilistic choices and non-determinism in CSs. A multi-labeled transition system with random actions, $LTS_R$, enables the modeling formalism to generate probabilistic distributions of the stochastic model $M$ with execution state space $S$ for the collaborative, dynamic behaviors of SoS. An Extended Backus-Naur Form (EBNF) grammar, which establishes the process algebraic rules, is generated from the HSF syntax and semantics to orchestrate SoS architecture at an abstract level.
HSF-SoS Model: Secondly, we write stochastic architectural specifications using the HSF-driven EBNF, incorporating extended syntax and semantics for SoS. The structure and behavior of SoS are specified as CSs and mediators with ports and roles by applying environment constraints to manage uncertainty. This enables us to generate SoS coalitions, which are stochastic and can be further reasoned about for qualitative and quantitative analysis and for rigorous evaluation of architectural models of SoS.
Stochastic Model Transformation: Model transformation is performed at this step. The HSF model can be treated as CTMC, which provides the abstraction for specifying SoS architectural elements and their interactions. However, the prediction of stochastic behaviors and their ability to achieve missions and conformance of QAs requires stochastic model verification. Therefore, the HSF CTMC model is transformed by proposing formal transformation rules in compliance with PRISM semantics. The formal rules allow automated transformation, enabling the analysis of the stochastic model from the HSF model. The one-to-one mapping is done by specifying formal rules for HSF and PRISM, which are compatible with both types of formal descriptions for modeling SoS.
SoS Model Verification: In the last step, a transformed stochastic model from HSF into the PRISM model-checker is used as CTMC model to verify the SoS stochastic architectural specifications. System properties are defined in CSL, in a unique way using known and unknown bounds for the evaluation of SoS missions and associated properties along with time-bounded logic specifications. Quantitative verification and predictions are made by applying relevant algorithms using CSL transient and steady-state analysis based on reachability and numerical computations [66].
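To give a feel for the kind of numerical computation behind transient analysis, the following sketch computes the state distribution of a small CTMC at time $t$ using uniformization, a standard technique for CTMC transient probabilities. It is an illustrative assumption, not a description of PRISM's internal engine or of the algorithms used in this work; the state names, rates, and function name are hypothetical.

```python
import math

def transient(rates, states, init, t, eps=1e-12, max_terms=10000):
    """Distribution over `states` at time t, starting from `init` (uniformization)."""
    n = len(states)
    idx = {s: i for i, s in enumerate(states)}
    exit_rates = [0.0] * n
    for (a, _), r in rates.items():
        exit_rates[idx[a]] += r
    q = max(exit_rates) or 1.0               # uniformization rate
    # Uniformized DTMC: P = I + Q / q
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] = 1.0 - exit_rates[i] / q
    for (a, b), r in rates.items():
        P[idx[a]][idx[b]] += r / q
    pi = [0.0] * n
    pi[idx[init]] = 1.0
    result = [0.0] * n
    weight = math.exp(-q * t)                # Poisson(q*t) mass at k = 0
    acc, k = 0.0, 0
    while acc < 1.0 - eps and k < max_terms:
        for j in range(n):
            result[j] += weight * pi[j]
        acc += weight
        k += 1
        weight *= q * t / k                  # next Poisson term
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dict(zip(states, result))

if __name__ == "__main__":
    # Hypothetical chain: 'idle' moves to absorbing 'done' with rate 2.0.
    print(transient({("idle", "done"): 2.0}, ["idle", "done"], "idle", 0.5))
```

For this two-state example the probability of being in `done` at time $t$ should agree with $1 - e^{-2t}$, which gives a simple sanity check. For large values of $q \cdot t$ the initial Poisson weight underflows, so a production implementation would need a more careful summation scheme.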

V. HYBRID STOCHASTIC FORMALISM FOR SoS
This section provides the extended syntax and semantics devised for formulating the proposed HSF, including abstract SoS architectural reasoning and coalition behavior. At an abstract level, HSF for SoS is a heterogeneous aggregation of concurrent processes $P$, actions $a \in ACT$, random rates of action $\lambda$ (or $r$) $\in \mathbb{R}^{+}$, and channels $C$, forming a tuple $\langle P, a, \lambda, C \rangle$. A process $P$ engages in action $a$ as $(a.P)$ with a probabilistic distribution over time $T$, with an action rate as $(a.\lambda).P$, which determines exponentially the duration or delay of actions. In order to constrain the interaction and deal with the uncertainty of exogenous interactions of CSs, we add concurrent constraint store operators from CCP, annotated with random rates, that enable the stochastic semantics to constrain the environment.

A. SYNTAX
To compose SoS behavior and structure, we have used the basic combinators defined in PA. The notion of a process or agent is retained for CSs, where every CS is an independent process P ranging over the names A, B, C, ... Names are used for processes, and channels are used for communicating data between processes. The symbols λ and r are used interchangeably to represent random action rates, which lead to exponential distributions for system interactions. We formally define the syntax for HSF-SoS processes collaborating as concurrent processes.
The detailed description of each syntax element is as follows:
• Skip: This indicates that a process has terminated successfully.
• (a.λ).P: This is the action prefix operator: the process performs action a with activity rate λ and then behaves as P.
Various behavioral operators determine how CS processes communicate, interact, and form coalitions. This communication is categorized into sequential, parallel, and choice.
• P (choice) Q: This denotes a probabilistic external or internal choice between P and Q. The process behaves as P with probability p and as Q with probability 1 − p, where p ∈ [0, 1]. The choice of processes is ND.
• P Lr Q: Parallel composition occurs for multiple events involving participating CSs. Actions could be both synchronous and asynchronous. For stochastic processes, parallelism is interleaved with a cooperation operator where the r operator represents a list of actions that can be both hidden and silent.
• P u Q: The CS behaves as P if P terminates successfully within the given time unit u; otherwise, it behaves as Q.
• Tell.λ(Cstore): This term tells the SoS environment about the new constraints imposed on a CS and added to the constraint store non-deterministically over a time interval index i with a probabilistic distribution.
• Ask.λ(Cstore): This term allows the participating CSs to derive information from the constraint store randomly over a time interval index i with a probabilistic distribution.
• A ≅ P: A is a constant that names the stochastic behavior of process P.
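The operators listed above can be read as a small abstract syntax. The following is a minimal sketch, not the authors' implementation: it covers only Skip, action prefix, and probabilistic choice, and the class names (`Skip`, `Prefix`, `Choice`), the `run` helper, and the example action names and rates are all hypothetical. It assumes that a prefix delay is sampled from an exponential distribution with the given rate and that a choice picks its left branch with probability p.

```python
import random
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Skip:
    """Successfully terminated process."""


@dataclass(frozen=True)
class Prefix:
    """(a.λ).P: perform `action` after an Exp(rate) delay, then continue as `cont`."""
    action: str
    rate: float
    cont: "Process"


@dataclass(frozen=True)
class Choice:
    """Probabilistic choice: behave as `left` with probability p, else as `right`."""
    p: float
    left: "Process"
    right: "Process"


Process = Union[Skip, Prefix, Choice]


def run(proc, rng):
    """Execute one random run; return (list of performed actions, elapsed time)."""
    trace, elapsed = [], 0.0
    while not isinstance(proc, Skip):
        if isinstance(proc, Choice):
            # Resolve the choice: left with probability p, right with 1 - p.
            proc = proc.left if rng.random() < proc.p else proc.right
        else:
            # Action prefix: the delay is exponentially distributed with the action rate.
            elapsed += rng.expovariate(proc.rate)
            trace.append(proc.action)
            proc = proc.cont
    return trace, elapsed


# Hypothetical CS: usually sends fire data (rate 2.0), otherwise waits (rate 0.5).
cs = Choice(0.7, Prefix("send_fire_data", 2.0, Skip()), Prefix("wait", 0.5, Skip()))
trace, elapsed = run(cs, random.Random(1))
```

Repeating `run` many times and averaging `elapsed` gives an empirical estimate of the expected delay, which is one informal way to check a rate assignment before model checking.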

B. SEMANTICS
The semantics of the above grammar are established by a combination of axioms and inference rules for the syntax operators. To build an LTS, the behavior of the system is derived via meaningful assertions. The inference rule (P, α, P′) → S × S results from transitions P → P′ with s → s′ ∈ S for processes P_i and P_j participating in actions α_i.

Definition Transition System with Rates ($LTS_R$):
The LTS for a stochastic process is a tuple of the form $\langle S, L_\alpha, T(r)_{\to}, f \rangle$, where $S$ represents the set of states of the system, $L_\alpha$ is a set of action labels, $T(r)_{\to} \subseteq P \times P$ is a transition relation, and $f$ is a rate function of the form $f: S \times L_\alpha \times S \to \mathbb{R}^{+}$. This yields a probabilistic distribution over the transitions of the system. Time for $T(r)$ is $t \in \mathbb{R}_{>0}$, and the distribution function takes values in $[0, 1]$. These processes can be represented with their action type and action rate as: α(ActionType).P → α(ActionType), r(ActionRate).P. The random rate $r$ leads to the exponential distribution of system behaviors over a period of time $t$, with the probability given by $f(P \xrightarrow{(a.r)} Q)$. Applying the general mechanism defined above and the general premise-conclusion rule to the HSF syntax, we obtain abstract-level transition rules: if the premise holds, then the conclusion also holds, and all the rules are symmetric. The transition relation is derived from the Cartesian product of two transition systems, say $Ts_1$ and $Ts_2$, written $(Ts_1 \parallel Ts_2)$. A rule applies if $s \to s'$, and the conclusion pair of the Transition System (TS) obtained through the Cartesian product of the two transition systems forms the semantics, where $S$ is the state space, $ACT$ is the set of actions, the initial states are of the form $s_0$ and $s_1$, and $L(f)$ represents a labelling function. These principles of concurrent system transitions provide the baseline for the formation of the HSF semantics.
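The product construction described above can be illustrated with a short sketch. It assumes pure interleaving (no synchronization on shared actions) and represents a transition system as a hypothetical triple `(states, initial_state, transitions)`, where each transition is `(source, action, rate, target)`; none of these names come from the paper.

```python
from itertools import product


def interleave(ts1, ts2):
    """Interleaving product of two rate-labelled transition systems.

    Each system is (states, initial_state, transitions); a transition is
    (source, action, rate, target). One component moves while the other
    stays in place, so rates are copied unchanged.
    """
    states1, init1, trans1 = ts1
    states2, init2, trans2 = ts2
    states = list(product(states1, states2))  # Cartesian product of state spaces
    transitions = []
    for src, action, rate, dst in trans1:
        for other in states2:
            transitions.append(((src, other), action, rate, (dst, other)))
    for src, action, rate, dst in trans2:
        for other in states1:
            transitions.append(((other, src), action, rate, (other, dst)))
    return states, (init1, init2), transitions


# Two hypothetical CSs: P performs alpha at rate 2.0, Q performs beta at rate 3.0.
ts_p = (["p0", "p1"], "p0", [("p0", "alpha", 2.0, "p1")])
ts_q = (["q0", "q1"], "q0", [("q0", "beta", 3.0, "q1")])
states, init, transitions = interleave(ts_p, ts_q)
```

In the product, the initial state `("p0", "q0")` has two outgoing transitions, one per component, so either action may fire first; this is the non-deterministic ordering that the interleaving semantics captures.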

1) AXIOM RULES
Axiom rules are encapsulated in $LTS_R$ and are the possible representations of the terms defined in the aforementioned grammar. In the semantic rules, TR signifies Transition Rule, while $P$, $Q$, $P'$, $Q'$ represent the processes involved in actions $\alpha$. A process is transformed under the following condition: if $P$ transforms with $(\alpha.r)$ to $P'$, then the transition of the parallel processes $(P \parallel Q) \to (P' \parallel Q)$ is reached. Semantic operations based on premise and conclusion rules are as follows:

2) STOCHASTIC CONCURRENT CONSTRAINTS AXIOMS
The non-determinism of state transitions among communicating CSs, and the associated uncertainty, can be managed by applying stochastic concurrent constraints. CCP is used in the SoS model to constrain the interactions among CSs; it acts as a mediator that manages concurrent systems by adding new information about CSs by means of the Tell() and Ask() operators. This forms a function $C(f): C(Store) \to \mathbb{R}^{+}$ that associates a rate, a real number, with the constraint store, where $\langle C_n \rangle$ is the set of constraints in the constraint store that transforms $P$ to $P'$. The constraints are inferred by means of the Tell and Ask operators with the associated randomness $\lambda$.
Since a SoS is stochastic in nature, its stochastic process is a discrete-state, continuous-time process, that is, a CTMC generated through the LTS with exponentially distributed state transitions. For every process, the overall rate at which actions are performed is termed $r.\alpha_i(P, Q)$, where $\alpha$ and $r$ represent the action and the rate, respectively. From these semantic transition rules, a stochastic model of a SoS can be treated as a multi-labeled transition system composed of the transition systems $Ts = Tr_1 \parallel Tr_2 \parallel \dots \parallel Tr_n$ of interacting independent systems, where $CS$ is the set of constituent systems, $A_{\alpha_i} \in ACT$ is the set of activities/actions $(a.r)$, and the multi-transition relation is represented as $Ts$. Such systems behave in an ND fashion when an action is performed that affects certain interacting systems, as shown in Figure 4. By the expansion law for interleaving semantics and the memoryless property, the interleaving parallel composition $(P_1 \parallel P_2).\lambda$ states that reachability is governed by the delay rate $\lambda$ with exponential distributions. Suppose that we have two CSs, $P$ and $Q$, where $P$ has action $\alpha.\lambda$ and $Q$ has action $\beta.\lambda$, and these form a parallel composition, e.g. $(\alpha.\lambda \parallel \beta.\lambda) \cong (\alpha.\lambda.\beta.\lambda + \beta.\lambda.\alpha.\lambda)$.
By means of an interleaving operator, such processes choose actions non-deterministically, as shown in the transition system in the figure below.
Definition Behavior Transitions: The above $LTS_R$ leads to a Markov model with the behavior transitions of a CTMC in the form of a tuple $\langle S, R, P_i \rangle$, where $S$ is a finite set of states, $R$ is the rate matrix with a function $R: S \times S \to \mathbb{R}_{\geq 0}$, and $P_i$ is the initial probability distribution $P_i: S \to [0, 1]$ such that $\sum_{s \in S} P_i(s) = 1$. Formally, the rate of moving from state $i$ to state $j$ in $Ts$ is represented as $R(i, j) \to R(s, s')$. With the probability of moving from state $s$ to $s'$ at an exponential rate $\lambda$, we get $R(s, s') = \lambda \cdot P(s, s')$. If there are many possible successor states $s'$, the system transitions to the one whose delay elapses first, which is known as a race condition. The exit rate of all outgoing transitions is denoted by $E(s)$, the total rate of leaving a particular state: $E(s) = \sum_{s' \in S} R(s, s')$.
Definition States transitions over Continuous Time: The dynamic behavior of system execution in its state space $S$ can be traced and analyzed using a rate matrix $Q$ depicting the constant rates of movement between states. Moving from state $i$ to $j$ depends only on the current state and ignores past states, which is the memoryless property of stochastic processes. For $(i \neq j)$, the off-diagonal entries of the infinitesimal generator matrix are the rates $R(i, j)$, and for $(i = j)$ the diagonal of the transition matrix $P[i, j]$ yields $\sum_{j \neq i} R(i, j)$ with $1 \leq i \leq n$. Using row-column notation with diagonal elements $q_{i,i} = -\sum_{j \neq i} q_{i,j}$, the matrix $Q$ of (6) is formulated to show the state transitions resulting from $Ts$ of (3), where $a_1, a_2, \dots, a_n$ are individual state transitions of the form $s_0, s_1, \dots, s_n$ and $a \Rightarrow \lambda$ are the diagonal elements occupying the exit rates $E(s)$. The probability of moving to the next state $j$ within time unit $t$ is provided by (7).
The $Q$ matrix allows system designers to compute a system's steady-state and transient-state probabilities with the help of the vector $P$. The vector $P$ indicates the probabilities of the system being in the initial state. It is defined formally as a vector whose entries $s_1, \dots, s_n$ indicate the probabilities of the system being in state $s_i$ at time $t_i$. If there is no change in the long run (steady state), then applying the limit yields $P_i = \lim_{t \to \infty} P_i(t)$ and $\frac{d}{dt} P_i(t) = 0$.
The system of linear equations is the product of the $Q$ matrix and the vector $P$, which enables system behavior to be explored, as explained in subsequent sections. By applying various algorithmic combinations to Markovian models, a range of analyses and predictions can be performed on complex SA using stochastic propositional logic.
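As an illustration of how the generator matrix and the steady-state vector relate, the following sketch builds $Q$ from a rate matrix and approximates the long-run distribution. It is a minimal example under stated assumptions, not the numerical method used by PRISM: the function names are hypothetical, the chain is assumed to be irreducible, and the steady state is obtained by power iteration on the uniformized chain rather than by a direct linear solve.

```python
def generator(R):
    """Build Q from rate matrix R: q[i][j] = R[i][j] for i != j, q[i][i] = -E(i)."""
    n = len(R)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        exit_rate = sum(R[i][j] for j in range(n) if j != i)
        for j in range(n):
            Q[i][j] = R[i][j] if j != i else -exit_rate
    return Q


def steady_state(Q, iters=10000, tol=1e-12):
    """Approximate pi with pi * Q = 0 and sum(pi) = 1 (irreducible chain assumed)."""
    n = len(Q)
    # Uniformization rate: slightly above the largest exit rate to avoid periodicity.
    q = max(-Q[i][i] for i in range(n)) * 1.05 or 1.0
    # Uniformized DTMC: P = I + Q / q, which is row-stochastic.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)] for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical two-state CS: working -> degraded at rate 2.0, repaired at rate 3.0.
Q = generator([[0.0, 2.0], [3.0, 0.0]])
pi = steady_state(Q)
```

For this two-state example the balance equation $2\pi_0 = 3\pi_1$ gives $\pi = (0.6, 0.4)$, which the iteration reproduces to within the tolerance.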

C. ABSTRACT ARCHITECTURE LEVEL REASONING SEMANTICS
Here, we provide the core of the semantics at the architectural level for CSs and mediators, as depicted in Figure 5. These semantics enable architectural level reasoning and establish the basis for performing further systems analysis.

1) CONSTITUENT SYSTEM SEMANTICS
CS interfaces are paired through a (Port, Role) association, with each port assigned a role by a mediator. Here, port P describes the external, exogenous behavior of every CS in association with a role R defined by mediator M. So, for every role of mediator M there is a port association that forms a binding protocol (B_p). The behavior of the CSs is constrained by protocols in the form of contracts among ports for information exchange. The participating CSs can request and reveal certain information to the SoS environment at random rates to cope with the uncertainty. Here, the SoS has no control over the internal behavior of CSs, so it is important to agree on contracts through which architectural elements can collaborate and exchange information.

2) MEDIATOR SEMANTICS
Similar to CSs, which have certain interfaces and ports, mediators have certain roles $R$ and Bind Protocols $B_p$ that help to coordinate with CS interfaces, i.e. ports, and this facilitates the accomplishment of tasks. The coordination of these events is done through $B_p$ in a specific order. Hence, the interaction roles $R$ and $B_p$ are parallel actions enabling CSs to communicate. So we obtain: $B_p = R_1 \parallel R_2 \parallel \dots \parallel R_n$, that is, $B_p = \sum_{i=1}^{n} R_i$. Every role that the mediator defines for a system is assigned a relevant functionality that is to be achieved during system execution. From the above equation, the joint protocol is the sum of all possible actions relevant to the specified roles of CSs.
The resulting behavior of the coalition is assigned to $G_B$ at line 10 of Algorithm 1, which is a result of initial and new state transitions satisfying stochastic Concurrent Constraints (CCs). The last line ensures that stochastic behavior is a result of the local actions of CSs. Algorithm 1 constrains the coalition behavior of the SoS concrete architecture generated from the proposed HSF. It brings the architectural elements of the SoS together to generate global behaviors with probabilistic distributions using the underlying HSF semantics. The stochastic behavior is generated as a CTMC model, which is used in PRISM for SoS model specification.

Algorithm 1 Stochastic Model Behavior
The HSF is then integrated into EBNF to generate a stochastic Domain-Specific Language (DSL), which is used for describing SoS architecture using the MDE approach. The meta-models are established using grammar rules to create a high-level DSL. For this purpose, an EMF-based Xtext approach is used that allows us to parse, build and translate the internal code into an external DSL for architectural representation. A high-level DSL allows the system designers to describe the architectural elements of SoS with stochastic reasoning by emulating the underlying syntax and semantics within meta-models. Figure 6 depicts the EBNF rules based on HSF semantics using Xtext for SoS CS behavior specification. The CS definition starts with its name, state, and ports declaration. The behavior comprises exogenous actions with random delay rates and stochastic constraints using tell and ask operators to deal with uncertainty. CS transition rules are defined with the set of states consisting of events with rates. This leads to generating probabilistic distributions of CSs when interacting with other CSs that have collective dynamic transitions. Similarly, grammar rules for mediator behaviors and abstract SoS architecture are defined to allow system architects to describe SoS architecture using simple architectural notation. This hides the internal complexity of the underlying formalism from the system designers and provides flexibility to describe SoS models, which can be reused in the future. The generated DSL based on HSF provides a formal stochastic basis for the qualitative and quantitative verification of SoS architecture.
This section makes significant contributions to the body of knowledge regarding the modeling of SoS stochastic structures and behaviors. The underlying semantics of HSF bring reasoning capabilities among CS events with probabilistic choices, race conditions, and conditional synchronous compositions to generate model behaviors. The SCCP operators are used to deal with uncertainty in SoS architectural models at the interaction level.

VI. CASE STUDY-EMERGENCY RESPONSE SYSTEM AS CPSOS USING HSF
Architecture modeling and verification by means of the proposed approach are conducted through a case study design of a mission-critical real-time Fire Monitoring and Emergency Response SoS (FM-ERSoS). It is a part of a smart city project that allows various departments and entities to collaborate, particularly when dealing with emergency situations. The emergency system is inspired by smart city projects to manage disaster situations by encompassing modern IoT and CPS technologies [67], [68] as a CPSoS (consisting of various IoT-based CPS nodes and third-party independent CSs). The emergency response SoS, in collaboration with remotely distributed software-intensive systems, deals specifically with sudden fire eruptions in urban and rural areas with continuous monitoring.
Due to its architectural characteristics, FM-ERSoS exhibits ND behavior with exponential distribution comprising random action delays. Therefore, it is a challenge to design such a critical system that dynamically evolves to achieve missions [57], [69]. Essentially, it is a collection of various independent CSs as CPS nodes and embedded physical resources enriched with computing logic and interconnected and able to sense data from the environment and subsequently make a decision [70], [71]. Each CPS node is equipped with different sensors, connected through Wi-Fi Networks and Wireless Protocols (IEEE802.11 Wi-Fi and Zigbee). Various CPS nodes further collaborate through local gateways capable of performing the necessary processing.

B. STOCHASTIC ARCHITECTURAL SPECIFICATION
Specific to FM-ERSoS, Figure 8 shows a CS architectural specification with HSF-based semantics as outlined in Section V. The CS named CPSFS is integrated locally with fire and smoke IoT sensors as a complete and independent system. This CPS node is responsible for predicting/detecting fire in the early stages, performing the necessary measurements, and transmitting these to the nearby CPS. This CPS sends fire data in real time as sensors detect smoke and flames in the environment. Figure 7 depicts the coalition of FM-ERSoS, which generates behavior with different CSs through the constrained coordination of mediators.
To deal with unexpected wildfires and bushfires, the system consists of multiple CS nodes, CPSHWF and CPSFS, that detect fire and send data in real time to the LCS. In response, the LCS nodes issue immediate emergency and warning alerts to the ECU. As mentioned above, the CPSFS has ports providing interfaces, namely send-fire-data and receive-fire-data, which operate at random rates. The random actions with delays lead to the exponential distribution of system state transitions. The mediator M, acting as the WSN, coordinates fire monitoring data with specified roles of data communication using the CS port protocols. Here, the sending and receiving of fire data are achieved with the corresponding CS port signatures and roles. The second mediator defined here coordinates with the fire monitoring nodes CPSFS and CPSHWF, which continuously get data from fire areas in order to monitor the real-time situation. To manage uncertainty, the mediators and CSs exchange exogenous external information through stochastic concurrent constraints via the Tell and Ask operators.

C. FM-ERSoS MISSIONS AND QUALITY ATTRIBUTES
FM-ERSoS has two main missions namely M 1 and M 2 . The first and by far the most important mission M 1 is to detect the fire events as soon as possible and send messages to ECU.  The second mission M 2 provides prompt emergency rescue services through warnings and alerts via the coordination of LCSs and ECUs. The scenarios with missions and sub-goals are depicted in Figure 7 and explained in the next section. These are based on disaster management and emergency response operations considering core QAs [68], [72].

1) SCENARIO-A
In the first scenario, the SoS with CPS nodes tries to achieve M 1 with sub-goals G i of monitoring the fire and prediction of real-time events as G 1 M 1 . The IoT nodes comprising humidity, heat, and wind-flow sensors, predict the possible fire event occurrence and generate messages through LCS gateway stations to the ECU. Similarly, for G 2 M 1 , CPS nodes equipped with fire and smoke IoT sensors observe fire events in the area and send information to the ECU through LCS in near real-time without failures. Wind flow sensors also work in conjunction with the detection of fire to provide information about developing fire directions and flows.

2) SCENARIO-B
This scenario is connected to the previous scenario and extends the global mission as M 2 . In this mission, once the ECU receives information about fire events from CPS sensor nodes and remote nodes (satellite and drones), it starts processing them continuously in real-time and further collaborates with CSs such as police and rescue services, for immediate evacuation and provision of emergency services. It also collaborates with IoT nodes to generate timely signals to contain the fire spread by sending warnings and alerts notifications to fire-fighters.

D. REQUIREMENTS FOR TRANSIENT AND STEADY-STATES
Based on the scenarios A and B, we have identified core system QAs in relation to performance and reliability, and similarly steady-state requirements for the achievement of SoS missions.

1) PERFORMANCE REQUIREMENTS
Two Performance Requirements (PR) for the above scenarios are: 1) latency (event response time), and 2) throughput (number of alerts and warnings sent per time unit). Related performance requirements are:
• PR 1 : What is the probability of data delivery among CPS nodes, i.e. (sending fire prediction data, receiving sensor data), within time T unit of seconds?
• PR 2 : The probability that the emergency control unit generates alerts and warnings to first emergency responders within T time unit of seconds after it receives messages from the control station.
• PR 3 : The probability that smoke and fire data will be delivered to the local control station within the first 10 seconds is greater than 0.90.
• PR 4 : Fire data will be sent to the ECU and LCS from CPS nodes sensors in less than 70 seconds with a probability greater than 0.80 with more than 50.

2) RELIABILITY REQUIREMENTS
Reliability R(t) of a SoS is the likelihood that most of the CSs will be working until the mission has been accomplished at time t. SoS missions execute in continuous time t with predictable failure rates λ, i.e. R(t) = e^(−λt). Core Reliability Requirements (RR) are:
• RR 1 : Likelihood that CSs will be able to complete actions when one of the CSs is degrading (leading to failures) within time T unit of seconds.
• RR 2 : There is less than a 50% chance that both CPSFS and CPSHWF will continue to send data successfully.
• RR 3 : There is more than 50% likelihood that CPSCLS may not send alert messages to the ECU in real-time.
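As a worked illustration of the reliability measure used in these requirements, the sketch below assumes a constant failure rate λ, so that R(t) = e^(−λt), and assumes that CSs fail independently. The function names and the numeric failure rates are hypothetical and are not taken from the case study parameters.

```python
import math


def reliability(failure_rate, t):
    """R(t) = exp(-lambda * t): probability that one CS survives up to time t."""
    return math.exp(-failure_rate * t)


def joint_reliability(failure_rates, t):
    """Probability that all independent CSs survive up to time t.

    Independent exponential lifetimes multiply, so the failure rates add.
    """
    return math.exp(-sum(failure_rates) * t)


# Hypothetical rates (failures per second) for two CPS nodes over a 70-second window.
r_single = reliability(0.01, 70.0)
r_both = joint_reliability([0.01, 0.005], 70.0)
```

Because the rates add, the probability that both nodes keep sending data is always lower than that of either node alone, which is why a joint requirement such as RR 2 is harder to satisfy than a per-node one.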

3) STEADY-STATE REACHABILITY
Since the system usually executes over a long period of time, from the monitoring of fire to sending alerts to the first emergency responders, it is vital to measure steady states, which depict the long-run behavior of the system. We identify a few critical Steady-State Requirements (SSR) vital to system success as follows:
• SSR 1 : The long-run likelihood that various CPS nodes will be able to send the fire situation data from source nodes to nearby nodes.
• SSR 2 : The steady-state probability that the local control station will not be able to send data and warnings to nearby nodes and the ECU.
• SSR 3 : What is the long-term possibility of success that mission M 1 or mission M 2 will be successfully accomplished?

VII. MODEL-CHECKING SoS USING PRISM
Using CSL, model-checking algorithms based on BTL automatically verify the stochastic behavior of system states and quantitative measures of SoS properties with path and state formulas. However, this creates the problem of state-space explosion, as the state space grows exponentially [62], [73]. It therefore requires substantial computational resources and time, adding to the complexity. SMC methods try to solve the state-space explosion problem by approximating the sample states using state-space reduction methods [15], [74].
There are many SMC tools, including UPPAAL, SPIN, and PRISM, for verifying certain types of probabilistic models [43], [75]. Among these, PRISM offers the features required for the analysis of various types of stochastic models and can optimize the use of resources by state-space reduction [43]. Therefore, we use PRISM as our stochastic model verification platform for performing structural and behavioral analysis of SoS. The model specified in HSF is formally transformed into PRISM as a CTMC model, which is then reasoned about and analyzed to conduct the architectural analysis. The workflow and the core analyses and evaluations to be performed are depicted in Figure 9. SoS requirements are specified by the system designer for modeling the SoS architecture. The stochastic model in HSF, transformed through formal rules into the PRISM CTMC model, is readily available for various architectural analyses. The SoS CTMC model is analyzed with property specifications using known and unknown bounds to predict mission success and measure QAs in order to validate the stated requirements. Results are evaluated against the requirements, based on which the SoS architecture model can be refined.
A. MAPPING RULES
To analyze the system architecture from HSF (syntax and semantics defined in Section V) quantitatively, and to assess the likelihood of SoS mission success and predict QAs in PRISM (defined in Section IIID), an automated transformation approach has been adopted with mathematical principles of mapping. Formally, an architectural model M, which is a stochastic model, can be transformed such that each member of M is mapped to a modeling element of S: M → S. The following mapping rules have been defined for formal translation from HSF to PRISM.
CS Initial State and Local Variables: The behavior of a SoS starts with an initial state in HSF, where a constituent system $CS_i \in CS$ corresponds to a local state and is represented as a Local Variable $LV \in V$ in PRISM; both eventually represent the initial state $s_i \in S$.
CSs and Modules: Every module and CS goes through a transition with a labeled action with delay rate $r$, so each CS and module $M$ also represents a state $s_i \in S$ as a result of a stochastic action taking place. We can formally represent a CS transition with respect to a command in PRISM, which is a result of guard predicates and the transition rate matrix $R$ with the input and output (IO) ports $(P_{in}, P_{out})$ of interface $I$.
Constraints across coalitions and systems: Constraints are related to overall coalition behavior and QAs that correspond to commands, which must be valid in terms of the respective guards and updates in PRISM modules. Here, $Col$ represents a coalition, $G_B$ is a set of global behaviors, and $Q_A$ stands for QAs. Similarly, $GV$ with commands ($C$) and rewards ($R$) are true.
The formally founded rules are complete with respect to their compatibility with the underlying semantics of HSF and PRISM. A summary of the transformation rules is given in Table 1, which contains their descriptions and the HSF and PRISM elements that correspond one-to-one through the mapping rules. These formal rules enable the automated transformation of the HSF architectural representation to a PRISM CTMC model.
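To make the direction of the mapping concrete, the sketch below renders a CS description as a PRISM CTMC module in PRISM's guarded-command syntax (`[action] guard -> rate : (update);`). It is only an illustration under stated assumptions: the functions `to_prism` and `to_prism_model`, the variable naming scheme `s_<name>`, and the example rates are hypothetical and do not reproduce the transformation rules of Table 1.

```python
def to_prism(cs_name, n_states, transitions, init=0):
    """Render one CS as a PRISM module.

    `transitions` is a list of (action, source_state, rate, target_state).
    The CS state becomes a bounded local variable, and each stochastic
    action becomes one guarded command with its rate.
    """
    var = f"s_{cs_name}"
    lines = [f"module {cs_name}", f"  {var} : [0..{n_states - 1}] init {init};"]
    for action, src, rate, dst in transitions:
        lines.append(f"  [{action}] {var}={src} -> {rate} : ({var}'={dst});")
    lines.append("endmodule")
    return "\n".join(lines)


def to_prism_model(modules):
    """Wrap rendered modules in a CTMC model file."""
    return "ctmc\n\n" + "\n\n".join(modules) + "\n"


# Hypothetical fire-sensor CS with two states and illustrative rates.
cpsfs = to_prism("CPSFS", 2, [("send_fire_data", 0, 1.5, 1), ("reset", 1, 0.2, 0)])
model_text = to_prism_model([cpsfs])
```

Note that PRISM synchronizes modules on shared action labels, so giving a mediator module a command with the same label would couple the two; in a CTMC the rate of such a synchronized transition is the product of the individual rates, which a real transformation has to account for.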

B. MARKOV MODEL AND VERIFICATION SEMANTICS
The model specified as a CTMC requires formal property specification; i.e. a model checker needs to verify that a stochastic model conforms to specified properties according to a certain logic. Before defining the logic specification, we redefine the CTMC from a model-checking perspective.
Definition: From a model-checking perspective, a CTMC is defined as a tuple of the form $M = \langle S, S_{init}, Q, E(s), AP, L(f) \rangle$. This can be expressed as the probability of moving from state $i$ to $j$ with the transition rate matrix $R$. A CTMC model with a finite set of states $s_i \in S$ forms a path $\pi$ with time transitions $t \in T$, $\pi = (s_0, a_0, t_0), (s_1, a_1, t_1), \dots, (s_{n-1}, a_{n-1}, t_{n-1})$, with transition rate matrix $R(s, s')$. Paths and states are used extensively for steady and transient states during the model-checking process over the stochastic state space for system property verification, which we explain in the section below.

1) CONTINUOUS STOCHASTIC LOGIC
To verify system goals and QAs, the stochastic model of CTMC can be validated quantitatively using CSL descriptions. For a specified period of time or interval, the model's architecture is analyzed in terms of transient properties, while steady-state properties are specified for evaluating long-term system behaviors.
Definition: For real-time events $\alpha$ and atomic propositions $AP$ with $\alpha \in AP$, the probability bound is $p \in [0, 1]$, $t$ is the time interval taken for moving along $R(s_i, s_{i+1}) \in \mathbb{R}_{>0}$, and $\bowtie \in \{\geq, >, \leq, <\}$. The logic expresses state and path propositions: state formulas are used to verify every state of the system, while path formulas trace each path. Both kinds of formulas are based on BTL semantics as described in Section IIIC1. As in CTL, $X$ (next) and $U$ (until) are temporal operators from which other operators can be derived, such as eventually ($F$, $\Diamond$) and always ($G$, $\Box$), for defining advanced logical formulas, i.e. $\Diamond^{\leq t} \Phi = \mathit{true}\ U^{\leq t}\ \Phi$.
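For reference, the state and path formulas mentioned in the definition can be written in the grammar commonly given for CSL in the model-checking literature. This is the standard textbook form, stated here as an assumption; the exact grammar used by the authors may differ in notation. $\Phi$ ranges over state formulas, $\varphi$ over path formulas, and $a \in AP$.

```latex
\begin{align*}
\Phi    &::= \mathit{true} \;\mid\; a \;\mid\; \neg \Phi \;\mid\; \Phi \wedge \Phi
             \;\mid\; P_{\bowtie p}[\varphi] \;\mid\; S_{\bowtie p}[\Phi] \\
\varphi &::= X\,\Phi \;\mid\; \Phi \; U^{\leq t} \; \Phi \\[4pt]
\Diamond^{\leq t}\,\Phi &\equiv \mathit{true} \; U^{\leq t} \; \Phi
\end{align*}
```

Here $P_{\bowtie p}[\varphi]$ holds in a state when the probability of the paths satisfying $\varphi$ meets the bound $\bowtie p$, and $S_{\bowtie p}[\Phi]$ holds when the long-run probability of being in a $\Phi$-state meets the bound.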

2) CSL PATH FORMULAS-SEMANTICS
The path formula $X^{\leq t} \Phi$ is true if $\Phi$ is true within time interval $t$ on path $\pi$ after a state transition. Similarly, $\Phi_1 U^{I} \Phi_2$ is true if $\Phi_2$ becomes true within time interval $I$ and $\Phi_1$ holds along the path until then. The set of paths in $M$ is $Path^{M}(\pi_M)$; therefore, $s \models \Phi$ asserts that state $s$ satisfies property $\Phi$, and for a path $\pi_M$, $\pi \models \varphi$ asserts that the path satisfies property $\varphi$. We have transient and steady-state formulas: $s \models P_{\bowtie p}[\varphi]$ iff $\varphi$ is satisfied on paths $\pi$ starting in state $s \in S$ with a probability that meets the bound $\bowtie p$, and $\pi \models S_{\bowtie p}[\Phi]$ iff $\Phi$ is satisfied in the long run for state $s$. The steady-state probability of model $P_S(M)$ over the path $\pi_M$ is calculated from (6) and (7) with the matrix $Q$ and the vector $P$, respectively. Therefore, we obtain the initial path $\pi_0(\alpha, s')$ as:
$$\pi(M) = P\{S = s' \mid S = (s, t, s')\} = \lim_{t \to \infty} (s, t, s') \qquad (10)$$
By applying the system of linear equations, we assume $\pi_i(\alpha, \emptyset) = 0$ to compute steady-state probabilities:
$$\pi(M) = P\Big(\pi(s) \cdot Q = 0,\ \sum_{s} \pi(s, s') = 1\Big) \qquad (11)$$
Transient probability is the measure of the probability of a system being in a state $s$ at a particular time $t$, written $s_i@t$, on a path $\pi$ of the form $\pi_{s_i}@t$; it is obtained from (6) and (7). System performance and reliability properties are expressed through CSL using (10) and (11) for CTMC models using combinations of CSL operators. Here, we focus on time-bounded verification of properties using the 'until' and 'next' operators. The performance and reliability can also be predicted with unknown probability bounds. From (10) and (11) we obtain the general property specification formulas (13) and (14), with which the quantitative measures for the Markov model of a complex system $M$ can be determined for transient and steady states over paths, respectively, using stochastic logic propositions.

3) BOUNDS: KNOWN AND UNKNOWN
To deal with non-determinism, we adopt a unique approach for the verification of a stochastic model using known and unknown bounds for system properties. CSL allows reachability analysis, also known as Qualitative Reachability with Known Thresholds (QKT), to determine whether a certain state (or states) of the system is reachable with given probability ratios. A bound $p$ can be used with relational operators, e.g. $P_{[0,0.7]}(\Phi)$ for $P_{\leq 0.7}(\Phi)$, to check whether certain thresholds of system properties are met.
Moreover, in CSL, transient and steady-state probabilities can be quantified by leaving the bounds unspecified. Hence, we can define properties in the following way: $P_{=?}[\varphi]$ (quantification of unknown transient probabilities of properties) and $S_{=?}[\Phi]$ (quantification of unknown steady-state probabilities of properties). For example, such a formula predicts a future state that a stochastic system may reach in a given time interval $t$ with an unspecified probability bound. The model checking of system behaviors with unknown bounds is called Quantitative Reachability with Unknown Thresholds (QUT).
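The difference between the two kinds of bounds can be illustrated numerically. The sketch below computes a time-bounded reachability probability for a small CTMC by uniformization, which corresponds to a QUT query of the form P=?[F<=t target]; comparing the result with a threshold then gives the QKT verdict. This is a minimal illustration, not PRISM's engine: the function name, the three-state chain, and its rates are hypothetical, and the target state is assumed to be absorbing so that its transient probability equals the reachability probability.

```python
import math


def transient(R, p0, t, eps=1e-10, max_terms=100_000):
    """State distribution at time t for rate matrix R and initial distribution p0."""
    n = len(R)
    exit_rates = [sum(R[i][j] for j in range(n) if j != i) for i in range(n)]
    q = max(exit_rates) or 1.0  # uniformization rate
    # Embedded uniformized DTMC: off-diagonal R/q, diagonal keeps the leftover mass.
    P = [[R[i][j] / q if i != j else 1.0 - exit_rates[i] / q for j in range(n)]
         for i in range(n)]
    result = [0.0] * n
    vec = list(p0)
    weight = math.exp(-q * t)  # Poisson(q * t) probability of zero jumps
    acc = 0.0
    for k in range(1, max_terms + 1):
        result = [res + weight * v for res, v in zip(result, vec)]
        acc += weight
        if acc >= 1.0 - eps:  # remaining Poisson mass is below eps
            break
        weight *= q * t / k
        vec = [sum(vec[i] * P[i][j] for i in range(n)) for j in range(n)]
    return result


# Hypothetical chain: detect (0) -> relay (1) at rate 2.0 -> delivered (2) at rate 1.0.
R = [[0.0, 2.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
prob = transient(R, [1.0, 0.0, 0.0], t=1.0)[2]   # QUT: quantify the probability
meets_threshold = prob >= 0.90                   # QKT: compare with a known bound
```

For this chain the closed form is $(1 - e^{-t})^2$, so at $t = 1$ the probability is roughly 0.40 and the 0.90 threshold is not met; a QUT query reports the number, whereas a QKT query reports only the verdict.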
System Qualities with Rewards: The system model receives a reward for each time instance that it spends in a particular state $s$. For the reward specification, we use the general formula $R_{x}(\Phi)$, $R$ being the reward over a set of paths $\pi$ with starting state $s$ that satisfies a bound $x$. From this, the cumulative reward $C$ and the instantaneous reward $I_r$ are formulated with time unit $t$.

VIII. VALIDATION
The overall process for the verification and validation of the stochastic model in SAM-SoS is depicted in Figures 3 and 9, which show the schematic flow of the approach. The experimental results are checked against the stated properties for verification and prediction as outputs of the validation process.

A. STOCHASTIC MODEL IMPLEMENTATION
Based on the transformation rules defined in Section VIIA, we have mapped the stochastic SoS CTMC model (FM-ERSoS) from HSF into PRISM semantics. The model consists of abstract CSs forming coalitions at runtime, primarily based on Algorithm 1. The stochastic CTMC specification based on the architecture of the SoS model is presented in Figure 10 with the possible states and CSs to achieve SoS missions. All the modules (CSs) interact with each other through actions ACT with transition rates T_r. The complete specification of the transformed FM-ERSoS model in PRISM is available at the provided link. The model starts from state 0 and moves to multiple states. We assume that states {1, 2} and {5, 6} represent the CPSHWF and CPSFS nodes, respectively. Transitions occur with real-time events and action delay rates. The LCS is represented by state 9 as the critical state (there are other possible states for this node, but for simplicity, we have chosen this state). The system reaches state 10 on completing mission M 1 (generate early warnings and transmit data to the ECU). The ECU, with states 12 and 13, generates alerts and early warnings to first responders. The absorbing state is 15, which indicates the achievement of mission M 2 . The parameters used for the model are given in Table 2. N represents a system constant and t_r is the transition rate; these rates are extracted by examining similar types of complex real-time systems [45], [76]. T is the time unit used for a real-time system for model checking at runtime, while the individual numbers of CPS nodes are described with minimum and maximum variables.

B. VERIFICATION THROUGH TEMPORAL LOGIC CSL
By means of CSL, APs are checked for state properties using steady-state and transient analysis via (13) and (14).

1) STEADY-STATE PROPERTIES
Steady-state CSL properties help to verify the long-term behavior of the stochastic FM-ERSoS. By using (14), we formulate the steady-state specification, in which the operand is an AP that could be a single state (Ss) or a composite state (Cs) and S is the steady-state operator.
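As an illustration of what the steady-state operator S measures, the sketch below computes the long-run distribution of a small CTMC by power iteration on its uniformized chain and compares the probability of an AP (a set of states) with a threshold. It assumes an irreducible chain; the function `steady_state`, the two-state rates, and the 0.7 threshold are hypothetical and are not taken from FM-ERSoS, where PRISM performs this computation.

```python
from typing import Dict, List, Tuple


def steady_state(n: int, rates: Dict[Tuple[int, int], float],
                 tol: float = 1e-12, max_iter: int = 100_000) -> List[float]:
    """Long-run state distribution of an irreducible CTMC."""
    exit_rate = [0.0] * n
    for (src, _dst), rate in rates.items():
        exit_rate[src] += rate
    if max(exit_rate) == 0.0:
        raise ValueError("the chain has no transitions")
    # A uniformization rate above the largest exit rate gives every state
    # a self-loop, which keeps the embedded chain aperiodic.
    unif_rate = 1.05 * max(exit_rate)
    dist = [1.0 / n] * n
    for _ in range(max_iter):
        # One step of the uniformized DTMC, P = I + Q / unif_rate
        nxt = [dist[j] * (1.0 - exit_rate[j] / unif_rate) for j in range(n)]
        for (src, dst), rate in rates.items():
            nxt[dst] += dist[src] * rate / unif_rate
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    raise RuntimeError("power iteration did not converge")


# Hypothetical two-state chain: 0 = operational, 1 = degraded.
pi = steady_state(2, {(0, 1): 1.0, (1, 0): 3.0})
operational = {0}                       # the AP, given as a set of states
long_run = sum(pi[s] for s in operational)
print(long_run, long_run >= 0.7)        # S>=0.7 ["operational"] style check
```

For a two-state chain with rates a (0 to 1) and b (1 to 0), the closed form is b/(a+b) for state 0, which gives a quick sanity check on the iteration.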

2) TIME BOUNDED TRANSIENT PROPERTIES AND REWARDS STRUCTURES
To determine how different system QAs perform while achieving certain functionalities in the SoS, we use transient CSL properties with time bounds. CSL has two variants of transient time-bounded properties: time-bounded eventually, with operator (F), and time-bounded until, with operator (U). Both are obtained from (13). Their operands are APs over path states, 't' is the time interval, and q is the given threshold value; the operand of the eventually operator (F) may be a single or a composite state. A single state is preferred for the 'until' operator (U), especially when monitoring a fire from the prediction state to emergency management, since CSs are operating in an unstable and volatile environment. Finally, time-based reward structures are used in the model to reason about certain performance and reliability requirements, following a general specification for state-based or transition-based rewards verification. Relevant requirements for performing analysis, their descriptions, and logical specifications are presented in Table 3.
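A time-bounded property of the form "the target AP is reached within t time units with probability at least q" can be evaluated on a CTMC by uniformization. The sketch below is a minimal, assumption-laden illustration of that standard technique, not the PRISM engine and not the FM-ERSoS model: `bounded_reach_prob`, the three-state chain, the bound t = 20, and the threshold q = 0.8 are all hypothetical.

```python
import math
from typing import Dict, Set, Tuple


def bounded_reach_prob(n: int, rates: Dict[Tuple[int, int], float],
                       targets: Set[int], start: int, t: float,
                       eps: float = 1e-10, max_terms: int = 100_000) -> float:
    """Probability of reaching `targets` from `start` within time t."""
    # Target states are made absorbing, so mass that arrives stays there.
    exit_rate = [0.0] * n
    for (src, _dst), rate in rates.items():
        if src not in targets:
            exit_rate[src] += rate
    unif_rate = max(exit_rate)
    if unif_rate == 0.0:
        return 1.0 if start in targets else 0.0
    if unif_rate * t > 700.0:
        raise ValueError("time bound too large for this naive Poisson sum")
    dist = [0.0] * n
    dist[start] = 1.0
    weight = math.exp(-unif_rate * t)   # Poisson probability of 0 jumps
    mass, prob = 0.0, 0.0
    for k in range(max_terms):
        prob += weight * sum(dist[s] for s in targets)
        mass += weight
        if mass >= 1.0 - eps:           # remaining Poisson tail is below eps
            break
        # One step of the uniformized DTMC
        nxt = [dist[j] * (1.0 - exit_rate[j] / unif_rate) for j in range(n)]
        for (src, dst), rate in rates.items():
            if src not in targets:
                nxt[dst] += dist[src] * rate / unif_rate
        dist = nxt
        weight *= unif_rate * t / (k + 1)
    return prob


# Hypothetical chain: 0 = sensing, 1 = forwarding, 2 = data delivered.
demo = {(0, 1): 0.5, (1, 0): 0.2, (1, 2): 1.0}
p = bounded_reach_prob(3, demo, targets={2}, start=0, t=20.0)
q = 0.8                                 # hypothetical threshold
print(p, p >= q)                        # P>=q [ F<=20 "delivered" ] style check
```

With a single transition of rate λ into the target, the result reduces to 1 − exp(−λt), which is a convenient way to validate the routine before applying it to larger chains.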

C. RESULTS AND DISCUSSIONS
All the experiments were conducted on an Intel(R) i7-7700 @ 3.60 GHz machine with 16 GB of RAM. To perform verification and simulation, the hybrid engine was selected for PRISM, using a Java client as the runtime environment. The maximum number of iterations for termination was 10000, the probability threshold was set to $1.0 \times 10^{-5}$, and hybrid sparse memory was up to 1024 kB.

1) SYSTEM REQUIREMENTS VERIFICATION THROUGH QKT
We start our experiments by checking the QKT model, which involves qualitative verification of specific system properties. Since the threshold values are already known, each result is either satisfied or not satisfied. Table 4 shows the corresponding requirements, CSL logic predicates, and associated outcomes. Most of the requirements are satisfied. However, $PR_4$ does not meet the desired likelihood of 80%, which cannot be achieved in real time due to the uncertain environment in which CPS nodes work. For reliability, especially $RR_2$, results show that there is less chance that both CPS nodes will be able to send data to LCSs.

2) LONG-RUN SoS BEHAVIOR
Logical specifications for steady-state analysis are derived from (15). Results are presented in Table 5 together with the particular requirements and expected values. For the initial configuration of CSs (CPSHWF and CPSFS), the long-term probability of sending sensor data is quite low at 7.837e-2. This is because both of these CSs operate in a volatile environment where there is a strong possibility of failure as the fire situation develops. The long-term likelihood that CPSLCS will not be able to send messages to the ECU is as high as 45%, indicating that there may be problems with data transmission in future. Once the data has been processed by the nodes at the edges, i.e. by the ECU and a local gateway LCS, the chance of the mission succeeding increases, with an expected value of 7.0962e-1.

3) TIME BOUNDED TRANSIENT ANALYSIS WITH QUT
Fire Data Delivery-Latency: $PR_1$. One of the important tasks of the fire monitoring system is to measure real-time fire events at source locations and forward this data to the nearby node in minimum time.
The graph in Figure 11 shows the different probabilities of sending sensor data from CPS nodes to connecting nodes. The time was constrained to 20 seconds, with N = 20, 10 and 30. The average number of iterations performed by these nodes was 430. The minimum probability at time unit 1 for CPS nodes CPSHWF and CPSFS is 0.40, and it gradually reaches 0.99. The first two nodes had a maximum response time of five time units. This indicates that if source CPS nodes are well connected and have the right amount of bandwidth for the mediator, the CSs may perform better. When any one of the nodes works with an LCS node, the performance remains moderate, as in the second case, where the probability that sensor data will be sent from the CPSFS to the gateway station LCS within the given time is more than 60% on average. Similarly, the probability of data delivery from gateway stations to ECU nodes increases to 0.80, indicating a positive trajectory in the likelihood. The average response time for the last two pairs of nodes remains at 10 time units, which is quite high. Thus, additional nodes are required to reduce the load.
Alerts and Warnings: $PR_2$. Once the ECU starts receiving data from local gateways, it is vital that it processes the data immediately to generate the maximum number of warnings and alerts to first emergency responders, including police, ambulance services, and fire-fighters. This can be measured with the reward formula from (18). The ECU obtains the warning data and other event data randomly from distributed LCSs and sends messages after further data processing. We verify the combined rewards for alerts and warnings within 30 time units by varying N, as shown in Figure 12. The rewards rise as N, the number of CSs in the system, increases; this has a direct impact on the performance of the ECU, as it allows more sensor data to be processed and the information to be disseminated in a timely manner. Depending on the risk areas and geographic spread of the fires, the participating CSs may be increased, starting with source CPS nodes at the local gateway station and moving towards the ECU.
Reliability with CSs Degrading: $RR_2$. Since a SoS comprises various CPS-IoT nodes with multiple independent sensors operating autonomously, the reliability of the system is governed by the number and types of failures occurring among the collaborating CSs operating in a volatile environment. As we can see from the graph in Figure 13 for $RR_2$, when CSs are collaborating, the likelihood that HWF will not be degraded is very low, ranging from 0.039 to 1.52 within time units 1 to 30, respectively. The CPS nodes of the ECU and LCS have a relatively low percentage of failure when collaborating, with a maximum value of 0.129 within time t. The likelihood that the system will achieve a sub-goal (i.e. the ECU can send alerts if LCSs are failing) is relatively lower than desired. In a separate experiment, it was also observed that when predicting the failure of the SoS combined with all CS failures in the long term, i.e. for the next unit of hours, the probability was almost 0.99, which is quite high. This is due to unexpected and sudden failures and environmental uncertainty. The reliability of the system can be improved by replacing the degrading CSs with redundant CSs.

4) MISSIONS AND RELATED SUB-GOALS
Moreover, all these analyses enabled us to predict the possible success values for missions and associated goals. Mission $M_1$ has two sub-goals: monitor and predict, $G_1M_1$, and transmit data to the ECU and send alerts, $G_2M_1$. Similarly, mission $M_2$ is a collective and time-critical mission to generate warnings and send alerts to first responders. We check the likelihood that these individual missions are achieved within T time units against the corresponding property specification. The graph in Figure 14 shows mission behaviors during an emergency and their probability of success in time T. There is approximately a 30% chance that mission $M_1$ and its sub-goals will be achieved within the time of 40% success.
The probability of achieving mission success for $M_1$ is quite high, reaching a value of 0.80, which means that first responders and the population in the area can receive alerts and warnings in a timely manner. However, the level of reliability of participating nodes is determined by the number of failures during the mission.

D. SUMMARY OF THE RESULTS
The analysis of the long-term behavior of the stochastic SoS architecture shows that the model has a likelihood of failing as it continues to perform goals. The results reveal that, starting with HWF, the long-term stability of CPSFS nodes for fire detection, smoke, and humidity is expected to be lower than 50%, and there is a greater chance that at certain stages one or all of the starting CPS nodes may fail once the fire spreads. The results vary for the quality attributes performance and reliability of FM-ERSoS. For example, the latency of sensor data exchange from node to node is relatively satisfactory, provided that intermediate nodes are connected. The likelihood that messages will be delivered within T time units is around 60%, which is satisfactory. This increases as the flow of data goes to the next nodes, i.e. from the LCS to the ECU and first emergency responders.
The FM-ERSoS reliability is below expectations for both the starting and ending CPS nodes. The participating nodes are prone to failures, and there is a strong tendency for individual nodes to degrade. There is a strong probability that CPS nodes CPSFS and CPSHWF will deteriorate with time, with a low average reliability of 70%. The LCS gateway may also degrade, further impacting the flow of information while the actual fire situation is escalating. The degradation of the ECU when collaborating with the LCS is relatively low, as it operates in a much more stable environment. However, collective reliability R(t) decreases as the LCS has a higher rate of failures. There is a possibility that more than 55% of the messages fail to reach their destination if one of the CSs is failing within time T. Therefore, all the corresponding nodes must work effectively when collaborating to achieve missions and goals, especially in the case of warnings generated by the LCS and sent from the ECU to the first emergency responders. The inevitable failures could lead to disasters; timely evacuations, help services, and rescue may be delayed as the information is lost among nodes.

1) IMPROVING PERFORMANCE WITH ADDITIONAL CPS NODES
The performance of FM-ERSoS in terms of response time for mission accomplishment can be improved with additional CSs, especially in emergency cases where human lives and critical infrastructures are under threat [77], [78]. However, additional CSs will increase the cost and may require extra resources for maintainability. We added CSs to the model in parallel based on critical suburban regions with areas of roughly 10-20 square km. The starting CSs, i.e. CPSFS and CPSHWF, increase in number area-wise, while the LCS and ECU remain moderate in number.
The graph in Figure 15 shows how additional CSs have influenced the response time in terms of individual fire data delivery. Starting from a suburb with an area of 10 square km, initially with 10 CSs, the response time is 200 time units, and it decreases gradually with the addition of other CSs. Suburban areas of 15 and 20 square km exhibit a similar pattern; however, the impact on response time varies for the specified area. For example, the response time for 15 CSs is 180 time units, which is quite high. In the third case, where the maximum number of CSs is 70, the response time is 55, which is ideal. One important observation concerns the level of uncertainty: despite the addition of nodes, the overall impact is not always positive, as shown in the cases of the first and last suburbs with 50 and 30 nodes, which have high latencies of 110 and 270, respectively.

2) IMPROVING SoS RELIABILITY WITH REDUNDANCY
As discussed above, reliability can be improved with redundant CSs; however, certain design principles must be followed. The sequential addition of components might not improve the reliability, since it will be lowered further if any one of the CSs fails [79]. This problem can be overcome by applying parallel reliability with redundant CSs [80]. Parallel reliability is computed from the unreliability (1 − R(t)) of the CPS nodes. A high collective reliability R(t) value of 0.99 is achieved with the composition of many redundant nodes. However, this approach has associated cost and maintainability overheads for the system in the longer term. Due to the fire spread, the initial nodes, i.e. CPSFS and CPSHWF, are prone to failures and can be very volatile. Hence, reliability R(t) is measured with the random addition of parallel CSs. Figure 16 shows the improved reliability with redundancy for the SoS.
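The contrast between sequential and parallel composition can be sketched with the textbook reliability formulas: a series arrangement multiplies the node reliabilities, whereas a parallel arrangement multiplies the unreliabilities (1 − R(t)) and subtracts the product from 1. The example below assumes independent node failures; the function names are hypothetical, and the per-node value of 0.7 is used only as an illustrative input, not as a reproduction of Figure 16.

```python
import math
from typing import Sequence


def series_reliability(node_r: Sequence[float]) -> float:
    """All nodes must work: the product of the node reliabilities."""
    return math.prod(node_r)


def parallel_reliability(node_r: Sequence[float]) -> float:
    """At least one node must work: 1 minus the product of unreliabilities."""
    return 1.0 - math.prod(1.0 - r for r in node_r)


def min_redundancy(r: float, target: float) -> int:
    """Smallest number of identical parallel nodes that reaches `target`."""
    if not (0.0 < r < 1.0 and 0.0 < target < 1.0):
        raise ValueError("r and target must lie strictly between 0 and 1")
    n = 1
    while parallel_reliability([r] * n) < target:
        n += 1
    return n


print(series_reliability([0.7, 0.7]))    # adding nodes in sequence lowers R(t)
print(parallel_reliability([0.7, 0.7]))  # adding nodes in parallel raises R(t)
print(min_redundancy(0.7, 0.99))         # nodes needed for R(t) >= 0.99
```

Under these assumptions, four parallel nodes with R(t) = 0.7 give 1 − 0.3^4 = 0.9919, which illustrates why high collective reliability requires several redundant nodes and brings the cost overhead noted above.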

3) PREDICTION OF MISSIONS SUCCESS
Results reveal that a SoS is prone to mission failures and, if left untested, this would jeopardize the accomplishment of system goals. At the same time, QA metrics show that reliability and performance need to be improved, because the values indicate that latency and throughput have a high likelihood of diverging from the stated requirements necessary for mission success. However, with alternative models and rigorous verification, architectural designs can be improved well ahead of time to minimize defects and improve QAs.

IX. SAM-SoS: COMPARISON WITH EXISTING APPROACHES
Table 6 presents a qualitative comparative analysis of our proposed approach for modeling and verifying SoS architectures against related works. The analysis is based on the capabilities of the approaches to support the required features: (i) modeling (system type, formalism, structure and behavior, stochastic behaviors, and uncertainty reasoning), and (ii) formal verification (formal transformation, analysis type, and QAs quantified and predicted). The single-system modeling and verification approaches are able to perform transient analysis with the quantification of a single QA. However, these modeling approaches are unable to deal with the unique architectural characteristics of SoS, as they are intended for single stand-alone deterministic systems; therefore, they fail to provide solutions such as those proposed in our approach to overcome the existing limitations. Among these approaches, QaSten [38] by Wei et al., based on automata, provides a modeling and verification approach with semi-formal transformation rules. Specific to single systems, Song et al. [47] have been able to constrain the stochastic behavior of large complex systems to a certain extent.
Among techniques specific to SoS architecture modeling, CML based on CSP in [49] partially supports the description of stochastic behavior using contract specifications coupled with semi-formal SysML notations. Therefore, it is unable to model SoS behaviors that can later be evaluated quantitatively. SosADL by Flavio [21] is a promising formal modeling language based on CCS and CCP. However, it does not support stochastic models and uncertainty reasoning in its current state, since the underlying formalism is unable to constrain actions with probabilities or random rates; consequently, it fails to provide features necessary for model verification. Although some of these approaches attempt to manage structures and behaviors with probabilistic choice, none of them can deal with the non-determinism and stochasticity of the dynamic behaviors of SoS architectures for modeling, reasoning, and further quantitative verification. None of these approaches provides formal transformation rules from a stochastic architecture model to a statistical model checker, which must preserve semantics to ensure consistency.
Compared to existing techniques, our proposed approach specifically brings modeling capabilities, with HSF deriving process algebraic features that have the essential vocabulary and reasoning semantics to deal with uncertainty and stochastic behaviors of SoS at the architectural level. It allows us to build stochastic models quantitatively, enabling various analyses of SoS architecture models. With formal transformation rules, we provide consistency and completeness for model mapping from HSF to PRISM. Our verification consists of model-checking of steady-state and transient analysis coupled with known and unknown bounds that enable quantitative prediction of multi-QAs in terms of the performance and reliability of SoS architectural models. Our approach is more comprehensive and addresses most of the shortcomings of the current approaches for the modeling and verification of SoS architectures.

X. CONCLUSION AND FUTURE WORK
In this research paper, we have proposed a comprehensive approach for the modeling and verification of complex SoS architectures. Our contribution to the broader body of knowledge is multi-fold. At the first stage, we have devised a hybrid formalism by integrating syntax and semantics of traditional PAs into SPA. To deal with unpredictable and uncertain interactions within a SoS, we have introduced stochastic model capabilities. The exogenous behavior of CSs involves parallel actions by means of concurrent stochastic constraints applied to HSF. Improved syntax and semantics give reasoning capabilities for modeling the random, ND behavior of stochastic SoS. In particular, the MDE approach was adopted whereby we transform EBNF into formal DSL, producing an extended HSF for SoS.
At the second stage, for model verification, we establish formal rules for mapping from our stochastic model into the PRISM equivalent transformation, employing SMC. Formal rules allow the automated transformation of the architectural elements of HSF into the PRISM language. Thirdly, we propose a unique verification approach using time-bounded model checking against labeled Markovian processes. In addition to performing behavioral analysis, we allow system architects to undertake qualitative and quantitative analysis. In this paper, the approach has been validated with an end-to-end case study of a real-time CPSoS employed to manage and control an emerging fire emergency from a socio-technical perspective. The achievement of the SoS missions and the QAs has been verified through SMC with the application of both steady-state and transient analysis of known and unknown bounds. An adequate number of experiments have been run, and results have been evaluated for the stochastic system at design time before implementation, allowing system architects to make better decisions with alternative choices.
In future work, we intend to improve our formal semantics for HSF in order to build Markov models from CTMCs to MDPs that will improve the non-deterministic reasoning capability of dynamic SoS reconfigurations. For this, we shall add operators to manage dynamic architectural changes at architectural level with certain constraints. By taking into account the complexity of SoS, the MDPs will offer alternative strategies for the assessment of emergent behaviours and QAs at runtime. We plan to increase the ability of QAs by, for example, providing a specification with each dynamic configuration at abstract level. We also plan to perform QAs trade-off analysis of various attributes in order to improve the architectural design decisions for SoS.
MUHAMMAD ALI BABAR was a Reader of software engineering with Lancaster University. He is currently a Professor with the School of Computer Science, The University of Adelaide. He is an Honorary Visiting Professor with the Software Institute, Nanjing University, China. He is also the Director of Cyber Security Adelaide (CSA), which incorporates the Cyber Security Cooperative Research Centre (CSCRC), whose estimated budget is around 140 million over seven years with 50 million provided by the Australian Government. In Software Engineering Education, he led the University's effort to redevelop the Bachelor of Engineering (software) degree that has been accredited by the Australian Computer Society and the Engineers Australia (ACS/EA). He spent almost seven years in Europe (Ireland, Denmark, and U.K.) as a Senior Researcher and an Academic. He has established the Interdisciplinary Research Centre and the Centre for Research on Engineering Software Technologies (CREST), where he leads the research and research training of more than 20 (12 Ph.D. students) members. Apart from his work, he has industrial relevance as evidenced by several research and development projects and by setting up a number of collaborations in Australia and Europe with industry and government agencies. His publications have been highly cited within the discipline of software engineering, as evidenced by an H-index of 46 with 8240 citations as per Google Scholar in January 2020. He leads the theme on Platform and Architecture for Cyber Security as a Service with the Cyber Security Cooperative Research Centre. He has authored/coauthored more than 220 peer-reviewed publications through premier Software Technology journals and conferences. VOLUME 8, 2020