Two-Level System of On-Line Risk Assessment in the National Cyberspace

The paper presents a hierarchical, two-level approach to on-line cyber risk assessment at the national level. It takes into account cyber threats and vulnerabilities identified at the lower level, formed by essential service operators and digital service providers. A computational algorithm making use of the local measurements and assessments is proposed, and its asynchronous convergence is proved. Finally, a case study concerning a system consisting of four entities is presented.


I. INTRODUCTION
The article develops a hierarchical, two-level approach to on-line national level risk assessment (NLRA), taking into account cyber threats and vulnerabilities identified at the lower level, which comprises key service operators and digital service providers. A key service operator or a digital service provider will further be referred to as a local entity (LE), while the unit responsible for risk assessment at the national level will be referred to as the Center (CNT).
It should be noted that there are very few proposals of approaches to cyber risk assessment at a national level, in particular to on-line assessment [1]. According to ENISA's November 2013 analytical report [2], the NLRA could be carried out ''through a formalized central framework or approach. . . '' or ''based on a decentralized model where each actor prepares their own risk assessment to be integrated by a coordinating authority''. This document also says that NLRA approaches are either ''scenario-based, where actors are gathered together to consider scenarios in the round; such scenarios describe risks as a narrative and label them by applying simple categories of likelihood¹ and impact (low, medium, high)'' or ''quantitative approaches which apply ordinal thresholds. . . '', or ''approaches which combine elements of all of the above (for example, using scenarios and then qualitative and quantitative methods)''.
On the other hand, Directive (EU) 2016/1148 of 6 July 2016 [3], concerning measures for a high common level of security of network and information systems across the European Union, requires the national Computer Security Incident Response Teams (CSIRTs) to provide ''dynamic risk and incident analysis and situational awareness''.
¹ In this text the term ''likelihood'' is used to refer to a subjective, descriptive or numerical representation of a belief regarding the possibility of an event.
The associate editor coordinating the review of this manuscript and approving it for publication was Parul Garg.
Accordingly, two approaches can be taken to implement the NLRA. The first would be to build an aggregated model of the national cyberspace covering all relevant actors and taking into account their interdependencies. The second approach would be to propose a decentralized, hierarchical, two-level on-line framework for the NLRA, where the LEs, on the basis of their local measurements, repetitively produce their own assessments, to be used by the CNT to coordinate them and to estimate the overall risk (Fig. 1). This article is about the latter possibility.
It is also important that an on-line risk assessment should be predictive, taking into account, if only in a simplified way, the temporal dependencies of the LEs' local risk assessments on cyber threats and on the services provided by other LEs.
Other on-line mechanisms, based on different assumptions, were presented in [4], [5] and [6]-[11]. Some of the preliminary results of the author's work were presented in the conference communiqué [12]. This paper concentrates on the kernel of the proposed method: the computational algorithm and its convergence. The name of the approach has been changed to better describe its essence.

II. THE TWO-LEVEL ASSESSMENT SCHEME
It is assumed that the full NLRA may be performed at the beginning of each time period [t_c, t_{c+1}] in such a fast mode that the time required for the analysis is small when compared to the duration of the inter-analysis interval.
Assume that the current risk analysis of service s at a given time t_c is concerned with a future time interval T^s composed of a number of subintervals T^s_p, where p = 1, ..., P^s; i.e.,

T^s = T^s_1 ∪ T^s_2 ∪ ... ∪ T^s_{P^s}.    (1)

For each of these subintervals let the likelihood of failure of service s be denoted as D^s(p). In the simplest case this can be a real number, e.g., D^s(p) ∈ [0, 1]. The possible failure scenario (PFS) of service s is then defined as D^s = (D^s(p); p = 1, ..., P^s). The current time, at which the iterative process to be described is performed, is associated formally with the beginning of subinterval T^s_1. We assume that any appropriate risk assessment method may be used at the level of a given local entity (LE), provided that it is able to take into account:
• the current situation concerning the LE's internal cyber security,
• the PFSs of those LEs that are relevant to the proper functioning of the considered LE,
and to produce the LE's own PFS.
The intervals T^s_p can be of different lengths, related to periods of time relevant to the various services. In particular, suppose that P^s = 4 and that T^s_1 refers to a short, immediate future period during which some services may be affected by currently existing threats, including the observed cyber incidents. Other entities may be more concerned with the subsequent, longer periods T^s_2, T^s_3 (mid-term), and T^s_4 (long-term) (Fig. 2). The PFSs of key services will represent crucial information at the CNT level and may be used, in particular, for graphical threat presentation and for the risk assessment (analysis) performed at this level, especially when the Center can assign numerical loss (cost) values to PFSs.
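As defined above, a PFS is simply a vector of likelihoods indexed by the subintervals of T^s. The following minimal Python sketch illustrates one possible representation; the class and field names are assumptions made for illustration, not part of the proposed framework:

```python
from dataclasses import dataclass, field

# Illustrative representation of a possible failure scenario (PFS):
# the scenario of entity s is the vector D^s = (D^s(p); p = 1, ..., P^s).

@dataclass
class PFS:
    entity: str                       # identifier of the service / LE, s
    subinterval_hours: list[float]    # durations of T^s_1 ... T^s_{P^s}
    likelihood: list[float] = field(default_factory=list)   # D^s(p) in [0, 1]

    def __post_init__(self):
        if not self.likelihood:
            # default to an all-zero scenario of length P^s
            self.likelihood = [0.0] * len(self.subinterval_hours)
        assert all(0.0 <= d <= 1.0 for d in self.likelihood)

# e.g. a short-term-focused entity with P^s = 4 over a 48-hour horizon
pfs = PFS("E", subinterval_hours=[2.0, 6.0, 16.0, 24.0])
```

Note that the subintervals may have different lengths per entity, as discussed above; only the overall horizon needs to be common.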

III. COORDINATION
Assume now that at a given time we initiate the analysis that will provide us with an overall risk assessment, under current conditions, over the future time intervals T^s defined above for all s ∈ S, where S is the set of all considered entities (services).
At the CNT level we propose to adopt an iterative approach, following the interaction prediction method [13], [14]. One may begin with a set of initial PFSs D^{s,(0)} for s ∈ S. This allows the iterations, i.e., the coordination process, to be initiated. The initial scenarios can be defined, e.g., as the result of the calculations at time t_{c−1} or of a current local static risk analysis.
At iteration k = 1, 2, 3, ... the set of PFSs D^{s,(k)}, s ∈ S, is modified as follows. For each entity s, let U^s denote the set of those entities on which this entity is dependent. The scenarios D^{u,(k)} for u ∈ U^s are used together with all information currently available at the LE level (likelihoods of cyber threats, local vulnerabilities, observed incidents, etc.) to perform the local risk analysis and to estimate a new value D^{s,(k),new} of the (output) scenario of entity s. After this is done for all entities, the suite of new predicted scenarios for iteration k + 1 may be computed at the CNT level. Many algorithms can be used for this purpose. The simplest of them is direct re-injection, whereby

D^{s,(k+1)} = D^{s,(k),new}.    (2)

It is usually better to use a relaxation (smoothing) algorithm for computing D^{s,(k+1)}:

D^{s,(k+1)} = ρ D^{s,(k),new} + (1 − ρ) D^{s,(k)},    (3)

where 0 < ρ ≤ 1 is the relaxation coefficient; if ρ = 1 then (3) reduces to (2). In both algorithms the substitutions are made component-wise.
The iterations defined by the algorithm are performed until satisfactory convergence is achieved. It may happen that during the iterations, each of which may take some time, the information available at the LE level changes due to, for example, new incidents being observed and/or new vulnerabilities being identified. This may affect the iterative process. Furthermore, the ranges T^s_p can be modified if needed. It can therefore be assumed that the iterative process can be viewed as a continuous activity. The properties of this process should then be examined in terms of the relevant factors, in particular the LE level analysis procedures and the dynamics of the local assessment problems.
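The coordination step (3) with a termination test can be sketched as follows. This is a hypothetical Python illustration, not the paper's implementation; `local_analysis(s, scenarios)` stands for whatever risk-analysis method entity s uses to map the current scenarios of the services it depends on to its new output D^{s,(k),new}, and all names and numbers are assumptions:

```python
# CNT-level coordination loop with relaxation (3), sketched per component.
def coordinate(scenarios, local_analysis, rho=0.5, eps=1e-6, max_iter=1000):
    """Iterate D^{s,(k+1)} = rho*D^{s,(k),new} + (1 - rho)*D^{s,(k)}."""
    for k in range(1, max_iter + 1):
        # each LE performs its local analysis, conceptually in parallel
        new = {s: local_analysis(s, scenarios) for s in scenarios}
        relaxed = {s: [rho * a + (1 - rho) * b
                       for a, b in zip(new[s], scenarios[s])]
                   for s in scenarios}
        diff = max(abs(a - b) for s in scenarios
                   for a, b in zip(relaxed[s], scenarios[s]))
        scenarios = relaxed
        if diff < eps:                      # satisfactory convergence
            return scenarios, k
    return scenarios, max_iter

# toy usage: two mutually dependent services, each with a constant internal
# term 0.2 and an external impact 0.5 on the other (assumed numbers)
deps = {"A": "B", "B": "A"}
def toy(s, sc):
    return [min(1.0, 0.2 + 0.5 * d) for d in sc[deps[s]]]

result, iters = coordinate({"A": [0.0], "B": [0.0]}, toy)
# both scenarios converge towards the fixed point 0.2/(1 - 0.5) = 0.4
```

In this toy case the external weight 0.5 is smaller than 1, which, as shown in Section V, guarantees convergence.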

IV. ANALYSIS AT LE LEVEL
To illustrate the procedure, let us now consider the risk assessment at the local entity level. Assume that the s-th LE information system suffers from a number of vulnerabilities that can be exploited by a number of cyber threats. The set of these vulnerabilities is denoted by V^s; v ∈ V^s when vulnerability v is present in the information system of the considered entity. When this vulnerability is exploited there is an impact I^s_v on the likelihood of the service provided by the LE being degraded or disrupted (service failure). These impacts may be, in particular, expressed as [4]: Very Low (0-0.04), Low (0.05-0.20), Moderate (0.21-0.79), High (0.80-0.95), Very High (0.96-1). The likelihood of vulnerability v being exploited may be defined as related to the possible cyber threats, where, say, threat j may affect the s-th LE when j ∈ J^s. With each threat it should be possible to associate the level of likelihood that this threat may exploit vulnerability v ∈ V^s, namely L^s_vj. In addition to these internal cyber threats it may happen that services external to the s-th LE, on which this entity is dependent, can be substantially degraded or disrupted for certain time periods. The set of those entities is denoted as U^s, while I^s_u represents the impact of a failure of service u on service s. The likelihood of service s failing within the subinterval T^s_p can then be defined as

L^s(p) = Σ_{v∈V^s} Σ_{j∈J^s} a^s_j(p) L^s_vj I^s_v + Σ_{u∈U^s} I^s_u D^u(p − σ^s_u),    (4)

where p = 1, ..., P^s and the σ^s_u ∈ {0, 1, ..., P^s} represent the delays after which a failure of service u may substantially affect the s-th LE's own capability to provide the required service. Those delays can either be assumed to be different for the various components of the failure function or the same for all components. In particular, they may be set equal to zero whenever appropriate.
In the case when p < σ^s_u the value of D^u(p − σ^s_u) refers to the past and is set to zero unless service u is already compromised or interrupted at the time of the ongoing risk analysis. In such a case, the above formula allows the already observed level of service degradation to be taken into account, represented, for example, as D^u(−1) = 1.

The risk activation function may be defined as

a^s_j(p) = 1 when threat j is expected to be present within T^s_p, and a^s_j(p) = 0 otherwise.    (5)

It is assumed that L^s(p) depends on the cyber threats associated with subinterval T^s_p, while this likelihood may depend upon the failure likelihoods of other services related to earlier subintervals. It is possible, of course, to introduce a similar dependence of L^s(p) on earlier threat occurrences.
In the simplest case, the output failure D^s(p) of service s may be set equal to L^s(p), truncated so that D^s(p) ∈ [0, 1]:

D^s(p) = min{1, L^s(p)} for p = 1, ..., P^s.    (6)

It can be observed immediately that we need to know the failure scenarios of the services affecting the s-th LE to compute L^s(p) from (4) for p = 1, ..., P^s, and that D^s(p) may be computed only after L^s(p) is known (6). So, for computing L^s(p) from (4) at iteration k of the CNT level coordination step one should use the scenarios D^{u,(k)}, u ∈ U^s, while D^s(p), computed then from (6), constitutes the new scenario D^{s,(k),new}.
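The LE-level computation given by (4) and (6), including the delays σ^s_u and the treatment of past subintervals, can be sketched as follows; all names and numerical values here are illustrative assumptions:

```python
# Sketch of the local computation (4) with truncation (6) for one entity s.
# `internal` holds the constant first term of (4) for each p (the aggregated
# a^s_j(p) * L^s_vj * I^s_v products); `external` maps each u in U^s to its
# impact I^s_u and delay sigma^s_u.

def local_failure_scenario(internal, external, ext_scenarios, P, past=0.0):
    D = []
    for p in range(1, P + 1):
        L = internal[p - 1]                          # internal threat term
        for u, (impact, sigma) in external.items():
            q = p - sigma
            if q >= 1:
                L += impact * ext_scenarios[u][q - 1]    # I^s_u * D^u(p - sigma)
            else:
                # q refers to the past: zero unless service u is already
                # degraded at analysis time, e.g. D^u(-1) = 1 (see the text)
                L += impact * past
        D.append(min(1.0, L))                        # truncation (6)
    return D

# usage: an entity depending on one external service ("E") with assumed
# impact 0.6 and a delay of one subinterval
D = local_failure_scenario(
    internal=[0.1, 0.1, 0.2],
    external={"E": (0.6, 1)},
    ext_scenarios={"E": [0.5, 0.9, 0.9]},
    P=3,
)
# D is approximately [0.1, 0.4, 0.74]
```

The delayed term makes the scenario predictive: a degradation of service E in subinterval p is felt by the dependent entity one subinterval later.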

V. CONVERGENCE OF THE ALGORITHM
Now we will analyze the conditions under which the algorithm (3), (6), (4) is convergent.
Since the first component of the sum in (4) is a constant, we may represent the algorithm in the following general form:

x^{(k+1)} = F(x^{(k)}),    (7)

where x ∈ R^n is the vector of all the variables D^s(p), p = 1, ..., P^s, s ∈ S. So, combining (4), (6) and (3), the general form of the algorithm has, for i = 1, ..., n, the coordinate functions

F_i(x) = ρ min{1, c_i + Σ_{j≠i} a_ij x_j} + (1 − ρ) x_i,    (11)

where c_i ≥ 0 stands for the constant internal threat term of (4) and the nonnegative coefficients a_ij correspond to the external impacts I^s_u, the delays σ^s_u determining which variables x_j actually enter the sum.

The mapping F(x) is nonsmooth, hence we cannot use directly the precise formula concerning nonlinear mappings from [15], based on the Jacobian matrix. Instead, we will derive a sufficient condition of convergence making use of the theory of convergence for asynchronous iterative algorithms [15]-[17]. One of the most general theorems says that a sufficient condition for the algorithm (7) to be convergent when implemented in a totally asynchronous way is the contractive character of the mapping F : R^n → R^n in the maximum norm [15].

Theorem 1: Consider a nonlinear mapping F : R^n → R^n with the coordinate functions defined by (11) for ρ ∈ (0, 1]. If the coefficients a_ij are nonnegative and satisfy the conditions

Σ_{j≠i} a_ij < 1, i = 1, ..., n,    (12)

then the mapping F is a contractive mapping in the maximum norm.

Proof: Let us consider two vectors x, y ∈ R^n and denote by i* the index of the coordinate which determines the maximum norm of x − y, that is

|x_{i*} − y_{i*}| = ‖x − y‖_∞.    (13)

Taking into account definition (11) of the functions F_i, when all coefficients a_ij are nonnegative and 0 < ρ ≤ 1, we have for the mapping F and every i:

|F_i(x) − F_i(y)| ≤ ρ |min{1, u_i} − min{1, w_i}| + (1 − ρ) |x_i − y_i|,    (14)

where u_i = c_i + Σ_{j≠i} a_ij x_j and w_i = c_i + Σ_{j≠i} a_ij y_j. Let us concentrate now on the term |min{1, u_i} − min{1, w_i}|. There are four combinations to analyze:
• u_i ≤ 1 and w_i ≤ 1: we have here |min{1, u_i} − min{1, w_i}| = |u_i − w_i|;
• u_i > 1 and w_i > 1: we have here |min{1, u_i} − min{1, w_i}| = 0 ≤ |u_i − w_i|;
• u_i ≤ 1 < w_i: we have here |min{1, u_i} − min{1, w_i}| = 1 − u_i ≤ w_i − u_i = |u_i − w_i|;
• w_i ≤ 1 < u_i: we have here |min{1, u_i} − min{1, w_i}| = 1 − w_i ≤ u_i − w_i = |u_i − w_i|.

Summing up all these cases, we may write:

|min{1, u_i} − min{1, w_i}| ≤ |u_i − w_i| = |Σ_{j≠i} a_ij (x_j − y_j)| ≤ Σ_{j≠i} a_ij |x_j − y_j|.

Taking this, (13) and the assumption (12) into account in the assessment (14), we obtain

|F_i(x) − F_i(y)| ≤ [ρ Σ_{j≠i} a_ij + (1 − ρ)] ‖x − y‖_∞ = [1 − ρ(1 − Σ_{j≠i} a_ij)] ‖x − y‖_∞,

and, since by (12) the factor 1 − ρ(1 − Σ_{j≠i} a_ij) is strictly smaller than 1 for every i, it follows that ‖F(x) − F(y)‖_∞ ≤ α ‖x − y‖_∞ with the contraction constant α = max_i [1 − ρ(1 − Σ_{j≠i} a_ij)] < 1, which means that F is a contraction mapping in the maximum norm.
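The contraction property established by Theorem 1 can also be illustrated numerically. The sketch below is purely illustrative: it generates random coefficients that satisfy condition (12), iterates the mapping F of the form (11) from two different starting vectors, and checks that both iterations reach the same fixed point:

```python
import random

# Numerical illustration of Theorem 1 (assumed, randomly generated data):
# F_i(x) = rho * min(1, c_i + sum_{j != i} a_ij x_j) + (1 - rho) * x_i.

random.seed(0)
n, rho = 6, 0.7
c = [random.uniform(0.0, 0.5) for _ in range(n)]    # constant internal terms
a = [[0.0] * n for _ in range(n)]
for i in range(n):
    row = [random.random() if j != i else 0.0 for j in range(n)]
    scale = 0.9 / sum(row)          # enforce sum_{j != i} a_ij = 0.9 < 1, i.e. (12)
    a[i] = [r * scale for r in row]

def F(x):
    return [rho * min(1.0, c[i] + sum(a[i][j] * x[j] for j in range(n)))
            + (1 - rho) * x[i] for i in range(n)]

def iterate(x, eps=1e-10):
    # synchronous fixed-point iteration x^{(k+1)} = F(x^{(k)})
    while True:
        y = F(x)
        if max(abs(u - v) for u, v in zip(x, y)) < eps:
            return y
        x = y

x1 = iterate([0.0] * n)
x2 = iterate([1.0] * n)     # a different start reaches the same fixed point
assert max(abs(u - v) for u, v in zip(x1, x2)) < 1e-8
```

With the row sums fixed at 0.9 and ρ = 0.7, the contraction constant is α = 1 − 0.7·(1 − 0.9) = 0.93, so the convergence is geometric though slow; smaller external weights would speed it up.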

VI. EXAMPLE OF A FOUR-ENTITY SYSTEM
To better illustrate the ideas and coordination strategies introduced above, let us consider a four-entity system consisting of a power company (E), a transport company (T), a hospital (H) and a data center (D). Assume that in the case of each entity s we consider the possible failure scenario components concerned with service availability, D^s(p), for every possible value of p, as defined in (6). The graph of the services and of the dependencies between them is presented in Fig. 3. For all the example entities it is assumed that the formulae given by (4) are used together with (6) to compute the possible service failure scenarios, while the first term in (4), related to the internally assessed threats, is represented by a given number. Starting with the electricity company (E), appropriate timing parameters and formulae defining the relevant scenarios were assumed for each entity. Both the number of time periods and their duration vary between the different scenarios, while the overall time horizon is assumed equal to 48 hours.
The objective now is to demonstrate the coordination process at the central level, while using the coordination strategy (3) with various relaxation coefficients.
Consider first the results of computations when the direct re-injection strategy (2) (ρ = 1 in algorithm (3)) was used. It can be observed that a rise of the likelihood of failure in the delivery of electricity (e.g., caused by weather conditions) results in immediate jumps of the likelihoods of failure of the railway system (that is, the transport company), and a little later we can see the same effect for the data center and the hospital. Since the disruption of the power supply extends over the next day, the failure likelihoods remain at higher levels for all services.
The results of the computations of the D^T scenario (for the other scenarios we observed the same effects) when the relaxation strategy was used are presented in Fig. 8. We used ρ = 0.5 in algorithm (3). It can be seen that the result is the same as in the case of the re-injection strategy, but the changes between iterations are smoother. Unfortunately, in this case, for the same termination condition ε = 10^−6, the calculations took more time: 33 iterations were needed to obtain convergence.
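The qualitative behaviour described above can be reproduced on a toy version of the four-entity system. The internal terms and impact weights below are illustrative assumptions, not the values used in the case study; the sketch only shows that direct re-injection (ρ = 1) and relaxation (ρ = 0.5) reach the same scenarios, with relaxation needing more iterations:

```python
# Toy four-entity coordination: power (E), transport (T), hospital (H),
# data center (D); one scenario component per entity, assumed weights.

internal = {"E": 0.30, "T": 0.05, "H": 0.05, "D": 0.05}      # assumed constants
impact = {"E": {}, "T": {"E": 0.6}, "D": {"E": 0.5},
          "H": {"E": 0.4, "D": 0.3}}                          # assumed I^s_u

def step(x, rho):
    # one coordination iteration of (4), (6) and (3)
    return {s: rho * min(1.0, internal[s]
                         + sum(w * x[u] for u, w in impact[s].items()))
            + (1 - rho) * x[s] for s in x}

def run(rho, eps=1e-6):
    x, k = {s: 0.0 for s in internal}, 0
    while True:
        y = step(x, rho)
        k += 1
        if max(abs(y[s] - x[s]) for s in x) < eps:
            return y, k
        x = y

x1, k1 = run(1.0)    # direct re-injection (2)
x2, k2 = run(0.5)    # relaxation (3): smoother changes, more iterations
assert all(abs(x1[s] - x2[s]) < 1e-4 for s in x1)   # same limit scenarios
assert k2 > k1
```

Every row sum of external weights is below 1 (the largest, for H, is 0.7), so condition (12) holds and convergence is guaranteed for any 0 < ρ ≤ 1.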

VII. CONCLUSION
We proposed a hierarchical, two-level on-line scheme for national-level risk assessment, where local entities repetitively prepare their own assessments, used by the Center (the national CSIRT) to coordinate those assessments and to evaluate the overall risk. Our on-line risk assessment algorithm is predictive, taking into account the temporal dependencies of local entities on cyber threats and on the services provided by other local entities. The iterative scheme which calculates the local forecast expresses the interdependencies between the different services as a linear combination of local and external components. Due to the truncation function restricting the value of the external components, the resulting mapping is nonlinear and nonsmooth. Fortunately, in the quite natural case when the sum of the external weights is smaller than 1, this mapping is contractive and the whole algorithm is convergent. This was proved in the paper and confirmed in a numerical case study concerning a system consisting of four entities.