A Vulnerability Risk Assessment Method Based on Heterogeneous Information Network

Due to the increasing number of network security vulnerabilities, vulnerability risk assessment must be performed to prioritize the repair of high-risk vulnerabilities. Traditional vulnerability risk assessment is based primarily on the Common Vulnerability Scoring System (CVSS) and attack graphs. Nevertheless, the CVSS metrics ignore the impact of a vulnerability on the specific network, so an identical vulnerability that exists in different network environments is assigned the same value. Additionally, attack graphs still suffer from scalability and readability issues. To solve these problems, a ranking method based on the heterogeneous information network is proposed to assess vulnerability risk in a specific network. It considers the exploitability of a vulnerability, the impact of a vulnerability on the network components, and the importance of the vulnerable components. First, a heterogeneous information network containing vulnerabilities, hosts, and the relationships between hosts is constructed to compute the risk score of each vulnerability and implement the ranking process. Second, a model extension method is proposed to adapt to situations in which additional factors related to vulnerability risk assessment need to be considered. Finally, we explore two case studies to compare the proposed method with CVSS and attack graph-based methods. The simulation results show that the proposed method can accurately assess the risk of vulnerabilities in a specific network environment and that it has a lower computational complexity than other methods.


I. INTRODUCTION
With the rapid development of computer networks, networks keep growing in scale, and many kinds of network attacks and vulnerabilities have become increasingly common. The Computer Emergency Response Team (CERT) found that the number of global network security incidents increased sharply from 2003 to 2019 [1]. It is difficult for network administrators to ensure that every vulnerability on every host is fixed. Moreover, remediating vulnerabilities can degrade service quality and performance, and it requires considerable human effort. Therefore, vulnerability risk assessment is performed to prioritize the vulnerabilities with the highest risk for repair, which is conducive to effective network security reinforcement [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Luis Javier Garcia Villalba .
Traditional approaches to assessing vulnerability risk rely mainly on CVSS metrics [3] and attack graphs. CVSS metrics provide quantitative risk scores for vulnerabilities and a basis for eliminating the vulnerabilities with the highest risk. However, CVSS metrics have several deficiencies. A quantitative scoring system should be judged on two properties: objectivity and dispersion. Objectivity reflects how well the results of an assessment reproduce the nature of practical scenarios [4], while dispersion considers the degree of difference and the distribution of the results. For example, the Access Vector is a submetric of CVSS that has three possible values: Local_(L), Adjacent Network_(A), and Network_(N). One survey reported that, across all known vulnerabilities, the Network_(N) value accounts for 85.69% of the three possible values [4], which can lead to several vulnerabilities in a network being assigned the same CVSS risk score. Such a result is unreasonable, because it cannot distinguish which vulnerability poses the higher risk. Moreover, CVSS metrics do not consider the specific network environment, so CVSS scores do not objectively reflect vulnerability risk in diverse network environments. The attack graph-based method represents attack scenarios by showing possible attack paths from the attackers to the target. Many researchers have assessed network risks and modeled network threats based on attack graphs [5], [6]. Analyzing an attack graph helps identify critical vulnerability exploitations, assets, and vulnerable configurations, which in turn helps network administrators strengthen network security. However, the attack graph-based approach still has several problems.
VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Attack graph generation can incur up to polynomial complexity, and evaluating and analyzing attack graphs to determine all possible attack paths suffers from scalability issues [7]. Meanwhile, a large-scale attack graph is complicated, making it difficult for humans to digest all the dependency relations and identify the key problems in a limited amount of time. Moreover, the attack graph model needs improvement because it currently considers only the relationships among vulnerabilities; it cannot model high-level attack Tactics, Techniques, and Procedures (TTPs) [8].
Given the problems mentioned above, this paper proposes a ranking method based on the Heterogeneous Information Network (HIN) [9] for vulnerability risk assessment. First, the proposed approach considers both the exploitability and the impact of vulnerabilities in a specific network, which avoids assigning the same risk score to the same vulnerability in different networks. Second, a heterogeneous information network can model multiple types of objects, the relationships among them, and different types of semantic information [9]. Additionally, such models have sufficient representational power and extensibility when more factors must be considered due to changes in the network environment. Finally, the proposed approach uses a graph-based ranking method, which has acceptable computational complexity.
The main contributions of this paper are summarized as follows:
1) A comprehensive semantic representation model. Vulnerability risk assessment is conducted from a new perspective based on the heterogeneous information network, which can fuse more information and introduce higher-level semantics. The proposed approach may help to model higher-level threats such as Tactics, Techniques, and Procedures. Furthermore, our work will facilitate ontology-based quantitative risk assessment.
2) HIN-based ranking method for vulnerability risk assessment. This paper proposes a vulnerability risk assessment method based on the heterogeneous information network for a specific network environment. First, a heterogeneous information network containing vulnerabilities, hosts and the relationships between hosts is constructed. Then, a ranking algorithm for vulnerability risk is designed that considers the exploitability of a vulnerability, the impact of a vulnerability on the network components and the importance of vulnerable components. Finally, to accommodate changes in the factors that need to be considered, a method for extending the model and calculating the vulnerability risk scores is proposed.
3) Practicality comparison. To demonstrate the advantages and disadvantages of the proposed approach, two practical case studies (including three comparisons) are conducted, and the results of the proposed model are compared with those of the CVSS metric-based method and attack graph-based methods. We constructed two small enterprise network environments to test the methods. The source code and the input file for the attack graph generation tool are available online. The results show that the proposed method produces sufficiently precise results with an acceptable level of computational complexity.
The remainder of this paper is organized as follows. Section 2 introduces related work. Section 3 first introduces the system model, including a brief review of the heterogeneous information network and CVSS metrics, and then presents the proposed model, the computing method, and the model extension solution. Section 4 describes the two case studies (including three comparisons) for the constructed network environments and presents the results. Finally, Section 5 concludes the paper.

II. RELATED WORK

A. THE CVSS-BASED APPROACHES
Many studies have aimed to improve the usability of CVSS by optimizing its objectivity and dispersion. To improve the objectivity of CVSS, the authors of [10], [11] studied CVSS and proposed several attributes as novel metrics. The heterogeneity of diversity and of vulnerability distributions was considered in [12]. The authors of [13] used 3000 vulnerabilities from the National Vulnerability Database (NVD) [14] to validate the objectivity of CVSS. The approach in [15] combined a scoring system with CVSS to measure the severity cost of hosts. In [16], a temporal feature was added to CVSS. The authors of [17] studied the dispersion of CVSS. In [4], a CVSS-based vulnerability scoring system was proposed to improve dispersion.
However, the above studies ignored the influence of different environments on vulnerability risk [18]. For example, the same vulnerability may have different risk levels on different devices or in different environments; thus, the risk level should be determined based on the importance of the devices to the network, the relevant security requirements and other associated factors. Some studies have addressed specific environments: [19] studied vulnerability risk in the operating systems (OSs) of tablets and smartphones, the CVSS risk assessment formula was optimized for vulnerability risk assessment in IoT systems [20], and a CVSS-based method was proposed to assess the vulnerability risk of a cloud service [21]. Nevertheless, these studies still have a deficiency: when the factors that should be considered change, their models cannot be extended to adapt to the new environment.

B. THE ATTACK GRAPH-BASED METHODS
Attack graph-based methods are widely used in network security risk analysis, threat mitigation, decision making, etc. A review of attack graphs in cybersecurity is given in [22]. For network security risk analysis, [23] combined CVSS metrics with an attack graph to provide precise assessments of vulnerability risk. The authors of [24], [25] combined a Bayesian network with CVSS to quantify the likelihood of network compromise and strengthen the network. In [26], attack costs and benefits were quantified and integrated with different metrics to evaluate countermeasures for security issues based on an attack graph. The authors of [2] integrated the idea of dynamic defense into attack graph analysis and proposed a probability-based approach to quantitative network risk assessment. In [27], [28], PageRank [29] was used to evaluate the importance of nodes or states in an attack graph, which improves the readability of the attack graph. In [5], the maximum reachable probability of nodes was evaluated using a graph-based inference algorithm [30]. These studies conducted detailed analyses based on attack graphs. Nevertheless, the generation and analysis of attack graphs still suffer from scalability issues. Meanwhile, the lack of standards, prescriptive methodologies and common approaches to visual syntax leads to another important issue: insufficient readability [22].

III. VULNERABILITY RISK ASSESSMENT MODEL
In this section, we first briefly review the heterogeneous information network and construct the Device-Vulnerability bi-type graph. Second, we propose weighted ranking rules based on the heterogeneous information network for vulnerability risk assessment and obtain the corresponding vulnerability risk score and ranking. Finally, we consider the requirement of model extension and propose a solution.

A. PRELIMINARIES
A heterogeneous information network contains multiple types of objects, the relationships between objects and different semantic information.
Definition 1 (Information Network [31]): An information network is a directed graph G = (V, E) with an object-type mapping function τ : V → A and a link-type mapping function φ : E → R, where each object v ∈ V is associated with a particular object type τ(v) ∈ A and each link e ∈ E is associated with a specific relation φ(e) ∈ R. If the number of object types |A| > 1 or the number of relationship types |R| > 1, the information network is a heterogeneous information network; otherwise, it is a homogeneous information network.
For a given complex heterogeneous information network, a meta-description must be provided to fully understand the object types and link types in the network. Therefore, a network schema is defined to describe the structure of the network.
Definition 2 (Network Schema [31]): A network schema, denoted as $T_G = (A, R)$, is a meta-template for an information network. A schema includes the object-type mapping τ(v) ∈ A and the link-type mapping φ(e) ∈ R; it is a directed graph defined over the object types A and the link types R. A relation R from type A to type B is denoted as $A \xrightarrow{R} B$, where A and B are the source type and the target type, denoted as R.S and R.T, respectively.
Example 1: Figure 1 shows an information network that contains two types of objects: Device (PC1, PC2, PC3) and Vulnerability (CVE-2019-5482, CVE-2019-6645, CVE-2019-1580). There are multiple relationship types between objects. For example, the relationship between a device and a vulnerability is that the vulnerability exists in the device, and the relationship between two devices is that one device accesses the other. A device can be a PC, a server, etc. Since the numbers of object types and link types in the figure are both greater than one, this is a typical heterogeneous information network. For the heterogeneous information network in Figure 1
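To make Definitions 1 and 2 concrete, the following Python sketch encodes the objects of Example 1 through the object-type mapping τ and a few links through the link-type mapping φ. It then checks the heterogeneity condition and derives the schema as (source type, relation, target type) triples. The text does not list the links of Figure 1, so the relation names `accesses` and `has_vulnerability` and the specific edges below are hypothetical.

```python
# tau: V -> A, the object-type mapping of Definition 1 (objects from Example 1)
TAU = {
    "PC1": "Device", "PC2": "Device", "PC3": "Device",
    "CVE-2019-5482": "Vulnerability",
    "CVE-2019-6645": "Vulnerability",
    "CVE-2019-1580": "Vulnerability",
}

# phi: E -> R, the link-type mapping; these edges are hypothetical
PHI = {
    ("PC1", "PC2"): "accesses",
    ("PC2", "PC3"): "accesses",
    ("PC1", "CVE-2019-5482"): "has_vulnerability",
    ("PC2", "CVE-2019-6645"): "has_vulnerability",
    ("PC3", "CVE-2019-1580"): "has_vulnerability",
}


def is_heterogeneous(tau, phi):
    """Definition 1: heterogeneous iff |A| > 1 or |R| > 1."""
    return len(set(tau.values())) > 1 or len(set(phi.values())) > 1


def network_schema(tau, phi):
    """Definition 2: collect the relation triples (R.S, R, R.T) of T_G = (A, R)."""
    return {(tau[src], rel, tau[dst]) for (src, dst), rel in phi.items()}


if __name__ == "__main__":
    print(is_heterogeneous(TAU, PHI))
    for triple in sorted(network_schema(TAU, PHI)):
        print(triple)
```

If you restrict the graph to devices and access links only, both |A| and |R| drop to 1, and `is_heterogeneous` returns False, which matches the homogeneous case of Definition 1.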

Definition 3 (CVSS [3]): The Common Vulnerability Scoring System (CVSS) [3] consists of three metric groups: Base, Temporal, and Environmental (v3.0). The details are shown in Figure 3. The Base group contains three main metrics: Exploitability, Scope, and Impact. The Base group metrics reflect the severity of a vulnerability based on its intrinsic characteristics, which remain constant over time and assume a reasonable worst-case impact across different deployment environments. This paper primarily uses the Exploitability and Impact metrics of the Base group.
Exploitability Score (ES): The Exploitability metrics reflect the characteristics of the vulnerable entity, which is called the vulnerable component. Each Exploitability metric is scored relative to the vulnerable component and reflects the vulnerability attributes that lead to a successful attack. The Exploitability metrics include Attack Vector (AV), Attack Complexity (AC), Privileges Required (PR), and User Interaction (UI).
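The sketch below lets readers reproduce ES values. It implements the public CVSS v3.x Exploitability sub-score formula, ES = 8.22 × AV × AC × PR × UI, using the metric weights published in the CVSS v3.1 specification; v3.0 uses the same weights. This code illustrates the standard formula and is not taken from the paper.

The vector AV:N/AC:L/PR:N/UI:N yields about 3.887, which displays as 3.9. That is the ES quoted for vulnerabilities 1 and 35 in Section IV. The text does not state their full vectors, so treating them as this vector is an assumption.

```python
# CVSS v3.x Exploitability metric weights (values from the FIRST specification)
ATTACK_VECTOR = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}
ATTACK_COMPLEXITY = {"L": 0.77, "H": 0.44}
PRIVILEGES_REQUIRED = {
    False: {"N": 0.85, "L": 0.62, "H": 0.27},  # Scope Unchanged
    True: {"N": 0.85, "L": 0.68, "H": 0.50},   # Scope Changed
}
USER_INTERACTION = {"N": 0.85, "R": 0.62}


def exploitability_score(av, ac, pr, ui, scope_changed=False):
    """Exploitability sub-score: 8.22 x AV x AC x PR x UI (unrounded)."""
    return (8.22 * ATTACK_VECTOR[av] * ATTACK_COMPLEXITY[ac]
            * PRIVILEGES_REQUIRED[scope_changed][pr] * USER_INTERACTION[ui])


if __name__ == "__main__":
    es = exploitability_score("N", "L", "N", "N")
    print(round(es, 1))  # one decimal, as ES is usually displayed
```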
Impact Score (IS): The impact metrics are scored according to the component that suffers the worst outcome that is most directly and predictably associated with a successful attack. Impact metrics include Confidentiality Impact (C), Integrity Impact (I), and Availability Impact (A).
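The Impact sub-score can be sketched the same way, again following the public CVSS v3.x formulas rather than code from the paper. The function first computes the Impact Sub-Score, ISS = 1 − (1 − C)(1 − I)(1 − A). It then returns 6.42 × ISS when the scope is unchanged, or 7.52 × (ISS − 0.029) − 3.25 × (ISS − 0.02)^15 when the scope is changed.

Under an unchanged scope, a single High confidentiality impact gives 3.6, and a single Low impact gives 1.4. These match the IS values quoted for vulnerabilities 35 and 1 in Section IV. The text does not give their vectors, so these inputs are assumptions.

```python
# CVSS v3.x Confidentiality / Integrity / Availability weights
CIA_WEIGHT = {"H": 0.56, "L": 0.22, "N": 0.0}


def impact_score(c, i, a, scope_changed=False):
    """CVSS v3.x Impact sub-score computed from the Impact Sub-Score (ISS)."""
    iss = 1 - (1 - CIA_WEIGHT[c]) * (1 - CIA_WEIGHT[i]) * (1 - CIA_WEIGHT[a])
    if scope_changed:
        return 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    return 6.42 * iss


if __name__ == "__main__":
    print(round(impact_score("H", "N", "N"), 1))
    print(round(impact_score("L", "N", "N"), 1))
```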

B. VULNERABILITY RISK ASSESSMENT BASED ON THE HETEROGENEOUS INFORMATION NETWORK
In a specific network environment, the risk score of a vulnerability is related not only to its own attributes, such as exploitability and component impact, but also to the network environment in which the vulnerability is located. Vulnerability risk levels should vary among different network environments. Therefore, we recalculated the IS and combined it with the ES to assess the risk of a vulnerability.
PageRank [29] rests on the statement that ''a page has a high rank if the sum of the ranks of its backlinks is high''. Inspired by this statement, we consider the risk of hosts to behave similarly, i.e., a host accrues a higher risk score when it is accessible by many high-risk hosts. This is reasonable because an intruder who successfully penetrates one host can subsequently use it as a springboard. All hosts that the compromised host can access in some way are therefore more likely to be compromised themselves, so their risk scores should be higher. The situation above involves only one type of object (host) and one type of relationship (access/accessed).
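The homogeneous, host-only version of this intuition can be sketched as follows. The code is the standard PageRank power iteration with damping factor d = 0.85, a common default and not a value from this paper. Here `access[i][j] = 1` means that host i can access host j, and a host with no outgoing access links spreads its score evenly over all hosts. In the toy network, hosts 0, 1 and 2 can all reach host 3, so host 3 collects the highest score.

```python
import numpy as np


def pagerank(access, d=0.85, tol=1e-10, max_iter=10_000):
    """Standard PageRank; access[i][j] = 1 if host i can access host j."""
    A = np.asarray(access, dtype=float)
    n = A.shape[0]
    M = np.empty((n, n))  # M[j, i]: share of host i's score sent to host j
    for i in range(n):
        out = A[i].sum()
        M[:, i] = A[i] / out if out > 0 else 1.0 / n  # dangling host: spread evenly
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = (1 - d) / n + d * (M @ r)
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    raise RuntimeError("PageRank did not converge")


# Hosts 0, 1 and 2 can all access host 3; host 3 can access host 0.
access = [[0, 0, 0, 1],
          [0, 0, 0, 1],
          [0, 0, 0, 1],
          [1, 0, 0, 0]]
scores = pagerank(access)
print(np.round(scores, 4))
```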
In a heterogeneous information network, although multiple types of objects and relationships exist, a similar penetration scenario can occur. PopRank [32] proposed a framework to rank multiple types of objects in a heterogeneous information network, based on the idea that the popularity of one type of object affects that of other objects. Inspired by PopRank, we believe that the risk score should be influenced by multiple types of objects; therefore, the popularity in PopRank is regarded as the risk score in this paper. For example, a host should be considered high-risk if it exposes multiple high-risk vulnerabilities. Similarly, a vulnerability that affects multiple high-risk hosts should be assigned a high risk value. We can imagine that a portion of each vertex's value (risk score) ''flows'' to its out-neighbors, whether they represent hosts or vulnerabilities, while the vertex collects value from its in-neighbors. Therefore, calculating the ranks of Devices (D) and Vulnerabilities (V) reflects, to some extent, the risk of the hosts and vulnerabilities in a network.
As a consequence, given the heterogeneous graph shown in Figure 2, we can use the above heuristic relations to rank vulnerabilities and perform risk assessment. According to the above analysis, the risk score of each vulnerability is obtained through Formulas 1 and 2. The specific notations and their descriptions are listed in Table 1.
The parameter α controls the weight of risk from different types of nodes, and we set it to 0.5. A vulnerability with a high ES is easy to exploit. Therefore, to assess the risk precisely, we construct the adjacency matrix $W_{DV}$ by setting $W_{DV}(i, j) = ES(j)$ if device i has vulnerability j. Similarly, $PN(i, j)$ is used as the weight between devices i and j.
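Building the two adjacency matrices from a scanner-style inventory is mechanical. The sketch below follows the two rules stated above: $W_{DV}(i, j) = ES(j)$ when device i has vulnerability j, and $W_{DD}(i, j) = PN(i, j)$. PN is treated as an opaque nonnegative weight, because its precise definition belongs to Table 1. The function name `build_adjacency` and the host and vulnerability identifiers are hypothetical.

```python
import numpy as np


def build_adjacency(hosts, vulns, host_vulns, es, pn):
    """Build W_DV (|D| x |V|) and W_DD (|D| x |D|) from an inventory.

    host_vulns: host -> vulnerabilities found on it
    es:         vulnerability -> CVSS Exploitability Score
    pn:         (host_i, host_j) -> PN(i, j) weight
    """
    d = {h: k for k, h in enumerate(hosts)}
    v = {x: k for k, x in enumerate(vulns)}
    W_DV = np.zeros((len(hosts), len(vulns)))
    for host, found in host_vulns.items():
        for vuln in found:
            W_DV[d[host], v[vuln]] = es[vuln]  # W_DV(i, j) = ES(j)
    W_DD = np.zeros((len(hosts), len(hosts)))
    for (hi, hj), weight in pn.items():
        W_DD[d[hi], d[hj]] = weight            # W_DD(i, j) = PN(i, j)
    return W_DV, W_DD


hosts = ["web", "db", "admin"]                 # hypothetical inventory
vulns = ["vuln-a", "vuln-b"]
W_DV, W_DD = build_adjacency(
    hosts, vulns,
    host_vulns={"web": ["vuln-a"], "db": ["vuln-a", "vuln-b"]},
    es={"vuln-a": 3.9, "vuln-b": 1.8},
    pn={("admin", "web"): 2, ("admin", "db"): 1, ("web", "db"): 1},
)
print(W_DV)
print(W_DD)
```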
To make the formulas work, the vectors $r_V^{(0)}$ and $r_D^{(0)}$ must first be initialized. Iterating the process produces two sequences: $C_1 = \{r_V^{(0)}, r_V^{(1)}, \cdots\}$ and $C_2 = \{r_D^{(0)}, r_D^{(1)}, \cdots\}$.
According to [33], $C_1$ and $C_2$ converge to the primary eigenvectors of $\alpha W_{VD}(I - (1-\alpha)W_{DD})^{-1}W_{DV}$ and $\alpha W_{DV}W_{VD} + (1-\alpha)W_{DD}$, respectively. The iteration is an instance of the power method [34] for computing eigenvectors; therefore, we calculate the risk scores iteratively. Before calculating the risk scores with Formulas 1 and 2, the adjacency matrices are normalized by column (Formula 3). Furthermore, the more exploitable a vulnerability is, the greater the share of its risk value that is passed to the host. To keep the assessment reasonable, the risk values of hosts that have only incoming edges and no outgoing edges are distributed evenly among the other hosts. To this end, we add a constant term $\alpha_p r_p^{(0)}$ that disperses the risk value of hosts with no out-neighbors to the other hosts, as shown in the extended Formula 4.
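Formulas 1 and 2 themselves are not reproduced in this text. The update rules below are one pair that is consistent with both limits quoted above. Treat them as an assumed reconstruction, not as the authors' exact formulas.

Substituting the first rule into the second shows that the device sequence $C_2$ is a power iteration on the second matrix. Eliminating $r_D$ at a fixed point then yields the first matrix, the one associated with $C_1$. The inverse exists whenever $W_{DD}$ is column-stochastic and $0 < \alpha \le 1$, because the spectral radius of $(1-\alpha)W_{DD}$ is then at most $1-\alpha < 1$.

```latex
% Assumed update rules (hypothetical reconstruction of Formulas 1 and 2)
r_V^{(k+1)} = W_{VD}\, r_D^{(k)}, \qquad
r_D^{(k+1)} = \alpha\, W_{DV}\, r_V^{(k+1)} + (1-\alpha)\, W_{DD}\, r_D^{(k)}

% Substituting the first rule into the second (power iteration for C_2):
r_D^{(k+1)} = \bigl(\alpha\, W_{DV} W_{VD} + (1-\alpha)\, W_{DD}\bigr)\, r_D^{(k)}

% At a fixed point (r_V, r_D):
r_D = \alpha W_{DV} r_V + (1-\alpha) W_{DD} r_D
\;\Rightarrow\;
r_D = \alpha \bigl(I - (1-\alpha) W_{DD}\bigr)^{-1} W_{DV}\, r_V

% Hence the fixed point of C_1:
r_V = W_{VD}\, r_D = \alpha\, W_{VD} \bigl(I - (1-\alpha) W_{DD}\bigr)^{-1} W_{DV}\, r_V
```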
In this paper, results are calculated to six decimal places; therefore, when the difference between two successive iterations is less than $10^{-7}$, we consider the result to have converged.
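The following minimal numpy sketch shows the iterative computation. It relies on several assumptions:

- The update rules are $r_V \leftarrow W_{VD} r_D$ and $r_D \leftarrow \alpha W_{DV} r_V + (1-\alpha) W_{DD} r_D$. This is an assumed reading of Formulas 1 and 2, which are not shown here.
- Every matrix is column-normalized, as in Formula 3, and iteration stops once the change falls below $10^{-7}$.
- $W_{VD}$ is the column-normalized transpose of $W_{DV}$.
- $W_{DD}[i, j] > 0$ means that host j can access host i. An all-zero column therefore marks a host with no outgoing edges, and its risk is spread evenly over the other hosts.
- The initial vectors are uniform.

The function names and toy matrices are hypothetical.

```python
import numpy as np


def column_normalize(W, spread_dangling=False):
    """Scale every column to sum to 1 (cf. Formula 3).

    With spread_dangling=True (square matrices only), an all-zero column j is
    replaced by a uniform distribution over all rows except j.
    """
    W = np.asarray(W, dtype=float).copy()
    n_rows = W.shape[0]
    for j in range(W.shape[1]):
        total = W[:, j].sum()
        if total > 0:
            W[:, j] /= total
        elif spread_dangling and n_rows > 1:
            W[:, j] = 1.0 / (n_rows - 1)
            W[j, j] = 0.0
    return W


def hin_rank(W_DV, W_DD, alpha=0.5, tol=1e-7, max_iter=10_000):
    """Return (r_V, r_D, iterations) for the bi-type Device-Vulnerability HIN."""
    A_DV = column_normalize(W_DV)                         # vulnerability -> device
    A_VD = column_normalize(np.asarray(W_DV, float).T)    # device -> vulnerability
    A_DD = column_normalize(W_DD, spread_dangling=True)   # device -> device
    n_d, n_v = A_DV.shape
    r_D = np.full(n_d, 1.0 / n_d)                         # uniform r_D^(0) (assumed)
    r_V = np.full(n_v, 1.0 / n_v)                         # uniform r_V^(0) (assumed)
    for k in range(1, max_iter + 1):
        r_V_new = A_VD @ r_D
        r_D_new = alpha * (A_DV @ r_V_new) + (1 - alpha) * (A_DD @ r_D)
        change = np.abs(r_V_new - r_V).sum() + np.abs(r_D_new - r_D).sum()
        r_V, r_D = r_V_new, r_D_new
        if change < tol:                                  # stopping rule: change below 10^-7
            return r_V, r_D, k
    raise RuntimeError("ranking did not converge")


# Toy network: 3 hosts, 3 vulnerabilities; entries of W_DV are ES values.
W_DV = np.array([[3.9, 2.8, 0.0],
                 [3.9, 0.0, 0.0],
                 [3.9, 0.0, 1.8]])
# W_DD[i, j] = 1 if host j can access host i; host 2 has no outgoing edges.
W_DD = np.array([[0, 0, 0],
                 [1, 0, 0],
                 [1, 1, 0]])
r_V, r_D, iters = hin_rank(W_DV, W_DD)
print(np.round(r_V, 6), np.round(r_D, 6), iters)
```

Here vulnerability 0 exists on every host, so the matrix $\alpha W_{DV}W_{VD} + (1-\alpha)W_{DD}$ is strictly positive and column-stochastic. Power iteration on it is therefore guaranteed to converge, and the scores stay normalized to sum to 1.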

C. MODEL EXTENSION
In Section 3.2, to perform quantitative vulnerability risk assessment, we consider not only the risk of a vulnerability itself but also the risk score propagated from other devices in the network. However, the factors that need to be considered may change as research progresses or as the network environment changes. When an assessment needs to include additional factors, we can identify the relationships between the new factors and the existing nodes in the network schema, and then construct the weights and the corresponding adjacency matrices.
For example, when a vulnerability risk assessment must additionally consider the life cycle of vulnerabilities, as shown in Figure 4, we can add a node type representing the vulnerability life cycle and build a suitable adjacency matrix $W_{VL}$. When vulnerability $v_i$ is in life-cycle stage $L_j$, $W_{VL}(i, j) = weight$. The weight can be defined flexibly, provided that a larger weight indicates a higher risk for that vulnerability. When a risk assessment over multiple object types is needed, the following formula can be used.
where $r_p^{(k+1)}(i)$ represents the quantitative score of instance i of type p after the (k+1)-th iteration, m is the total number of types in the heterogeneous information network, and $n_q$ is the number of instances of type q. The parameter $\lambda_{pq}$ is the weight between types p and q, and $\alpha_p$ is the weight of the initial value of type p. Before calculating the risk scores, we normalize all the adjacency matrices by column, as in Section 3.2 (Formula 3). The iteration then converges according to [35].
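The extended formula itself is not reproduced in this text. Read from the symbol definitions above, one form it may take is

$r_p^{(k+1)}(i) = \sum_{q=1}^{m} \lambda_{pq} \sum_{j=1}^{n_q} W_{pq}(i, j)\, r_q^{(k)}(j) + \alpha_p\, r_p^{(0)}(i)$

The sketch below implements that assumed form. The type names (D, V, L), the λ and α values, the toy matrices, and the function names are all hypothetical.

For column-normalized matrices, choosing λ so that $\sum_q \lambda_{pq} < 1$ for every type p makes each update a contraction, which guarantees convergence in this sketch. The paper itself appeals to [35] for convergence.

```python
import numpy as np


def col_norm(W):
    """Column-normalize a nonnegative matrix; all-zero columns stay zero."""
    W = np.asarray(W, dtype=float)
    sums = W.sum(axis=0)
    return np.divide(W, sums, out=np.zeros_like(W), where=sums > 0)


def multi_type_rank(W, lam, alpha, r0, tol=1e-7, max_iter=10_000):
    """W[(p, q)]: n_p x n_q matrix carrying scores from type q to type p."""
    r = {p: np.asarray(v, dtype=float) for p, v in r0.items()}
    for k in range(1, max_iter + 1):
        # alpha_p * r_p^(0): the constant term of the assumed extended formula
        new = {p: alpha[p] * np.asarray(r0[p], dtype=float) for p in r}
        for (p, q), W_pq in W.items():
            new[p] = new[p] + lam[(p, q)] * (W_pq @ r[q])
        change = sum(np.abs(new[p] - r[p]).sum() for p in r)
        r = new
        if change < tol:
            return r, k
    raise RuntimeError("ranking did not converge")


W_DV_raw = np.array([[1, 0], [1, 1], [0, 1]])   # 3 devices x 2 vulnerabilities
W_VL_raw = np.array([[2, 0], [0, 1]])           # 2 vulnerabilities x 2 life-cycle stages
W = {
    ("D", "V"): col_norm(W_DV_raw),
    ("V", "D"): col_norm(W_DV_raw.T),
    ("D", "D"): col_norm([[0, 1, 0], [0, 0, 1], [1, 0, 0]]),
    ("V", "L"): col_norm(W_VL_raw),
    ("L", "V"): col_norm(W_VL_raw.T),
}
lam = {("D", "V"): 0.4, ("D", "D"): 0.3, ("V", "D"): 0.4,
       ("V", "L"): 0.3, ("L", "V"): 0.5}
alpha = {"D": 0.3, "V": 0.3, "L": 0.5}
r0 = {"D": np.full(3, 1 / 3), "V": np.full(2, 1 / 2), "L": np.full(2, 1 / 2)}
scores, iters = multi_type_rank(W, lam, alpha, r0)
print({p: np.round(s, 6) for p, s in scores.items()}, iters)
```

In this toy setting, $\alpha_p + \sum_q \lambda_{pq} = 1$ for every type, so each score vector keeps summing to 1 across iterations.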

IV. EXPERIMENT
We conduct two case studies to compare our method with the CVSS metrics and the attack graph-based methods. The source files used in the experiments are available online.

A. COMPARISON WITH CVSS METRICS
In this section, we compare our proposed method with CVSS metrics. We construct a typical enterprise LAN, which was used in [36], as shown in Figure 5. The network includes two internal LANs (one for finance and one for technicians), a wireless LAN open to visitors (if the network is penetrated, an intruder can enter the internal network) and a DMZ hosting the servers, including a DNS server to provide DNS services, a web server to provide web services, a mail server to provide mail services, an FTP server to store and transfer files and a database server to store data. There are 19 hosts in the network. The financial department has seven hosts (No. 1-7); six of them are user PCs (No. 1-6), and one is an administrator PC (No. 7). The technical department has seven hosts (No. 8-14), among which six are user PCs and one is an administrator PC. The DNSServer, WebServer, MailServer, FTPServer, and DatabaseServer are numbered 15-19, respectively.
In this paper, devices such as firewalls, switches and routers are not counted as hosts. Their function is to control accessibility, and modifying them mainly changes the accessible relationships between other devices. Therefore, we simplify the properties of these devices into the accessible relationships in the adjacency matrix $W_{DD}$. A more detailed treatment is left for future work.
To acquire knowledge of the experimental network environment, network scanning tools such as Nessus [37] or OpenVAS [38] can be used. Network metadata, such as lists of hosts, services, ports, and vulnerabilities, can be extracted by OpenVAS and Nessus. Detailed descriptions of known vulnerabilities can be obtained from standard data sources such as the National Vulnerability Database (NVD) [14]. A detailed description of the experimental data can be found in the Appendix.
As in Section 3.2, the vectors $r_V^{(0)}$ and $r_D^{(0)}$ are initialized first. Table 2 shows the vulnerability risk assessment results of the proposed approach and of the CVSS metrics. ''Risk Score'' is the quantitative score of the vulnerability calculated by the proposed method, while ''Impact Score (IS)'' is a metric of the CVSS Base Score (BS). ''Combination'' is the risk assessment score that combines the Risk Score with the ES, and ''Base Score (BS)'' is the CVSS Base Score of the vulnerability (for comparison, the scores are normalized).
Additionally, IS does not consider the degree of vulnerability or the importance of components in a specific network environment, so it cannot accurately assess vulnerability risk in that environment. For example, in Table 2, the IS values of vulnerabilities 35 and 31 are equal and rank relatively low. However, in the results of the proposed algorithm, vulnerability 35 has the highest risk score. From the perspective of CVSS, the IS of vulnerability 35 is 3.6, and its ES is 3.9; therefore, if only the IS of a vulnerability is considered, its risk may not be appropriately assessed.

In this network, however, vulnerability 35 exists on hosts 7, 13, 14, 15, and 19 (Table 6 in the Appendix). Hosts 7 and 14 are administrator PCs that can access all hosts and servers through various ports. Hosts 15 and 19 are servers that all hosts in the network can access through the relevant ports. These are therefore relatively important devices. Although the IS of vulnerability 35 is relatively low, the vulnerability should still be considered high risk because it exists on several important devices. Therefore, the proposed method assesses the risk of a vulnerability more reasonably.

The ''Combination'' result considers both the impact of a vulnerability on the specific network and the exploitability of that vulnerability. Compared with the CVSS metrics, the proposed method provides more reasonable and accurate vulnerability risk values and clearly distinguishes among them, supporting a high-quality vulnerability repair strategy. Figure 6 compares the rankings produced by the risk score and by IS. The same vulnerability can receive different scores from different methods, and the longer the green line, the greater the difference between the two methods.
The differences are most pronounced for vulnerabilities 1, 35, 12, and 37. The IS of vulnerability 1 is 1.4, and its ES is 3.9; therefore, from the perspective of CVSS, vulnerability 1 is not a high-risk vulnerability. In this specific network environment, however, vulnerability 1 exists on hosts 7, 13, 14, 15, 16, and 19 (Table 6 in the Appendix). Hosts 7 and 14 are administrator PCs that can access all hosts and servers through the relevant ports. Hosts 15, 16, and 19 are servers that all hosts in the network can access through the relevant ports, which makes them relatively important devices. Therefore, the risk score of vulnerability 1 needs to be adjusted. Vulnerabilities 35, 12, and 37 are in the same situation as vulnerability 1.

Figure 9 in the Appendix shows how the risk score of each vulnerability changes as the number of iterations increases. The risk score of each vulnerability eventually converges, and most vulnerabilities require approximately 10 to 20 iterations. For vulnerabilities 1, 12, 25, 28, 33, and 35, the final risk scores are higher than the initial values; for the others, the final scores are lower. Thus, when a specific network environment is considered, a different risk score should be obtained to assess the vulnerability risk more precisely.

B. COMPARISON WITH ATTACK GRAPH-BASED METHODS
In this section, we conduct two case studies to identify the advantages and disadvantages of the proposed method relative to two attack graph-based methods.

1) COMPARISON WITH BAG-BASED RISK ASSESSMENT
The Bayesian Attack Graph (BAG)-based risk assessment method proposed in [2] performs static and dynamic risk assessment as well as risk mitigation analysis. For risk assessment, a BAG is constructed for the experimental network and the prior probabilities are initialized. Then, the unconditional or conditional probability of each node is calculated by Bayesian inference to assess the probability of risk occurrence. For risk mitigation analysis, the costs and benefits of 13 security controls for the test network are quantified, and the controls that maximize benefit and minimize cost are selected. The static risk assessment and risk mitigation in [2] are closest to our work; therefore, we focus on them.
The exploitation probability of each vulnerability in the static risk assessment of [2] is shown in Table 3. There, ''Vulnerability'' and ''Probability'' denote the 13 vulnerabilities and their corresponding exploitation probabilities (from [2]), and ''Vulnerability Ranking'' shows the ranking results of the proposed method. We initialize $W_{VD}$, $W_{DD}$, $r_V^{(0)}$ and $r_D^{(0)}$ and use Formulas 1 and 2 to obtain the ranking, with α set to 0.5. As shown, the ''SQL Injection'' vulnerability has the largest exploitation probability in the experimental network. However, in the ranking list of the proposed method, ''SQL Injection'' is ranked fourth, while the ''MS Video ActiveX Stack BOF'' vulnerability is ranked first.
This result is not unexpected. The outcome of the static risk assessment in [2] is the probability of exploitation. It considers only the relationships between vulnerabilities and their exploitability, not the impact of those vulnerabilities. Therefore, in its risk mitigation part, the BAG-based method weighs costs and benefits to provide more reasonable decision-making support. For ''SQL Injection'', as shown in Table 4, the corresponding security control is ''query restriction'', which has a relatively high cost but provides only low benefits. As a result, ''SQL Injection'' will usually not be the first vulnerability to be patched.

Table 4 shows the results of the proposed method and of the risk mitigation part in [2]. ''Cost (A)'', ''Outcome (B)'', and ''Net Benefit (B − A − 622.0)'' denote the cost, outcome, and net benefit of applying each security control individually, and ''Vulnerability Ranking'' shows the ranking results of the proposed method. Because our work does not consider cost, we compare the results based on the ''Outcome'' column of Table 4. The vulnerability ''MS Video ActiveX Stack BOF'' has the highest ranking, and its corresponding remediation action, ''apply an MS workaround'', is ranked third in [2]. The remediation for the vulnerabilities ''LICQ Buffer Overflow (BOF)'' and ''Remote Login'' is ''filtering external traffic'', which is ranked second. The third vulnerability in the ranking list, ''SQL Injection'', can also be fixed by ''filtering external traffic''. The ''IIS vulnerability in WebDAV service'' can be mitigated by ''filtering external traffic'' and ''disable WebDAV'', which are ranked second and fourth, respectively. The results obtained in this paper are broadly consistent with those in [2]. This indicates that performing risk mitigation in the order of the vulnerability ranking tends to maximize the outcome.
Therefore, the proposed method can produce a high-quality guide for vulnerability risk mitigation. Although the vulnerability ranking does not strictly correspond to the security control ranking, this does not affect the effectiveness of vulnerability repair. Generally, a system update fixes several prioritized vulnerabilities at once. Therefore, as long as several of the vulnerabilities with the highest risk scores are fixed, network security will be effectively strengthened.

2) COMPARISON WITH ASSETRANK FOR RISK ASSESSMENT
AssetRank, proposed in [28], ranks the nodes in an attack graph using a PageRank-based method. To compare with AssetRank, we constructed a medium-sized network containing 13 hosts and 7 vulnerabilities, depicted in Figure 7. We used the attack graph generation tool MulVAL [39] to generate the attack graph, reproduced the AssetRank method, and applied the proposed method to this network. Table 5 shows the results of AssetRank and the proposed method. The columns ''Vertex'' and ''Rank×10^2'' denote the nodes in the attack graph and the AssetRank results, respectively. The columns ''Vulnerability Ranking'' and ''Score'' give the ranking and score of the proposed method.
As Table 5 shows, AssetRank produces several repeated values, which makes its results less precise. For example, the vulnerabilities ''CVE-2010-0483'', ''CVE-2002-0392'', and ''CVE-2010-0812'' receive identical values, although their values should differ. Figure 8 shows a partial attack graph generated by MulVAL, simplified for clarity. The number before the colon in each node is the node's number in the full generated attack graph (available online). Diamonds denote ''OR'' vertices, ellipses denote ''AND'' vertices, and boxes denote sink vertices. As the attack graph shows, these three vulnerabilities lead to ''execCode'' on ''DataHistorian_2'', ''MailServer_1'', and ''Workstation_1'', and patching them eliminates the three ''execCode'' threats. However, the three machines play very different roles. In this network, ''Workstation_1'' cannot access any other device, so it cannot spread the risk further. In contrast, ''MailServer_1'' can be accessed by most devices in the test environment and can also access many other devices; therefore, ''CVE-2002-0392'' should be rated as more important than ''CVE-2010-0812''. The same reasoning applies to ''CVE-2010-0483''. Therefore, the proposed method assesses vulnerability risk more precisely than AssetRank does and provides high-quality remediation suggestions.
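The argument above rests on two host properties: how many hosts can access a machine, and how many hosts the machine can reach once compromised. The sketch below computes both with a breadth-first search over a host access relation. The topology is HYPOTHETICAL (it is not the Figure 7 network) and the `exposure` pair is an illustrative measure, not the scoring formula of the proposed method.

```python
from collections import deque

# HYPOTHETICAL access relation (host -> hosts it can connect to); not Figure 7.
ACCESS = {
    "Workstation_1": [],
    "MailServer_1": ["WebServer", "FileServer", "DataHistorian_2"],
    "WebServer": ["MailServer_1"],
    "FileServer": ["MailServer_1", "DataHistorian_2"],
    "DataHistorian_2": [],
}


def reachable_from(access, src):
    """Hosts reachable from src by breadth-first search (src itself excluded)."""
    seen = {src}
    queue = deque([src])
    while queue:
        host = queue.popleft()
        for nxt in access.get(host, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(src)
    return seen


def exposure(access, host):
    """(number of hosts that can access `host`, number of hosts `host` can reach)."""
    inbound = sum(1 for targets in access.values() if host in targets)
    return inbound, len(reachable_from(access, host))


for h in ("Workstation_1", "DataHistorian_2", "MailServer_1"):
    print(h, exposure(ACCESS, h))
# Workstation_1 (0, 0)
# DataHistorian_2 (2, 0)
# MailServer_1 (2, 3)
```

Even this crude measure separates the three hosts that a purely local attack graph ranking treats alike, which is the kind of network context the proposed method feeds into its host-host links.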

3) COMPUTATIONAL COMPLEXITY ANALYSIS
The proposed method processes each link in the heterogeneous graph and performs iterative computations. Let $C$ denote the number of node types in the heterogeneous information network, $K$ the total number of links in the heterogeneous graph, $V$ the total number of objects, and $t$ the number of iterations. The computation then requires $O(t \cdot C \cdot K)$ time. Additionally, because $K \le V^2$, the maximal complexity of the computation is $O(t \cdot C \cdot V^2)$. The approach in [2] has a complexity of $O(N^3)$ to generate the attack graph, where $N = A \cdot M$, $A$ is the number of attributes in the attack graph, and $M$ is the number of machines in the system; its probability-based analysis of the attack graph has a complexity of $O(2^n)$, where $n$ is the number of variables. The method in [28] must first spend time generating the attack graph, after which its ranking step costs the same as that of the proposed method. In summary, the proposed approach provides sufficient precision at an acceptable computational complexity.
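To make the bounds above concrete, the sketch below evaluates each expression for one set of sizes. All inputs are ASSUMPTIONS for illustration: $V = 20$ objects (the 13 hosts plus 7 vulnerabilities of the AssetRank test network), $C = 2$ object types (hosts and vulnerabilities), $t = 50$ iterations, $A = 10$ attributes, and $n = 20$ variables. These are asymptotic bounds with constants dropped, so the numbers indicate growth, not measured running time.

```python
def proposed_bound(t: int, c: int, v: int) -> int:
    """Worst-case count for the proposed method, O(t * C * V^2), using K <= V^2."""
    return t * c * v ** 2


def bag_generation_bound(a: int, m: int) -> int:
    """Attack graph generation in [2], O(N^3) with N = A * M."""
    n = a * m
    return n ** 3


def bag_inference_bound(n_vars: int) -> int:
    """Probability-based analysis in [2], O(2^n) over n variables."""
    return 2 ** n_vars


# HYPOTHETICAL sizes: t=50, C=2, V=20; A=10, M=13; n=20.
print(proposed_bound(50, 2, 20))     # 40000
print(bag_generation_bound(10, 13))  # 2197000
print(bag_inference_bound(20))       # 1048576
```

The proposed bound grows quadratically in $V$, while the attack graph generation bound grows cubically in $A \cdot M$ and the inference bound exponentially in $n$, so the gap widens as the network grows.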

V. CONCLUSION
In this paper, we propose a vulnerability risk assessment method based on the heterogeneous information network. First, we briefly reviewed the heterogeneous information network and then introduced a ranking method based on it. The method comprises the construction of the heterogeneous network model and a calculation method that considers not only the exploitability of a vulnerability and its impact on the related components but also the impact of that vulnerability on those components in a specific network environment. Second, we proposed a method for extending the model, which allows the model to be easily adapted to changes in the network environment. Finally, we compared the proposed method with the CVSS metric-based method and two attack graph-based methods to demonstrate its advantages and disadvantages. The simulation results show that the proposed method assesses vulnerability risk in a specific network more precisely than CVSS metric-based methods or AssetRank, and that it provides precision equivalent to the BAG-based method at a lower computational complexity. Nevertheless, the proposed approach still has several deficiencies. For example, the model does not fuse rich semantic information to construct a higher-level model, and the vulnerabilities assessed in the test network are not completely up to date. In future work, we plan to perform more empirical and theoretical research to address these drawbacks; meanwhile, we will continue to focus on integrating our method into ontology-based knowledge systems and other semantic models for quantitative risk analysis and threat modeling.

Table 6 shows the relevant information about the vulnerabilities contained in the data set. The column ''Num'' gives the sequence number of each vulnerability, and the column ''Host'' gives the hosts on which the vulnerability exists.
The column ''CVE'' gives the CVE number of the vulnerability, and the columns ''Impact Score (IS)'', ''Exploitability Score (ES)'', and ''Base Score (BS)'' give its CVSS metrics. Figure 9 shows how the risk score of each vulnerability changes as the number of iterations increases. Table 7 shows the accessible ports of the devices: a tuple (a, b, c) in Table 7 means that host a can access host b through port c.
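The (a, b, c) tuples of Table 7 map naturally onto host-host links. The sketch below groups the tuples by host pair and then drops the ports to obtain directed host-to-host links, ASSUMING the network model only needs to know whether host a can reach host b at all. The rows are HYPOTHETICAL placeholders, not entries from Table 7, and the function names are illustrative.

```python
from collections import defaultdict


def build_access_map(tuples):
    """Group (a, b, c) tuples, meaning host a can access host b via port c,
    into {(a, b): set of ports}."""
    ports_by_pair = defaultdict(set)
    for src, dst, port in tuples:
        ports_by_pair[(src, dst)].add(port)
    return dict(ports_by_pair)


def host_links(access_map):
    """Directed host-to-host links with ports collapsed, in sorted order."""
    return sorted(access_map)


# HYPOTHETICAL rows in the Table 7 format.
ROWS = [("host1", "host2", 22), ("host1", "host2", 80), ("host3", "host1", 443)]

access = build_access_map(ROWS)
print(sorted(access[("host1", "host2")]))  # [22, 80]
print(host_links(access))                  # [('host1', 'host2'), ('host3', 'host1')]
```

Keeping the per-pair port sets leaves room for a later model extension that weights links by the number or type of exposed services.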