A Methodological Framework for AI-Assisted Security Assessments of Active Directory Environments

The pervasiveness of complex technological infrastructures and services coupled with the continuously evolving threat landscape poses new sophisticated security risks. These risks are mostly associated with many diverse vulnerabilities related to software or hardware security flaws, misconfigurations and operational weaknesses. In this scenario, a timely assessment and mitigation of the security risks affecting technological environments are of paramount importance. To cope with these compelling issues, we propose an AI-assisted methodological framework aimed at evaluating whether the target environment is vulnerable or safe. The framework is based on the combined application of graph-based and machine learning techniques. More precisely, the components of the target together with their vulnerabilities are represented by graphs whose analysis identifies the attack paths associated with potential security threats. Machine learning techniques classify these paths and provide the security assessment of the target. The experimental evaluation of the proposed framework was performed on 220 artificially generated Active Directory environments, half of which were injected with vulnerabilities. The results of the classification process were generally good. For example, the F1-score obtained by the Random Forest classifier for the assessment of vulnerable networks was equal to 0.91. These results suggest that our approach could be applied for automating the security assessment procedures of complex networked environments.


I. INTRODUCTION
The size and complexity of the technological infrastructures and services being deployed nowadays pose security risks whose assessment is quite challenging. In fact, the layered structure of these environments is frequently characterized by unknown or unexplored dependencies. Similarly, the presence of outdated components and the co-existence of legacy solutions might lead to unpredictable behaviors. Moreover, unexpected events, such as the sudden shift to remote work due to the Covid-19 pandemic, make security risk assessment even more challenging. In fact, the use of remote devices significantly increases the risks and the attack surface of companies and organizations. In this ecosystem, vulnerabilities, due to software or hardware security flaws, misconfigurations and operational weaknesses, often remain undetected, thus allowing their exploitation for different malicious purposes [1], [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Inês Domingues.
The number of Common Vulnerabilities and Exposures (CVE) publicly disclosed is increasing over the years [3]. For example, the number of CVEs disclosed in the first three quarters of 2022, i.e., 18,828, exceeds by about 500 the CVEs of the entire 2020. Moreover, security attacks often involve the theft of personal or critical information, thus drastically increasing the financial and reputation impacts of these incidents. On average a data breach costs over 4 million USD, while the average remediation costs for a ransomware attack are about 2 million USD. To properly cope with this rapidly evolving threat landscape, regular and timely recognition and understanding of potential security risks should become an integral part of all security mechanisms. Risk assessments are generally very demanding since they are often based on time-consuming and error-prone manual procedures or on automated tools customized to specific infrastructures, such as power grids.
These issues are the main motivation of our work whose primary outcome is a methodological framework that addresses the compelling need of automating security assessment procedures of complex technological environments, such as Microsoft Active Directory (AD). The choice of these environments, which represent the most widely deployed technology among enterprises and organizations nowadays, is mainly motivated by their complexity that makes them particularly vulnerable. The proposed framework is based on the combined application of graph-based and machine learning techniques. To the best of our knowledge, this is the first framework that combines these techniques for assessing the security of AD environments.
Graphs are particularly suitable to represent the target environment, that is, the individual entities together with their inter-dependencies, vulnerabilities and misconfigurations. In fact, the evaluation of the relationships between entities is more effective than the isolated evaluation of vulnerabilities of a single entity. From the analysis of the properties of these graphs, the attack paths representing the potential security threats affecting the target are identified. These paths are classified by means of machine learning techniques for obtaining the security assessments of the target. The proposed framework is tested on artificially generated Active Directory environments affected by vulnerabilities.
We outline that our framework plays an important role in the security domain in that it allows administrators of AD environments to identify potentially vulnerable attack paths and fix their vulnerabilities or misconfigurations ahead of the attacks, thus reducing the risks of potential disruptions to the technological infrastructures and services.
The main contributions of this work are summarized by the following items:
• Methodological framework for automating the security assessment of Active Directory environments;
• Combined application of graph-based and machine learning techniques;
• Identification of general features characterizing potential security threats of these environments;
• Extensive security assessments of artificially generated Active Directory environments.
The organization of the paper is as follows. Section II reviews the state of the art in the area of security risk assessment. Section III presents the AI-assisted methodological framework proposed for security assessments of Active Directory environments. The setup of the experiments performed to test this framework is covered in Section IV, while Section V focuses on the results of the assessments. Finally, some concluding remarks are presented in Section VI.

II. RELATED WORK
The assessment of security risks affecting technological infrastructures and services has been investigated in the literature under different perspectives (see, e.g., [4], [5] for detailed surveys). This problem is generally very challenging and the solutions are often customized to specific technological environments or tailored to specific security attacks. Some of these solutions exploit graphs, while some others are based on machine learning approaches. Table 1 presents a comparison of our work with the state of the art. This comparison is based on some relevant parameters referring to the target of the security assessment as well as to the techniques applied and to the types of vulnerability considered. As can be seen, our framework is general and applicable to any type of security attack. In addition, it takes into account both misconfigurations and vulnerabilities, i.e., CVEs, affecting the target network.
In what follows, we present details of the state of the art and we outline our advancements.
In this context, to characterize the security risks in Industrial IoT environments, Figueroa-Lorenzo et al. [8] analyze the security of the main protocols, standards, and buses deployed by these environments and propose a vulnerability analysis methodological framework based on CVSSv3.1. Temporal and environmental metrics are complemented by external factors, such as exposure and threat, with the objective of assessing the impact of vulnerabilities on the three cybersecurity pillars, i.e., confidentiality, integrity and availability. Ten et al. [22] propose an analytical framework that provides a measure to systematically quantify the vulnerabilities of Supervisory Control And Data Acquisition (SCADA) systems. The methodology covers three levels, namely, system, scenarios, and access points.
In the area of cloud computing, Saripalli and Walters [21] devise a quantitative impact and risk assessment methodology where risks are defined as a combination of the probability of a security threat event and its severity. Kamongi et al. [9] offer a vulnerability assessment framework that uses an ontology to create a knowledge base populated with a wide range of vulnerabilities, e.g., Common Vulnerabilities and Exposures (CVE), Common Weakness Enumeration (CWE), Common Vulnerability Scoring System (CVSS), stored in the National Vulnerability Database (NVD). To obtain a preliminary evaluation of the security level provided by cloud applications, Casola et al. [14] propose a methodology that takes into account the architecture of the applications and their potential security issues, such as threats, attacks, vulnerabilities and weaknesses.
Unlike these works, our framework is general-purpose and can be applied to assess the vulnerabilities of any complex networked environment consisting of diverse devices based on heterogeneous technologies. This also means that our approach can easily cope with the continuously evolving technological landscape.
The problem of IoT-based smart home security risks is investigated in [13]. In particular, this assessment relies on the operationally critical threat, asset, and vulnerability evaluation methodology. Scores are associated with the potential impacts of security risks. Similarly to this work, we define scores for quantifying to what extent the individual components of the target network are vulnerable. Nevertheless, the scope of our approach is not limited to IoT devices.
Let us remark that most papers model the components of the target network together with their relationships and vulnerabilities by means of the so-called "attack graphs" (see, e.g., [32] for a detailed taxonomy for attack graph generation and usage).
In the framework of network security assessment, an integrated application of attack graph and Hidden Markov models is proposed in [34], whereas Wu et al. [41] focus on an ontology and graph-based approach. In detail, the ontology represents security knowledge concerning, for example, assets, vulnerabilities, attacks, relationships, and the inference rules for identifying possible attacks.
Several metrics have been defined in the literature for assessing security risks. For example, in [27] the evaluation of the relative security levels of various network configurations is based on two metrics, namely, probabilistic security metric and attack resistance metric. In general, these metrics are obtained by exploring the properties of the graphs and identifying the paths that might allow attackers to compromise an individual resource of the network or even the entire network. A common denominator in the definition of the metrics is the CVSS score. In [12] a combination of the CVSS score associated with a CVE, the attack cost and the attack profit is used to characterize Industrial IoT security scenarios. Similarly, in [26] the CVSS score is the basis of a probabilistic metric that estimates the threat of each path of the graph. Gallon and Bascou [25] define damage metrics associated with hosts and networks. These metrics take into account the characteristics and consequences of the attacks constituting an attack scenario.
We outline that the CVSS score captures the principal characteristics of a vulnerability and produces a numerical score that reflects its severity. Nevertheless, this method often fails to take account of misconfigurations that might be abused by attackers whenever these misconfigurations are not classified as vulnerabilities. Our methodological framework copes with this issue and among the characteristics of the graphs it considers both misconfigurations and vulnerabilities affecting the target network.
For example, Kotlaba et al. [10] focus on the detection of Kerberoasting attacks, a common type of attack performed in Active Directory environments. The detection of this attack starts from a feature engineering phase based on Microsoft event logs. Data originating from the logs is analyzed and passed as input to a machine learning classifier able to discern regular noise events from actual Kerberoasting attempts.
Artificial neural networks are adopted in [11] for the detection of SQL injection attacks. In particular, the identification of these attacks is based on a combination of different types of neural networks, i.e., LSTMs and MLPs, whose input is represented by URLs. Another interesting neural network model focusing on the web domain is presented in [48]. This work addresses the detection of a family of XSS vulnerabilities known as DOM XSS. More precisely, a bag of words representation derived from Javascript functions is the input of the deep neural network. A deep learning approach is also applied in [7] to detect misconfigured grid devices using operational data of power distribution grids.
In the framework of Software Defined Networks, Cheng et al. [49] propose a machine learning model for deep packet inspection of encrypted and unencrypted traffic. In particular, a binary logistic regression model is applied for identifying malicious payloads in unencrypted packets, whereas decision trees are applied for encrypted packets.
In the context of machine learning based penetration testing, Valea and Oprişa [50] devise an automated platform to assess the security of a host on a network. In detail, the choice of the exploit to be used on the target host is based on the application of decision trees. This approach focuses on vulnerabilities belonging to a single host, whereas it does not consider vulnerabilities associated with the presence of multiple hosts interconnected in a network. Unlike this work, our approach considers both sources of vulnerabilities and it is not customized to any specific type of attack.

III. METHODOLOGICAL FRAMEWORK
The overall architecture of the methodological framework proposed for the security assessment of complex technological environments, such as Active Directory, is shown in Figure 1. This AI-assisted approach is based on the combined application of graph theory and machine learning techniques. As can be seen, the target, consisting of a large variety of heterogeneous devices, is represented as a graph whose nodes correspond to these devices and whose edges represent their relationships. Nodes and edges are in turn characterized by their properties. Moreover, within the graph multiple attack paths, i.e., sequences of adjacent nodes that are potentially vulnerable, are identified. These paths are described by features and classified to obtain the final assessment of the target as safe or vulnerable. We outline that a network is considered vulnerable whenever security loopholes that might need the attention of network administrators have been identified.
We outline that the nodes of the graphs are colored differently to denote the various types of network entities. We also emphasize that the simple graphs presented in this section are examples aimed at supporting the illustration of the proposed methodological framework and as such they do not fully represent the complexity typically found in the networked environments deployed nowadays.
The proposed methodology is based on the following workflow:
• Data acquisition: dealing with the collection of data about the target network;
• Graph construction: dealing with the encoding of the collected data into a graph;
• Attack paths extraction: dealing with the identification of the sequences of adjacent nodes that represent potential threats;
• Feature engineering: dealing with the identification and selection of the features that characterize the attack paths;
• Classification: dealing with the final assessment of the target network.
Details of the various stages are provided in what follows.

A. DATA ACQUISITION
Data acquisition consists in making the inventory of the entities (e.g., users, groups, computers, printers, routers) belonging to the target network. This inventory, typically built using automated enumeration tools, includes the list of entities together with their properties and physical and logical relationships as well as the vulnerabilities and misconfigurations affecting individual entities. Examples of properties enumerated for network devices refer to the type and version of the operating system, the processor architecture, the firmware version, the open port numbers with the associated services. Similarly, the users being enumerated are described by properties such as personal details, privileges, last login date and time.
Enumeration tools are also useful for extracting the relationships between entities. In particular, these relationships refer to the physical connections between devices and to the logical connections derived from the properties of the various entities. For example, a user with an account on a specific device has a logical relationship with that device. A network printer shared by several computers has a physical relationship with them.

B. GRAPH CONSTRUCTION
Graph construction consists in encoding the enumerated target network into a directed graph model. Graphs are very useful for threat modeling since they highlight non-obvious relationships between network entities. Let us recall that a directed graph G is a pair (V(G), E(G)) where V(G) is the set of vertices (or nodes) of the graph and E(G) is the set of edges.
In our framework, the nodes of the graph correspond to the network entities previously enumerated, while their physical and logical relationships are represented as edges between nodes. Each node is described by the properties of the network entity it represents and by topological properties such as centrality measures. Moreover, edges between nodes can be characterized by weights obtained by combining the properties of the nodes and the types of relationships.
Figure 2 shows an example of a graph modeling a target network with nine entities and various types of relationships. The entities refer to users, groups, computers and a server. The relationships between these entities are modeled by the edges. More precisely, the figure shows relationships of users belonging to groups, a group that can access computers, a computer and a server that store user credentials and a user who is the server administrator.
Figure 3 shows a small sample of possible properties associated with the server of Fig. 2. We outline that servers are generally described by many diverse properties related, for example, to their basic characteristics and security services as well as to the services being offered and the relationships with neighbor network entities. The server properties listed in the figure refer to its operating system, i.e., Microsoft Windows Server 2019, its processor architecture, i.e., Intel x86-64, and the supported authentication mechanism, i.e., Kerberos. In addition, the server exposes three services, i.e., https, rdp and smb, and is characterized by one incoming and one outgoing edge, that is, the node has a degree equal to two.
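The encoding described above can be sketched with plain Python data structures. This is only an illustration, not the implementation used in the experiments: the server properties are those listed in the text, whereas the node names (User2, User3, Server1), the relationship names and the choice of which users are attached to the server are assumptions made for this example.

```python
from collections import defaultdict

# Node properties. The server values come from the text; the node names
# and the remaining entries are hypothetical.
nodes = {
    "User3": {"type": "user"},
    "User2": {"type": "user"},
    "Server1": {
        "type": "server",
        "os": "Microsoft Windows Server 2019",
        "architecture": "Intel x86-64",
        "authentication": "Kerberos",
        "services": ["https", "rdp", "smb"],
    },
}

# Directed edges as (source, relationship, destination).
# The relationship names are illustrative labels.
edges = [
    ("User3", "Administers", "Server1"),
    ("Server1", "StoresCredentialsOf", "User2"),
]


def degrees(edge_list):
    """Count incoming and outgoing edges for every node."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for src, _rel, dst in edge_list:
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg


in_deg, out_deg = degrees(edges)
# One incoming plus one outgoing edge gives a degree of two.
server_degree = in_deg["Server1"] + out_deg["Server1"]
```

Topological properties such as the degree can then be stored alongside the entity properties of each node.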

C. ATTACK PATHS EXTRACTION
The analysis of nodes and edges of the graph and of the corresponding properties is the basis for identifying the potential security threats of the target network. This analysis aims at extracting the attack paths, that is, potentially vulnerable paths that could be exploited by attackers to compromise some specific network entities.
Let us recall that a path X on a graph G is a non-empty graph consisting of a sequence of non-repeating adjacent nodes and edges such that V(X) ⊆ V(G) and E(X) ⊆ E(G). The path length corresponds to the number of edges in X.
Given an origin node, e.g., a compromised node, and a destination node, e.g., the target of the attacker, different approaches can be applied to extract attack paths. For example, attack paths could correspond to the shortest paths between the nodes, that is, the minimum number of nodes to be traversed, thus prioritizing how close nodes are. For weighted graphs, attack paths could correspond to the paths of minimum weight between the nodes, thus prioritizing specific characteristics of the graph. Attack paths could also be identified by applying heuristics that take advantage of the domain knowledge of the technological environment under investigation. For example, paths could include nodes and edges characterized by specific relationships that make them particularly vulnerable. Figure 4 shows the path of length four between User1 and Group3 extracted from the graph of Figure 2. This attack path is potentially vulnerable because of the relationships between nodes. In fact, the membership of User1 to Group1 grants the access to Computer1. In addition, Computer1 stores the credentials of User2. Hence, User1 could retrieve these credentials and impersonate User2, thus reaching Group3 and performing potential privilege escalation. This example shows that the security risks associated with the extracted paths mainly depend on the properties of the nodes and on their relationships. Of course, the consequences and impacts of this attack depend on the privileges associated with Group3.
In what follows we denote the set of attack paths from a given origin node to the destination node as the attack graph.
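The shortest-path approach can be illustrated with a breadth-first search over the example path from User1 to Group3. This is a minimal sketch, assuming an adjacency-list representation and only the four edges described in the text; the full graph of Figure 2 contains further nodes and edges that are not reproduced here.

```python
from collections import deque

# Adjacency list restricted to the edges described for the example path.
graph = {
    "User1": ["Group1"],      # User1 is a member of Group1
    "Group1": ["Computer1"],  # Group1 can access Computer1
    "Computer1": ["User2"],   # Computer1 stores the credentials of User2
    "User2": ["Group3"],      # User2 is a member of Group3
    "Group3": [],
}


def shortest_path(adjacency, origin, destination):
    """Breadth-first search returning one shortest path as a node list, or None."""
    parents = {origin: None}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if node == destination:
            # Walk back through the parents to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


path = shortest_path(graph, "User1", "Group3")
path_length = len(path) - 1  # number of edges in the path
```

The path length is counted in edges, in line with the definition recalled above.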

D. FEATURE ENGINEERING
The security assessment of the target network requires the identification and engineering of features describing the properties of the nodes and edges of individual attack paths and of the corresponding attack graph.
More precisely, these features should capture the potential vulnerabilities of nodes and edges within attack paths (e.g., presence and number of network entities running obsolete operating systems, storing passwords in clear-text or enabling remote access).
Other features could be related to the structural properties of individual attack paths (e.g., path length, total weight of the path) or of the corresponding attack graph (e.g., number of attack paths, clustering coefficients, transitivity).
Features significantly affect the classification process and the outcome of the security assessment. Hence, feature engineering, e.g., selection, scaling and aggregation of the identified features, is a crucial task that requires a solid technological and security background.
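As an illustration of the kinds of features mentioned above, the following sketch derives a few per-path and per-attack-graph features. The feature names, the `obsolete_os` flag and the edge weights are hypothetical choices made for this example and do not reproduce the feature set used in the experiments.

```python
def path_features(path, node_props, edge_weights):
    """Features of a single attack path given as a list of node names."""
    hops = list(zip(path, path[1:]))
    return {
        "length": len(hops),
        "total_weight": sum(edge_weights[hop] for hop in hops),
        "obsolete_os_nodes": sum(
            1 for n in path if node_props.get(n, {}).get("obsolete_os", False)
        ),
    }


def attack_graph_features(paths, node_props, edge_weights):
    """Aggregate features of an attack graph, i.e., a set of attack paths."""
    per_path = [path_features(p, node_props, edge_weights) for p in paths]
    return {
        "num_paths": len(per_path),
        "mean_length": sum(f["length"] for f in per_path) / len(per_path),
        "min_total_weight": min(f["total_weight"] for f in per_path),
        "max_obsolete_os_nodes": max(f["obsolete_os_nodes"] for f in per_path),
    }


# Hypothetical input: one path, one node flagged as running an obsolete OS.
node_props = {"Computer1": {"obsolete_os": True}}
edge_weights = {
    ("User1", "Group1"): 0.5,
    ("Group1", "Computer1"): 0.25,
    ("Computer1", "User2"): 0.125,
    ("User2", "Group3"): 0.125,
}
paths = [["User1", "Group1", "Computer1", "User2", "Group3"]]
features = attack_graph_features(paths, node_props, edge_weights)
```

Scaling and selection steps would then operate on dictionaries of this kind, one per attack graph.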

E. CLASSIFICATION
The classification of the identified attack graphs is the basis of the security assessment of the target network. For this purpose, a classical machine learning approach consisting of three phases, namely, training, validation and testing, is applied. In particular, training deals with learning how to distinguish between vulnerable and safe attack graphs, whereas validation and testing deal with the tuning and evaluation of the classification process. This process is based on different classification algorithms, such as Logistic Regression, Support Vector Machines, Decision Trees, K-Nearest Neighbors, that differ in their learning strategy and computational complexity [51].
Since the performance of the classification process heavily depends on the identified features, features might be reengineered multiple times to obtain an accurate security assessment.
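The three phases can be sketched with a small K-Nearest Neighbors classifier written in plain Python. All feature vectors and labels below are made-up toy values chosen only to make the example run; they are not data or results from the experiments, and the three features (number of paths, mean path length, number of vulnerable nodes) are an assumption.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x."""
    ranked = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _distance, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, X, y, k):
    """Fraction of points in (X, y) that are classified correctly."""
    hits = sum(knn_predict(train_X, train_y, x, k) == label for x, label in zip(X, y))
    return hits / len(y)


# Toy feature vectors: [number of paths, mean path length, vulnerable nodes].
train_X = [[6, 7, 3], [5, 6, 2], [7, 8, 4], [2, 3, 0], [1, 4, 0], [2, 2, 1]]
train_y = ["vulnerable", "vulnerable", "vulnerable", "safe", "safe", "safe"]
val_X, val_y = [[5, 7, 3], [2, 3, 1]], ["vulnerable", "safe"]
test_X, test_y = [[6, 6, 3], [1, 3, 0]], ["vulnerable", "safe"]

# Validation: tune the hyperparameter k on held-out data.
best_k = max((1, 3), key=lambda k: accuracy(train_X, train_y, val_X, val_y, k))
# Testing: evaluate once on data not used for training or tuning.
test_accuracy = accuracy(train_X, train_y, test_X, test_y, best_k)
```

The same split into training, validation and testing applies to the other algorithms listed above; only the model and its hyperparameters change.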

IV. EXPERIMENTAL SETUP
This section presents the setup of the experiments performed to test the proposed methodological framework on Active Directory environments. As already pointed out, the complexity of these environments makes them highly vulnerable, thus requiring accurate and timely security assessments [52]. In addition, we believe that Active Directory environments are good representatives of complex networked scenarios that might benefit from automated security assessment procedures.
In what follows, we describe the main characteristics of the environments considered in our investigation and discuss the choices made in the various steps of the methodology as well as the corresponding implementation details.

A. GENERATION OF ACTIVE DIRECTORY ENVIRONMENTS
Before presenting the characteristics of the Active Directory environments being tested, we briefly introduce the main components and services offered by these environments.
Active Directory is a set of technologies developed by Microsoft to implement directory services aimed at managing complex computer networks [53], [54]. A directory is a hierarchical structure that stores information about objects on the network. According to Active Directory terminology, objects refer to users, computers, groups, organizational units, services and even to network policies. Moreover, the hierarchical organization of objects includes domains, i.e., collections of objects, and forests, i.e., collections of domains.
Active Directory services offer the methods for storing and retrieving directory data. This data is essential for the proper functioning of the entire network and in particular for many fundamental services, such as authentication and authorization.
The implementation of Active Directory services is based on specialized servers, known as Domain Controllers, often used for the centralized configuration and management of the network.
The methodological framework is tested on artificially generated Active Directory environments since, to the best of our knowledge, no datasets of Active Directory environments are publicly available. In fact, companies and organizations are not willing to disclose any detail about their technological infrastructure and internal organization due to confidentiality and security issues as well as to competitive reasons [44]. In addition, Active Directory environments typically consist of hundreds of network entities characterized by a complex hierarchical organization, thus the setup of technologies representative of realistic scenarios is very expensive and cumbersome.
For the generation of realistic AD environments, we consider the different types of network entities that are typically part of these environments, e.g., users, groups, computers, organizational units, domain controllers, and we describe each entity by properties and relationships, e.g., authentication and delegation mechanisms, membership, access control and trust relationships, group policy management. Some of the properties might refer to various types of misconfigurations inadvertently caused by system administrators, such as incorrect access control list settings.
To promote the diversity of the environments being generated, we associate a probability distribution with each property and relationship. For example, to choose the access control right of a given entity, we assign a probability to each possible right, e.g., GenericAll, AddMember, WriteDacl, and we sample the corresponding distribution. Note that some properties and relationships might lead to the generation of entities affected by vulnerabilities (e.g., zerologon) or misconfigurations (e.g., unconstrained delegation, non-expiring passwords) that might allow attackers to compromise the network. For these reasons, the generated AD environments are particularly suitable for testing our methodological framework.
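The sampling step can be sketched as follows. The three access control rights are those named in the text, whereas the probabilities assigned to them are placeholders, since the actual distributions are not reported here.

```python
import random

# Hypothetical probabilities for each access control right.
ACL_RIGHT_PROBABILITIES = {
    "GenericAll": 0.1,
    "AddMember": 0.3,
    "WriteDacl": 0.6,
}


def sample_right(rng):
    """Draw one access control right according to its assigned probability."""
    rights = list(ACL_RIGHT_PROBABILITIES)
    weights = list(ACL_RIGHT_PROBABILITIES.values())
    return rng.choices(rights, weights=weights, k=1)[0]


# A seeded generator makes the generated environment reproducible.
rng = random.Random(42)
sampled = [sample_right(rng) for _ in range(1000)]
```

Using a separate distribution for each property and relationship lets the generator vary how often risky settings appear across environments.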
It is also important to mention that some specific characteristics of the generated AD environments, such as the number of Domain Controllers, are chosen according to heuristics derived from the best practices suggested by Microsoft [55], [56]. For example, for reliability reasons it is recommended to include multiple Domain Controllers in a domain. For customizing the configuration policies, authorizations granted to users should take into account their role and responsibilities. Similarly, users and computers should belong to different organizational units. For our experiments we generate in total 220 artificial AD environments. On average an environment consists of eight Domain Controllers managing 103 users and 120 computers subdivided into 59 groups and belonging to 41 organizational units. Moreover, an average of 31 configuration policies are associated with Domain Controllers.

B. ATTACK GRAPH CONSTRUCTION
The generated Active Directory environments are modeled through the use of graphs. Within these graphs, three main categories of relationships have been identified according to their purpose, namely:
• Access relationships describing access capabilities of network entities;
• Authorization relationships describing the permissions of network entities;
• Hierarchical relationships describing the hierarchical organization of the network entities.
Under specific circumstances these relationships might be exploited, thus becoming critical for the security of the network.
To quantify the degree of vulnerability of the network entities, we assign scores to the properties associated with nodes and to their relationships. These assignments require solid knowledge of Active Directory environments as well as deep experience in the security domain. The choice of the values of these scores is driven by the characteristics of the nodes and the potential interest of an attacker towards the specific node, e.g., domain administrator, unprivileged user. In particular, higher scores are associated with a higher degree of vulnerability. In general, scores are customized to the networked environment being tested. Scores might change as a consequence of newly discovered vulnerabilities affecting network entities or existing vulnerabilities being patched.
Not to clutter the presentation, Tables 2 and 3 present only some examples of scores for relationships and node properties, respectively. Note that the choice of the values assigned to these scores takes into account our experience in the Active Directory domain. As can be seen, the Belongs to relationship, referring to users being members of groups, has a lower score with respect to the Administers relationship referring to users with the role of server administrators (see Table 2).
Moreover, the scores assigned to computers running old operating systems, such as Fedora 16 or Microsoft Windows XP, differ significantly because of the different degrees of vulnerability affecting these operating systems (see Table 3). On the contrary, computers running modern operating systems, such as Microsoft Windows 10 or Ubuntu 22.10, are less vulnerable because of the regular updates and patches released by the developers. Hence, their score is set to the minimum value, i.e., zero.
The potential vulnerability of the node $N_i$ is summarized in terms of its overall score $\nu(N_i)$, that is, the sum of the scores assigned to the properties of the node and to the relationships of its outgoing edges.
The overall score is used as the basis for assigning weights to edges. In detail, the non-negative weight $w(e_{ij})$ associated with edge $e_{ij}$ connecting node $N_i$ with node $N_j$ is defined as follows:
$$w(e_{ij}) = \log\left(1 + \frac{1}{1 + \nu(N_i)}\right)$$
To better identify the shortest paths, the weights are defined as inversely proportional to the overall score. Moreover, the logarithmic transformation is applied to reduce skewness, while one is added to $\nu(N_i)$ to avoid a division by zero. Note that weights are also interpreted as costs associated with edges.
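The computation of scores and weights can be sketched as follows. The sketch assumes that the weight takes the form log(1 + 1/(1 + ν(N_i))) with the natural logarithm, which is one form consistent with the properties stated in the text (inverse proportionality to the overall score, logarithmic transformation, one added to the score, and an upper bound of log(2)). The score values in the example are hypothetical.

```python
import math


def overall_score(property_scores, outgoing_relationship_scores):
    """Sum of the scores of a node's properties and of its outgoing relationships."""
    return sum(property_scores) + sum(outgoing_relationship_scores)


def edge_weight(nu_i):
    """Weight of an edge leaving a node with overall score nu_i (assumed form)."""
    return math.log(1.0 + 1.0 / (1.0 + nu_i))


# Hypothetical scores: two node properties and one outgoing relationship.
nu = overall_score([2, 1], [3])
w = edge_weight(nu)

# A node with score zero gets the largest possible weight, i.e., log(2).
w_max = edge_weight(0)
```

Under this form, highly vulnerable nodes produce cheap outgoing edges, so minimum-weight paths tend to pass through them.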

C. IMPLEMENTATION
The technology used to store and analyze these graphs is a graph-based database platform, namely, Neo4j (https://neo4j.com/), that relies on a de-facto standard language for graph querying, i.e., Cypher (https://opencypher.org/). This kind of database is very efficient for storing data related to graphs. In fact, storage and computation requirements are very limited. For example, a node of a graph can be stored in as little as 15 B, while an edge requires 34 B. Moreover, the encoding of the generated environment into a graph is very fast. In fact, a laptop with an Intel Core i7-6600U CPU running at 2.6 GHz with 8 GB of RAM takes less than three minutes to construct a graph consisting of about 400 nodes and 4,000 relationships.
To extract attack paths from each graph we create simple Cypher queries where we select as origin node a non-privileged user and as destination nodes privileged users, such as domain administrators. Note that a fast bidirectional breadth-first search algorithm is used if the predicates can be evaluated whilst searching for the path; otherwise, the slower exhaustive depth-first search algorithm is used.
An example of a query that computes the shortest paths from User1 to Group3 of the domain administrators is reported in Figure 5. These paths minimize the number of nodes traversed from the origin to the destination. A similar query is created to extract the path with the minimum weight, identified by using Dijkstra's algorithm.
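To make the difference between the two kinds of paths concrete, the following dependency-free Python sketch computes both on a toy graph. It is not the Cypher query of Figure 5: the graph, its weights and the function names `fewest_hops` and `min_weight` are hypothetical, and plain breadth-first search stands in for the bidirectional search performed by the database.

```python
import heapq
from collections import deque

# Toy directed graph (hypothetical): node -> {neighbour: edge weight}.
GRAPH = {
    "User1": {"User2": 0.69, "Computer1": 0.10},
    "User2": {"Group3": 0.69},
    "Computer1": {"User3": 0.10},
    "User3": {"Group3": 0.10},
    "Group3": {},
}


def _rebuild(parents, target):
    """Walk the parent links back from the target; None if unreachable."""
    if target not in parents:
        return None
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parents[node]
    return path[::-1]


def fewest_hops(graph, origin, target):
    """Breadth-first search: path with the minimum number of edges."""
    parents = {origin: None}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbour in graph[node]:
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return _rebuild(parents, target)


def min_weight(graph, origin, target):
    """Dijkstra's algorithm: path with the minimum total weight."""
    best = {origin: 0.0}
    parents = {origin: None}
    heap = [(0.0, origin)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best[node]:
            continue  # stale queue entry
        for neighbour, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                parents[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))
    return _rebuild(parents, target), best.get(target)


print(fewest_hops(GRAPH, "User1", "Group3"))
print(min_weight(GRAPH, "User1", "Group3"))
```

In this toy graph the two notions disagree: the path with the fewest hops goes through User2, whereas the path of minimum weight is longer but cheaper.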
As a result of these queries, we identify 220 attack graphs consisting on average of six attack paths with seven nodes each. We recall that an attack graph is the set of all the attack paths extracted from the graph constructed for the generated AD environment (see Fig. 4 for an example). The paths correspond to the nodes to be traversed, starting from a given origin node, to compromise a target destination node. These graphs are particularly important since they highlight non-obvious chained misconfigurations that could be exploited by attackers.
The average weight associated with the weighted attack paths is rather small, namely, 0.13. In fact, according to our formulation, the weights are strictly positive and do not exceed log(2), that is, about 0.69, and our objective is to find the path of minimum weight, i.e., cost.
Note that each path has been labeled manually as vulnerable or safe depending on whether it could be exploited by an attacker to compromise the network. This labeling process is based on the analysis and visual inspection of the corresponding sub-graphs. In detail, paths are labeled by analyzing the properties associated with the nodes of the paths and by looking at the relationships between nodes. For example, the relationships between the nodes of the attack path shown in Figure 4 make it vulnerable. In fact, by leveraging the credentials of User2, User1 could reach Group3 and potentially perform privilege escalation. It is important to emphasize that the labeling process requires a solid knowledge of the security and Active Directory domains.

D. MACHINE LEARNING CLASSIFIER SETUP
As already pointed out, to discriminate between vulnerable and safe attack graphs, the extracted features should summarize the characteristics of these graphs and capture the potential vulnerabilities of nodes and relationships. In particular, features are associated with the frequency of each type of node and each type of relationship within a path as well as with the frequency of nodes and relationships considered critical from a security perspective.
Of these features, 26 refer to the shortest paths and 26 to the weighted paths. In summary, these features belong to the following categories:
• Type of node: 6 features;
• Nodes with critical properties: 6 features;
• Non-Critical Access Relationships: 12 features;
• Non-Critical Authorization Relationships: 14 features;
• Non-Critical Hierarchical Relationships: 8 features;
• Critical Relationships: 6 features.
We also consider features referring to the total number of nodes and relationships in the identified paths, to the number of paths and to the corresponding assessment metric. In total we obtain 60 features.
In what follows, we present examples of the types of nodes and relationships considered to extract the features. Additional details can be found in [57].
Some features refer to the various types of nodes included in the attack graphs, such as users, computers and groups, as well as to the nodes critical from a security point of view, such as domain controllers, privileged users and groups, computers running obsolete operating systems or with unconstrained delegation enabled. In the context of relationships, features are associated with access control relationships, such as GenericWrite and GenericAll, remote access relationships, such as CanRDP and CanPSRemote, and critical relationships, such as AddMember, HasSession and AdminTo.
In general, the features related to critical properties and relationships are particularly relevant to differentiate safe and vulnerable attack graphs. For example, the feature related to the number of computers running old operating systems is good for this purpose. In fact, a path that includes a large number of these computers is potentially more vulnerable than a path that does not include any or includes very few.
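The frequency-based features described above can be sketched as simple counts over the nodes and relationships of a path. In the following Python fragment, the dictionary layout of a node, the `old_os` flag, the feature names and the function `path_features` are all hypothetical; only the relationship names are taken from the text.

```python
from collections import Counter

# Relationships considered critical in the text.
CRITICAL_RELATIONSHIPS = {"AddMember", "HasSession", "AdminTo"}


def path_features(nodes, relationships):
    """Count node types, relationship types and critical items in one path.

    nodes: list of dicts with a 'type' key and an optional 'old_os' flag
           (hypothetical layout).
    relationships: list of relationship type names.
    """
    features = Counter()
    for node in nodes:
        features[f"type_{node['type']}"] += 1
        if node.get("old_os"):
            features["old_os_computers"] += 1
    for rel in relationships:
        features[f"rel_{rel}"] += 1
        if rel in CRITICAL_RELATIONSHIPS:
            features["critical_relationships"] += 1
    features["total_nodes"] = len(nodes)
    features["total_relationships"] = len(relationships)
    return dict(features)


example_nodes = [
    {"type": "User"},
    {"type": "Computer", "old_os": True},
    {"type": "Group"},
]
example_relationships = ["HasSession", "MemberOf"]
print(path_features(example_nodes, example_relationships))
```

Each attack graph would then be described by the counts obtained for its shortest and weighted paths.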
We point out that some of the features selected for describing the attack graphs are not specifically tailored to AD environments; thus, they could be used to characterize the potential security risks of other networked environments.
To reduce the problem dimensionality, we create for each group of relationships a new feature obtained by counting the number of relationships within the group. Moreover, we compute the coefficient of correlation between features and we discard highly correlated ones, that is, features whose coefficient of correlation is above 0.9. As a result, we reduce the 60 extracted features to only 12 features, namely, two features referring to the shortest path, that is:
• Number of computers running old operating systems;
• Overall cost associated with the shortest path;
and ten features referring to the weighted shortest path, including:
• Number of relationships related to access control mechanisms;
• Number of nodes representing hierarchical grouping;
• Number of relationships related to remote access capabilities;
• Overall cost associated with the weighted shortest path.
Popular algorithms, i.e., Logistic Regression, Support Vector Machine and Random Forest, are applied for the classification of the attack paths. Note that these classifiers and the corresponding machine learning models are used in these experiments as a proof of concept of the proposed framework. In fact, our main objective is to assess their ability to classify attack graphs as either vulnerable or safe, rather than to identify the best classifier for the data at hand.
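The correlation-based pruning used to reduce the feature set can be sketched as follows. The text does not state which feature of a highly correlated pair is discarded, nor whether the sign of the coefficient matters; this dependency-free Python fragment therefore assumes a greedy strategy that keeps the first feature encountered and compares the absolute value of the Pearson coefficient with the 0.9 threshold. The function names and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # a constant feature is treated as uncorrelated
    return cov / (dev_x * dev_y)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep features whose |correlation| with every kept one is <= threshold.

    columns: dict mapping a feature name to its list of values.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


toy = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8.1],  # almost a multiple of "a"
    "c": [4, 1, 3, 2],
}
print(drop_correlated(toy))
```

With pandas, the same correlation matrix could be obtained with `DataFrame.corr()`.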
The main choices associated with the classification process are summarized as follows:
• 12 features;
• Classical 80/20 split of the dataset;
• k-fold technique with k = 10 for the validation of the classifiers;
• Grid search for hyper-parameter tuning.
The implementation of the entire classification process is based on Python 3 and in particular on the pandas and scikit-learn modules.
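The choices listed above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the code used in the experiments: the data are synthetic (generated with `make_classification` to mimic 220 roughly balanced samples with 12 features), the split is stratified, the random seeds are arbitrary, and the grid is a hypothetical subset that does not reproduce the values tested in the tables.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the dataset: 220 samples, 12 features, two classes.
X, y = make_classification(
    n_samples=220, n_features=12, n_informative=6,
    weights=[0.5, 0.5], random_state=0,
)

# 80/20 split; stratification is an assumption made here.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid over the two Random Forest hyper-parameters.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5]}

# Grid search validated with 10-fold cross-validation on the training set.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=10,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Accuracy of the refitted best model on the held-out 20%.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

The same pattern applies to the other two classifiers by replacing the estimator and the grid.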

V. EXPERIMENTAL RESULTS
The application of the classification process provides the security assessments of the Active Directory environments described in Section IV-A. To avoid cluttering the presentation, this section discusses the main results of this process. Additional details are provided in [57]. Let us recall that the dataset used in the experiments is balanced and consists of 110 safe attack graphs and 110 vulnerable attack graphs. Each graph is described by 12 features.

A. HYPER-PARAMETER TUNING
The tuning of hyper-parameters relies on a grid search customized to each classifier. For the Logistic Regression (LR) classifier, the grid search is based on two hyper-parameters, i.e., regularization C and penalty (see Table 4).
The accuracy corresponding to the tested hyper-parameters is shown in Figure 6. As can be seen, the results of the validation suggest that the best values of the regularization parameter C and of the penalty are 1 and L1, respectively. The resulting accuracy on the testing dataset is equal to 84.1%.
Similarly, the grid search of the Support Vector Machine (SVM) classifier is based on two hyper-parameters, namely, regularization C and kernel type (see Table 5). The accuracy corresponding to these hyper-parameters is shown in Figure 7. The best performance using SVM during validation is obtained by setting C = 1 and using a linear kernel. The resulting accuracy on the testing dataset is 86.3%. Finally, for the Random Forest (RF) classifier, the grid search is based on two tuning parameters, namely, number of decision trees used within the forest and maximum tree depth (see Table 6).
The accuracy corresponding to the tested hyper-parameters is shown in Figure 8. The grid search suggests that the best performance is obtained with a number of decision trees equal to 100 and a maximum tree depth equal to 3. The resulting accuracy reaches 91% on the testing dataset. Table 7 summarizes the performance of the three classifiers applied for the security assessment, that is, to assess whether the generated Active Directory environments are safe or vulnerable. The performance, expressed in terms of precision, recall and F1-score, refers to the testing dataset.

B. PERFORMANCE COMPARISON
We notice that the RF classifier consistently outperforms the other classifiers, although the performance of all classifiers is generally good. For example, the F1-score ranges between 0.80, obtained by the LR classifier for the assessment of safe networks, and 0.91, obtained by the RF classifier for the assessment of vulnerable networks. Similarly, the precision and the recall are always greater than or equal to 0.74 and 0.82, respectively.
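For reference, the three metrics used in this comparison can be computed from the counts of true positives, false positives and false negatives of a class, as in the following Python sketch. The counts in the example are hypothetical and are not taken from Table 7.

```python
def precision_recall_f1(tp, fp, fn):
    """Return precision, recall and F1-score for one class.

    tp, fp, fn: numbers of true positives, false positives and false negatives.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for the "vulnerable" class of a testing dataset.
print(precision_recall_f1(tp=20, fp=2, fn=2))
```

The F1-score is the harmonic mean of precision and recall, so it is high only when both are high.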
In summary, these results suggest that the combined application of graph models and machine learning techniques is very useful for automating the security assessment procedures of complex technological environments, such as Active Directory.

VI. CONCLUSION
In this paper we addressed the problem of security assessment of Active Directory environments because their complexity makes them particularly vulnerable. In particular, to evaluate whether a target is vulnerable or safe, we proposed an AI-assisted methodological framework based on the combined application of graph models and machine learning techniques. More precisely, from the graphs describing the target, attack paths representing its potential security threats are extracted. The classification of these paths by means of machine learning techniques provides the security assessment of the target. For the experimental evaluation of the proposed framework we focused on 220 artificially generated Active Directory environments, half of which were affected by vulnerabilities. The results of the classification process were generally good. For example, the F1-score obtained by the Random Forest classifier for the assessment of vulnerable networks was equal to 0.91. These experiments suggested that our framework could be applied for automating the security assessment procedures, although it might require an initial human intervention in the definition of the scores associated with nodes and relationships of the target being tested.
Future work will be dedicated to analyzing the sensitivity of the proposed approach with respect to the scores assigned to nodes and relationships and to the evolving threat landscape. Moreover, we plan to characterize the attack graphs in terms of additional features and investigate their impact on the security assessment process.