Using Knowledge Graphs to Unlock Practical Collection, Integration, and Audit of AI Accountability Information

To enhance trustworthiness of AI systems, a number of solutions have been proposed to document how such systems are built and used. A key facet of realizing trust in AI is how to make such systems accountable - a challenging task, not least due to the lack of an agreed definition of accountability and differing perspectives on what information should be recorded and how it should be used (e.g., to inform audit). Information originates across the life cycle stages of an AI system and from a variety of sources (individuals, organizations, systems), raising numerous challenges around collection, management, and audit. In our previous work, we argued that semantic Knowledge Graphs (KGs) are ideally suited to address those challenges and we presented an approach utilizing KGs to aid in the tasks of modelling, recording, viewing, and auditing accountability information related to the design stage of AI system development. Moreover, as KGs store data in a structured format understandable by both humans and machines, we argued that this approach provides new opportunities for building intelligent applications that facilitate and automate such tasks. In this paper, we expand our earlier work by reporting additional detailed requirements for knowledge representation and capture in the context of AI accountability; these extend the scope of our work beyond the design stage, to also include system implementation. Furthermore, we present the RAInS ontology which has been extended to satisfy these requirements. We evaluate our approach against three popular baseline frameworks, namely, Datasheets, Model Cards, and FactSheets, by comparing the range of information that can be captured by our KGs against these three frameworks. We demonstrate that our approach subsumes and extends the capabilities of the baseline frameworks and discuss how KGs can be used to integrate and enhance accountability information collection processes.


I. INTRODUCTION
The increasingly widespread adoption of AI systems has been spurred by their stated advantages and benefits, including automation of labour, reduction in error rates vs. humans performing the same tasks, and always-on availability. However, such systems can also suffer from a range of drawbacks, including biases which result in perpetuating racism and sexism [1]-[8], the production of erroneous or unintended results [9]-[11], and in extreme cases, incidents that result in direct or indirect harm to humans [12], [13]. Such limitations remain the subject of an active academic debate [14]-[17], and there are emerging efforts to track incidents related to AI-powered technologies, such as the AI Incident Database. 1 A variety of solutions have been proposed to address the perceived limitations of AI systems; these have originated from regulators and policy makers [18]-[21], professional bodies [22], [23], and developers and researchers [24]-[27].
The associate editor coordinating the review of this manuscript and approving it for publication was Chang Choi.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
A key facet of realizing trust in AI is how best to make such systems accountable - a challenging task, not least because of the lack of a universally agreed definition of accountability.
Despite emerging academic debates about treating AI systems as self-aware entities and therefore as legal agents [28], we base the approach outlined in this paper on the assumption that AI systems, algorithms, or machines should not be treated as legally responsible or liable agents [29], [30], and that as a result, human agents or organizations that created or influenced the behaviors of AI systems should be accountable. As a consequence, we argue that AI accountability must be supported by the ability to inspect, review, or otherwise interrogate such a system with the goals of (i) making processes associated with each of its life cycle stages transparent [19], [22]-[24], [26], [31]; (ii) demonstrating compliance with hard laws (i.e., laws and regulations) and soft laws (i.e., standards and guidelines) [19], [26]; and (iii) aiding investigations into the cause(s) of failure or erroneous decisions and supporting the identification of responsible parties [19], [22], [24], [26], [31]. The various technologies that could enable such comprehensive inspection, review, or interrogation are often explored by the computer science (CS) community and professional bodies under separate sub-themes [32], such as transparency and traceability [23], [33], [34], intelligibility/interpretability [23], [34], explainability [31], [34], and auditability/reviewability [35]. But due to the socio-technical nature of most AI systems, these themes are also explored by lawyers, regulators, and social sciences researchers [18], [19]. This often leads to a "mismatch" of expectations and proposed solutions. For example, proposed laws and regulations may be perceived as technically unfeasible and lacking clear means for implementation, e.g., GDPR [36] requires the provision of "meaningful information about the logic involved" without specifying what 'meaningful information' is.
Moreover, technological solutions proposed by the CS community may be difficult to interpret and may lack crucial evidence (e.g., information about involved human actors) in order to be usable in legal proceedings. Further, there is a stance within the CS community that calls to make AI systems more transparent by exposing their source code would, in many cases, hinder rather than help accountability, as such transparency is often not meaningful, because it requires significant efforts and skills to interpret as well as posing other challenges (e.g., those related to the protection of intellectual property rights) [33], [37].
We argue that solutions which aim to support accountability in AI systems which are also enforceable by law must be comprehensive and cover all life cycle stages of such systems, and at each stage must identify individual human agents that bear responsibility for decisions and outcomes (e.g., implemented system components) influencing the design, implementation, deployment, and operation of these systems. In this paper, we refer to this type of information as accountability information. We further argue that this accountability information must be meaningful, semantically annotated and ideally understandable by both humans and machines. The effort needed to collect and to interpret such data in case of an investigation is likely to be a significant burden on many stakeholders, and the ability of machines to understand and process the data is important in order to achieve a certain level of automation.
1 https://incidentdatabase.ai/
Our proposed approach is compatible with existing solutions such as Datasheets [38], Model Cards [39], and FactSheets [40]. These frameworks provide guidelines on what information to record about various aspects of AI systems (e.g., descriptions of datasets, machine learning (ML) models, performance evaluations, etc.) to improve transparency of such systems. Datasheets and FactSheets provide questions and checklists that users can refer to in order to record free text and images describing datasets or AI systems. While examples for different types of systems are provided, it is ultimately up to the end users of these frameworks to decide which questions are relevant in the context of their AI systems and the final result is presented in the form of a visual report designed for human inspection. While this simplicity provides certain benefits (e.g., a low barrier to the production and distribution of such reports), the lack of structured, semantically annotated data makes it difficult for computer systems to assist with creating and querying such reports in an automated or semi-automated manner. For example, it would be difficult for software to check if some information that was intended to be collected (e.g., answer to some critical question) was indeed provided. Model Cards do provide means to assist developers with the collection of required information via their Model Card Toolkit, 2 which produces JSON representations of Model Cards; however, interpretation and querying of such data again relies on visual inspection by humans.
In our approach, the term AI system refers to software comprising 'core AI' components (e.g., an ML model) and other supporting functions (e.g., API wrappers) [41] allowing it to function either as a standalone solution, or as a part of a larger system. We consider the development and use of an AI system in terms of four high-level life cycle stages: Design, Implementation, Deployment, and Operation; this conforms to the recommendation by Amershi et al. [42] that standard software engineering practices should apply to such systems. The design stage covers the aspects associated with designing the AI system; the implementation stage includes the processes associated with building and testing the system; the deployment stage includes the processes related to installing the system and, if applicable, configuring and integrating it with other systems, producing documentation, and training users. Finally, the operation stage consists of the actual use of the system and (routine) monitoring. This provides a broader scope for recording accountability information, which extends the current state of the art. Modelling accountability information is supported through two ontologies, SAO 3 and RAInS. 4 These ontologies are built around three main concepts: accountability plans, accountability traces and accountable agents. Accountability plans specify information that should be captured throughout the system's life cycle and guide the recording of corresponding accountability traces. We argue that the ability to plan the collection of accountability information is crucial for scalability of such data capture and to ensure that the required information is not lost (e.g., in order to save sensor observations in an autonomous vehicle this functionality must be planned for and implemented prior to an incident). 
The actual descriptions of activities that occurred during the different life cycle stages, their inputs and outputs and relationships to accountable human agents are captured in corresponding accountability traces. SAO and RAInS extend the W3C PROV-O [43] standard for representing provenance as causal graphs consisting of three core concepts: entities, which describe any real or imaginary thing; activities, which represent processes that use and generate entities; and agents, which represent human, organizational, or software actors that can bear some responsibility for an activity. The ontologies are also based on the EP-Plan [44] ontology, which extends PROV-O with the ability to describe abstract acyclic plans represented as a series of steps and input and output variables. The capture and audit of accountability information is supported by a prototype implementation of the Accountability Fabric, a suite of tools demonstrating the feasibility of a computational solution to support planning, collection, and inspection of accountability information using SAO and RAInS. For technical details describing the ontologies and the Accountability Fabric, we refer the reader to [45], [46], which demonstrate the application of this approach to documenting the design stage of an AI system.
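As an informal illustration of the three PROV-O concepts described above, the following sketch stores a tiny provenance graph as subject-predicate-object triples in plain Python. The relation names prov:used, prov:wasGeneratedBy, and prov:wasAssociatedWith are PROV-O properties; every ex: identifier and the helper functions are hypothetical and are not taken from SAO, RAInS, or the Accountability Fabric.

```python
# Minimal sketch: a provenance graph as (subject, predicate, object) triples.
# prov:* names are PROV-O terms; every ex:* identifier is made up for illustration.
TRIPLES = {
    ("ex:trainingData", "rdf:type", "prov:Entity"),
    ("ex:model", "rdf:type", "prov:Entity"),
    ("ex:training", "rdf:type", "prov:Activity"),
    ("ex:alice", "rdf:type", "prov:Agent"),
    ("ex:training", "prov:used", "ex:trainingData"),
    ("ex:model", "prov:wasGeneratedBy", "ex:training"),
    ("ex:training", "prov:wasAssociatedWith", "ex:alice"),
}


def objects(subject, predicate):
    """Return all objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def responsible_agents(entity):
    """Agents associated with any activity that generated `entity`."""
    agents = set()
    for activity in objects(entity, "prov:wasGeneratedBy"):
        agents |= objects(activity, "prov:wasAssociatedWith")
    return agents


print(responsible_agents("ex:model"))
```

Following the generation and association links in this way is the basic traversal that lets an auditor move from an artefact (an entity) to the agent who bears responsibility for the activity that produced it.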
In this paper we present the following contributions:
1) A set of competency questions describing types of accountability information relevant to the design and implementation stages of an AI system, and which goes beyond the information recorded by the current Datasheet, Model Card, and FactSheet frameworks.
2) The extended RAInS ontology for modelling accountability information as knowledge graphs (KGs) which covers the design and implementation stages of the AI system life cycle.
3) An evaluation of our approach by comparing its capabilities with those of Datasheets, Model Cards, and FactSheets.
The remainder of this paper is organized as follows: Section II describes a set of competency questions which define what accountability information needs to be recorded; Section III describes our approach for semantically modelling such accountability information; Section IV describes results from a comparison of our solution to the three popular non-semantic frameworks; Section V discusses the results of the comparison and outlines the benefits of our approach; and Section VI concludes the paper and discusses future work.

II. ACCOUNTABILITY INFORMATION COVERAGE
Competency questions (CQs) are expressed in natural language and are commonly used in semantic modelling to identify individual ontological concepts (e.g., types of objects and their relationships) and to evaluate the resulting ontology (i.e., the ontology must describe information that can sufficiently answer CQs) [47]. In this section, we summarize our approach for creating the CQs relating to accountability of AI systems, and provide an overview of their thematic structure. It should be noted that in RAInS we focus on AI systems built around ML technologies. The content presented here includes questions relating to the system design stage (previously discussed in [45]) and those relating to the implementation stage of an AI system. While these two stages are closely coupled and hence many CQs overlap, one of the main differences is that the design stage includes descriptions of the design specifications of the system components, whereas the implementation stage includes descriptions of the tangible system components based on such specifications, as well as justifications of any deviations from the original design. For example, the design stage includes the specification of how an ML model is to be built and evaluated, whereas the implementation stage includes descriptions of the ML model after it has been realized, and the CQ 'What are the limitations of the ML model?' can be read in each case as: 'What are the limitations of the designed ML model?' and 'What are the limitations of the realized ML model?'.

A. METHODOLOGY
We gathered CQs relating to accountability of AI systems from publications by regulatory bodies [19], [48], statements and guidelines released by professional bodies [22], [31], [49], and the academic literature [24], [26], [38]-[40], [50] up to the year 2020. The questions included documentation requirements for AI system components (e.g., ML models) as well as explanations of how automated decisions were made. While the literature offers a comprehensive set of questions, these often focus on technological details of AI systems, their individual components (e.g., ML models) and resources used in the development process (e.g., datasets), often without explicit links to human agents who could be held accountable for their existence. The initial set of CQs was analyzed and similar questions (such as those addressing the same issue but worded differently) were grouped under overarching thematic CQs which represented those questions. For example, the two requirements to indicate "whether the model was developed with general or specific tasks in mind" [39] or "What are the goals, purposes and intended applications of the product or service?" [49] were translated to the CQ: 'What are the intended use cases of the ML model?' Each thematic CQ was then analyzed to determine the ontological concepts needed to represent that information. For example, IntendedUseCase was one of the concepts that emerged from this CQ.
Having completed the initial CQ analysis exercise, we next turned our attention to a number of AI application use cases (autonomous vehicles, image classifiers for cancer detection, and loan application approval systems), with the objective of identifying any missing knowledge requirements. We created additional CQs in response to these use cases which extended the discussion of AI accountability in the current literature; we then determined the ontological concepts necessary to represent the information in the new CQs. The CQs identified through this exercise that were not included in the initial set of questions were as follows:
• Whether the implementation or evaluation followed the design specification, and if not, what the justification was for the deviation from the specification.
• Who performed each accountable action and was therefore responsible for the generated accountable result(s).
• What were the compliance requirements for the components of the AI system.
• Who checked that the components of the AI system comply with relevant hard and soft laws.
• Who checked that the components of the AI system are fit for purpose, i.e., its intended use case(s).
• Who approved, i.e., signed off on, the components of the AI system.
• What were the guidelines for deploying and operating the AI systems (to be disseminated to the persons responsible for deploying the AI system).

B. COMPETENCY QUESTIONS TOPICS
The final set of CQs that framed the scope of our knowledge representation models contained 127 questions. We organized and grouped these CQs under a number of high-level topics as shown below. Where a topic relates to the Design stage, it is indicated by the letter (D), to the Implementation stage by the letter (I), and to both Design and Implementation stages by the letters (D&I). References are provided if questions relating to particular aspects of a topic came from a third-party resource. Where no reference is provided, these were created by our team as outlined in the previous section.
1) System-level information (D): the intended purpose of the system [19], [39], [40], [50]; the intended users of the system [31], [39], [40], [48]; the compliance specifications which apply to the system, i.e., the hard laws that must be followed and soft laws that should be followed [19], [48], [49]; and who is accountable for the creation of the specifications of the system purpose and the system compliance requirements.

III. SEMANTIC MODELLING OF ACCOUNTABILITY INFORMATION
This section provides a high-level overview of our proposed approach for documenting accountability information as part of a single Knowledge Graph (KG). For more technical details on how these ontologies were formalized and how complete KGs described using SAO and RAInS should be represented (i.e., using the appropriate relationship links between different class instances), we refer the reader to [45] and the ontology documentation. As outlined in Section I, we use SAO and RAInS ontologies to semantically model information about the AI system life cycle in the form of accountability plans and their corresponding accountability traces. The ontologies extend the W3C recommendation PROV-O [43] and its extension EP-Plan [44]. The resulting KG described using these ontologies is thus expressed as an acyclic causal provenance graph.

A. ACCOUNTABILITY PLANS
Accountability plans describe information about activities and their outcomes that are expected to occur during a system life cycle and should be recorded for accountability purposes. Fig. 1 illustrates a portion of an accountability plan for the implementation life cycle stage. The plan captures the expected flow of events describing part of a standard machine learning pipeline where a dataset is split into training and evaluation datasets, and then an ML model is trained on the training dataset and evaluated using the evaluation dataset. Activities such as dataset splitting and ML model training are described as accountable actions and the results that they produce are described as accountable results. This high-level vocabulary is defined by the SAO ontology which also provides concepts for linking the individual plan components to a particular life cycle stage, identifying which accountable agents may perform certain activities (as described in the plan), linking the information to a particular AI system, etc.
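To make the plan structure concrete, the sketch below encodes the pipeline described above (split, train, evaluate) as steps with input and output variables, in the spirit of the EP-Plan step/variable model, and derives a valid execution order. The step and variable names, the dictionary layout, and the helper functions are assumptions made for illustration; they are not the SAO or EP-Plan vocabulary, and Fig. 1 may differ in detail.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical plan: each step lists the variables it consumes and produces.
PLAN = {
    "SplitDataset": {
        "inputs": ["Dataset"],
        "outputs": ["TrainingDataset", "EvaluationDataset"],
    },
    "TrainModel": {
        "inputs": ["TrainingDataset"],
        "outputs": ["Model"],
    },
    "EvaluateModel": {
        "inputs": ["Model", "EvaluationDataset"],
        "outputs": ["Evaluation"],
    },
}


def step_dependencies(plan):
    """Map each step to the set of steps that produce one of its inputs."""
    producer = {var: step for step, io in plan.items() for var in io["outputs"]}
    return {
        step: {producer[var] for var in io["inputs"] if var in producer}
        for step, io in plan.items()
    }


def execution_order(plan):
    """Order the steps so producers come first.

    static_order() raises graphlib.CycleError if the plan is not acyclic.
    """
    return list(TopologicalSorter(step_dependencies(plan)).static_order())


print(execution_order(PLAN))
```

Because the dependencies are derived from shared variables rather than stated explicitly, a plan that accidentally contains a cycle is rejected by the sorter, which matches the requirement that plans be acyclic.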

B. ACCOUNTABILITY TRACES
Each accountability plan is linked to one or more corresponding accountability traces. These contain the information about the events and outputs described in the plan as they actually happened in the real world. Fig. 2 illustrates part of an accountability trace capturing information about training an ML model (Training Activity with start and end times) and who is accountable for this activity (user Alice). Accountable results are described in the accountability trace as collections of information elements, each describing a different aspect of the result (such as limitations, license). In Fig. 2 Model Component Information generated by Training Activity has information element members Performance Limitation and Gender Bias. To draw an analogy with paper-based records, the information realization represents a report describing an outcome of an activity (e.g., an implemented ML model) and the information elements represent individual sections of the report. Note that in our approach we do not attempt to provide a semantic annotation for every concept that may be used to express ML model limitations. Instead, the KG would contain human readable text linked to the information element describing the limitation.

C. EXTENDING THE HIGH-LEVEL ACCOUNTABILITY VOCABULARY
To maximize the applicability of SAO to different accountable systems, the ontology only defines high-level concepts for modelling accountability plans and accountability traces. This limits the types of queries that can be answered over a KG and makes interpreting the results of those queries a challenge for human users. For example, referring to the example in Fig. 2, we could construct a query that lists all the information elements describing a particular accountable result (e.g., ML Model), but we would be unable to select only those elements describing limitations. In other words, we can retrieve the sections of the report, but do not know what is described in each section. As our aim was to model accountability information relating to AI systems built around ML technologies, we extended SAO with the RAInS ontology to define subclasses (i.e., subtypes) of some of the core classes including accountability plan, accountable action, accountable result, and information element. Fig. 3 illustrates the SAO classes (shown in blue-filled rectangles) which RAInS extends. Third-party classes which were reused from ML Schema 5 [52], Dublin Core 6 (DC), and OntoSoft 7 [53] have blue borders. For the SAO and RAInS properties, we refer the reader to the ontology documentation.
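The difference between querying with only the high-level vocabulary and querying with subclasses can be sketched as follows. The class names Limitation and License, the element identifiers, and the helper functions are hypothetical stand-ins chosen for this example; the actual RAInS class names are given in the ontology documentation.

```python
# Hypothetical class hierarchy (child -> parent); names are illustrative only.
SUBCLASS_OF = {
    "Limitation": "InformationElement",
    "License": "InformationElement",
}

# Information elements attached to one accountable result, i.e. the
# "sections of the report" in the paper-record analogy.
ELEMENTS = [
    {"id": "perfLimitation", "type": "Limitation", "text": "free-text description"},
    {"id": "modelLicense", "type": "License", "text": "free-text description"},
]


def is_a(cls, ancestor):
    """True if `cls` equals `ancestor` or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def select(elements, cls):
    """Return the ids of all elements whose type is `cls` or a subclass of it."""
    return [e["id"] for e in elements if is_a(e["type"], cls)]


print(select(ELEMENTS, "InformationElement"))  # every section of the report
print(select(ELEMENTS, "Limitation"))          # only the limitation sections
```

With only the top-level class available, the first query is the most specific one that can be asked; the second becomes possible once each element carries a more specific type.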
Design Stage Accountability Plan and Implementation Stage Accountability Plan provide 'containers' for plan definitions and both the design and implementation life cycle stages of an AI system are expected to have one of these plans defined. To provide more detailed descriptions of these accountability plans, subclasses of SAO's AccountableAction were defined to describe planned activities such as: (1) designing the components and the system's high-level specifications (see Produce Specification); (2) creating new tangible system components/assets (e.g., trained ML models, datasets) following their design specifications (Realize Component and Merge and Split Dataset); (3) evaluating the individual components, as well as the AI system as a whole (Evaluate); (4) generating guidelines for deployment and operation (Generate Guideline); and (5) making decisions (Decide). Furthermore, subclasses of SAO's Accountable Result describe: (1) design specifications for the AI system and its components (Design Specification and its subclasses); (2) implementations of individual system components (Dataset Component and its subclasses, Model Component, and Supporting Infrastructure Component); (3) evaluations of the ML model and the AI system as a whole (Evaluation); (4) generated guidelines (Guideline and its subclasses); (5) human decisions (Human Decision and its subclasses); and (6) logs of any deviations from design specifications (ChangeLog).
RAInS supports richer descriptions of accountable results in accountability traces through subclasses of SAO's Information Element. These represent different types of accountability information such as limitations, references to hard and soft laws, user groups, use cases, risks, ML model and dataset characteristics, guidance, data collection procedure, etc. Fig. 4 shows a portion of the example accountability trace from Fig. 2 enhanced with RAInS concepts.
Finally, the RAInS ontology also defines two types of constraints: AutoConstraint and HumanConstraint. The former represents constraints associated with accountable actions that can be implemented (e.g., in the form of a rule) and automatically evaluated against the accountability trace (e.g., a constraint to check whether a particular information element was created or that the value of an information element is in a particular range). The latter represents constraints that would be difficult to automatically evaluate based on the accountability trace, but instead would need to be evaluated by human agents (e.g., a constraint to ensure that the agent performing an action has relevant expertise). Such constraints were introduced to help manage the quality of accountability traces and to support the discovery of inconsistencies within the accountability KG.
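A rough sketch of how the two automatically checkable constraints mentioned above (existence of an information element, and a value lying in a given range) might be evaluated against a trace is shown below. The trace layout, the element type names, the constraint names, and the accuracy value are all hypothetical; this is not the rule mechanism used by the Accountability Fabric, and constraints of the HumanConstraint kind would instead be routed to a human reviewer.

```python
# Hypothetical trace: information elements recorded for one accountable result.
TRACE = {
    "Limitation": {"text": "free-text description"},
    "Accuracy": {"value": 0.91},  # made-up number, for illustration only
}


def element_exists(element_type):
    """Constraint: an information element of this type was recorded."""
    def check(trace):
        return element_type in trace
    return check


def value_in_range(element_type, low, high):
    """Constraint: the element exists and its value lies in [low, high]."""
    def check(trace):
        element = trace.get(element_type)
        if element is None or "value" not in element:
            return False
        return low <= element["value"] <= high
    return check


# Constraint names are illustrative labels, not ontology terms.
AUTO_CONSTRAINTS = {
    "limitation recorded": element_exists("Limitation"),
    "license recorded": element_exists("License"),
    "accuracy within [0, 1]": value_in_range("Accuracy", 0.0, 1.0),
}


def evaluate(trace, constraints):
    """Evaluate every constraint and report which ones the trace satisfies."""
    return {name: check(trace) for name, check in constraints.items()}


for name, satisfied in evaluate(TRACE, AUTO_CONSTRAINTS).items():
    print(f"{name}: {'ok' if satisfied else 'VIOLATED'}")
```

In this example the missing license element is flagged, which is the kind of inconsistency such constraints are intended to surface.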

D. INCORPORATING USER FEEDBACK IN THE DESIGN PROCESS
To assess how well our approach addresses the knowledge capture and modelling requirements identified above, as well as to understand the potential benefits of our semantic-based approach, two expert workshops were organised to demonstrate and discuss the Accountability Fabric. This prototype implementation comprises a suite of tools demonstrating the feasibility of a computational solution to support planning, collection and inspection of accountability information using SAO and RAInS [45], [46]. Workshop 1 focused on legal experts (5 in total), whereas Workshop 2 focused on CS researchers and AI experts (10 in total). Both groups highlighted similar advantages of the Accountability Fabric and the resulting ontology concepts, including the fact that such a standardized, structured approach would be of use to regulators and investigators who may be less familiar with the mechanics of AI systems but are seeking specific information to determine regulatory compliance and to build a framework of enforcement. It was also felt that our approach could help to overcome knowledge gaps, particularly amongst the legal profession, by introducing a common understanding of accountability in AI that will allow meaningful information to be captured. CS researchers and AI experts also indicated that such a system would be beneficial by bringing structure to the workflow for new entrants, as such an approach allows for meaningful communication with customers interested in deploying AI systems, again by providing a common, structured language. Participants felt that it could be helpful in troubleshooting where errors may have occurred in the event that an AI system behaves unexpectedly at the deployment stage, particularly given the attention to the life cycle and the potential for different actors to have been involved at different stages.
The list of CQs was felt to be comprehensive, but participants suggested that there should be potential to extend the system if gaps were found. While participants agreed that there is some potential for the system to become burdensome, they also felt that the standardized, structured approach involving both automatically and human generated information is one that could be efficiently adopted and would remove some of the subjectivity present in existing accountability frameworks.

IV. COMPARISON WITH POPULAR NON-SEMANTIC FRAMEWORKS
In this section we assess the ability of our semantic ontology-based solution to describe accountability information, by comparing it with the popular non-semantic frameworks, Datasheets, Model Cards, and FactSheets. For the remainder of this paper, we shall refer to these as baseline frameworks.

A. METHODOLOGY
The ability of our ontologies and the baseline frameworks to capture accountability information was evaluated against 96 individual CQs falling under the topics discussed in Section II. Information relating to Human decisions was excluded from this comparison, as such information was beyond the original scope of the baseline frameworks and will be discussed separately in Section IV-C.
To decide whether the baseline frameworks are able to capture accountability information required to support individual CQs, we first examined the published papers describing these frameworks ([38]-[40]), including their appendices. We then examined the papers' supplementary materials, which consisted of two examples on the Model Card website, 8 examples provided with the Model Card Toolkit, 9 and seven examples listed on the FactSheets website. 10 To decide whether our ontologies are able to capture the information necessary to answer the CQs, we determined whether relevant classes and relationships exist that provide the means to generate part of a corresponding KG (see the Appendix for how our ontologies' classes and relationships capture the required information).
When analyzing the baseline frameworks, we only considered aspects relating to the design and implementation stages of the AI system life cycle - i.e., the current scope of our ontology. For example, information provided to the FactSheet question: "When was the service first released? When was the last release?" relates to the system deployment stage, and the questions "When were the models last updated? How much did the performance change with each update? How often are the models retrained or updated?" relate to the operations stage; such questions therefore fall outside the scope of this comparison.
It should be noted that the evaluation of the baseline frameworks' ability to answer individual CQs was based on the understanding of their functionalities by the two authors of this paper who conducted the comparison. This understanding was gained from the associated papers and existing examples as it was not the frameworks' goal to provide semantic realizations. This poses a potential risk in terms of misinterpretation of the materials by the authors and, therefore, we note this as a potential limitation of our approach.

B. RESULTS OF COMPARISONS BETWEEN OUR APPROACH AND THE BASELINE FRAMEWORKS
Tables 1-7 report the results of our evaluation. The notation used in each of the tables to indicate whether the frameworks are able to answer the CQs is as follows: 'Y' indicates that the framework does capture the information necessary to answer the CQ; 'S' indicates that the framework captures some of the information needed to answer the CQ; and 'N' indicates that the framework does not capture the necessary information. As previously mentioned, a detailed overview of the semantic concepts introduced in SAO and RAInS is included in the Appendix.
We now discuss the commonalities between the three baseline frameworks and our approach. As shown in Tables 1, 4, 5, 6, and 7 both our approach and FactSheets capture information about the whole system, supporting infrastructure, evaluation of the whole system, certifications, and operational guidelines. Datasheets and Model Cards do not, as they are fundamentally concerned with capturing information about datasets and ML models respectively. Also as expected, Table 2 shows that while Model Cards do capture some information about datasets, the Datasheets framework captures much more, with FactSheets somewhere between the two in terms of coverage. Finally, Table 3 shows that both FactSheets and Model Cards capture a considerable amount of information about ML models and their evaluation.

C. ACCOUNTABILITY INFORMATION COVERAGE BEYOND THE SCOPE OF THE BASELINE FRAMEWORKS
We now summarize how our approach captures accountability information beyond the scope of the baseline frameworks. First, because SAO and RAInS extend PROV, they can model the dates and time frames when the accountable actions were performed and when the accountable results were produced. For example, they can record when the ML model was designed, realized, and evaluated, and also when it was approved. Second, RAInS can model the compliance requirements, i.e., the hard and soft laws, that individual AI system components comply with. This is in addition to modelling the compliance requirements applying to the system as a whole, which none of the baseline frameworks model. Third, RAInS models the different human decisions and approvals (corresponding to item 8 in Section II-B) that are integral to the life cycle of an AI system. This includes: who confirmed the fitness of each of the components to the system's specified purpose; who checked the compliance of each of the components with the system's compliance requirements; and who approved design specifications, system component implementations and their evaluations, and generated guidelines. Fourth, RAInS models the agents within the organizations who were accountable for the different actions. This is important for internal and external auditing. Finally, RAInS models whether the realizations of system components or their evaluation followed the design specification, as well as the justification for any deviations from the design specification. Typically, the accountable agent performing the realization has access to the design specification and is expected to document how it was performed. However, if, for some reason, the agent decides to deviate from the design specification, then they must document their actions along with the change log and their justification for the change. This is crucial in associating accountability with the correct human agent.
For example, if an agent, Bob, followed a faulty design produced by another agent, Alice, then the accountability falls on Alice or possibly on both Alice and Bob. If, however, Bob deviated from a design specification that contained no errors, resulting in an AI system that produced unacceptable errors, then Bob is solely accountable.
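The attribution reasoning in this example can be sketched as a small decision function. This is only an illustrative simplification and not part of SAO or RAInS: the function name, its parameters, and the returned dictionary keys are hypothetical, and the case where a faulty design is also deviated from is not discussed in the text, so the sketch merely flags both agents for review.

```python
def attribute_accountability(designer, implementer, design_faulty, deviated):
    """Return who is accountable and who is possibly accountable.

    Hypothetical helper that encodes only the two cases from the example.
    """
    if design_faulty and not deviated:
        # A faulty design was followed: the designer is accountable,
        # and the implementer possibly shares accountability.
        return {"accountable": {designer}, "possibly_accountable": {implementer}}
    if deviated and not design_faulty:
        # A sound design was deviated from: the implementer alone is accountable.
        return {"accountable": {implementer}, "possibly_accountable": set()}
    if design_faulty and deviated:
        # Assumption: this case is not covered by the text; flag both for review.
        return {"accountable": set(), "possibly_accountable": {designer, implementer}}
    # A sound design was followed: nothing to attribute in this sketch.
    return {"accountable": set(), "possibly_accountable": set()}


followed_faulty = attribute_accountability("Alice", "Bob", design_faulty=True, deviated=False)
deviated_from_sound = attribute_accountability("Alice", "Bob", design_faulty=False, deviated=True)
print(followed_faulty)
print(deviated_from_sound)
```

In practice such a judgement would rest on the recorded change log and justification rather than on two Boolean flags; the sketch only shows why those records matter.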

A. BENEFITS OF SEMANTIC REPRESENTATION OF ACCOUNTABILITY INFORMATION
KGs model accountability information in a structured, interoperable format that is understandable by both humans and machines. While in our approach a portion of the accountability information is stored in human-readable formats (e.g., text describing information elements, or images depicting evaluation results), many aspects of the accountability KGs are described using semantic concepts following the three core PROV concepts, namely agents, entities, and activities. This approach to modelling data makes it easier to integrate additional accountability information into the KG. For example, the descriptions of agents may be expanded using the FOAF vocabulary (http://xmlns.com/foaf/spec/) to list their contact information, which can then be retrieved when querying the accountability KG. Furthermore, accountability plans provide the means to guide and assess the collection of accountability trace information that goes beyond simple template documents, where users are only guided by different questions split over a number of sections, as is the case with Datasheets and FactSheets. We do, however, acknowledge that the SMACTR framework, which builds on Datasheets and Model Cards, does provide document checklists that guide users, albeit split over different sections.

TABLE 5. Information about the evaluation of the whole system (corresponds to item 5 in Section II-B).

TABLE 6. Information about the certifications (corresponds to item 6 in Section II-B).

As we demonstrate in [45], [46], it is possible to use accountability plans to build intelligent user interfaces that manage the gathering of accountability information through manual and also semi-automatic processes (e.g., by integrating with programming applications used to implement components of the AI system). Plan constraints can be used to further enrich this process by providing real-time warnings to users about missing accountability information (in the form of information elements) so as to avoid liability by omission. Constraints also allow checking for potential quality issues in the supplied information (e.g., when an agent claims to have evaluated an ML model before this model was implemented, or supplies values that fall outside predefined ranges). We envision that this planning mechanism could be used by regulators and standardization agencies in future to provide generic plan templates to accompany their AI regulations or standards. Such generic plans would then be employed by the respective organizations involved in AI system development to guide the gathering of accountability information. Accountability traces would be provided by a collection of compatible tools capable of understanding relevant portions of these plans.
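The two quality checks mentioned above (an evaluation claimed before the model was implemented, and values outside predefined ranges) can be illustrated with a minimal sketch. It uses plain Python objects rather than an RDF store, and the class, function names, and sample values are hypothetical; it is not how constraints are realized in RAInS, only an indication of the kind of check a plan constraint could trigger.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class RecordedActivity:
    """One activity in an accountability trace (hypothetical, simplified)."""
    label: str
    agent: str
    started_at: datetime
    ended_at: datetime


def check_evaluated_after_implemented(implementation: RecordedActivity,
                                      evaluation: RecordedActivity) -> Optional[str]:
    """Return a warning if the evaluation claims to start before the implementation ended."""
    if evaluation.started_at < implementation.ended_at:
        return (f"{evaluation.agent} reports '{evaluation.label}' starting before "
                f"'{implementation.label}' was completed")
    return None


def check_in_range(name: str, value: float, low: float, high: float) -> Optional[str]:
    """Return a warning if a supplied value falls outside its predefined range."""
    if not low <= value <= high:
        return f"{name}={value} is outside the expected range [{low}, {high}]"
    return None


# Made-up trace entries: the evaluation overlaps the implementation period.
implementation = RecordedActivity("implement model", "ex:Bob",
                                  datetime(2022, 3, 1), datetime(2022, 3, 10))
evaluation = RecordedActivity("evaluate model", "ex:Carol",
                              datetime(2022, 3, 5), datetime(2022, 3, 6))

issues: List[str] = [
    w for w in (
        check_evaluated_after_implemented(implementation, evaluation),
        check_in_range("accuracy", 1.2, 0.0, 1.0),
    ) if w is not None
]
for issue in issues:
    print(issue)
```

A user interface built on such checks could surface each warning at the moment the information is entered, instead of leaving the problem to be found during a later audit.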

B. INTEGRATION WITH EXISTING FRAMEWORKS
The flexibility of our semantic-based approach does not necessarily mean that users of existing frameworks, such as those discussed in our comparison evaluation, must migrate to an entirely new solution. Indeed, even some of the baseline frameworks such as Model Cards and Datasheets have been designed with an intention to complement each other [30], [39]. While collecting accountability information in a text-based format (such as a Datasheet or FactSheet) would not be supported by our approach, interactive user interfaces could be created that resemble the design intentions of these existing frameworks. We demonstrate this in [46] where we first integrate the data generated using the Model Card Toolkit into our accountability KG and later provide an option to display relevant parts of the KG in the form of a Model Card. Here, the semantic layer supported by our ontologies serves as middleware enabling the collection of information from heterogeneous sources (e.g., via manual input or through semi-automated frameworks such as the Model Card Toolkit) and provides a means for the information to be queried and displayed via a range of user interfaces.
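The middleware idea can be sketched as a round trip: information arriving as a Model Card-like document is flattened into triples, merged with information from another source, and later rendered back as a card. The sketch below is an assumption-laden illustration only: it uses plain Python tuples instead of an RDF library, the `ex:` predicates are hypothetical placeholders rather than RAInS properties, and the dictionary layout is not the Model Card Toolkit's actual schema.

```python
def model_card_to_triples(model_id, card):
    """Flatten a two-level card dictionary into (subject, predicate, object) triples."""
    triples = []
    for section, fields in card.items():
        for field, value in fields.items():
            # Hypothetical predicate naming scheme: ex:<section>.<field>
            triples.append((model_id, f"ex:{section}.{field}", value))
    return triples


def triples_to_model_card(model_id, triples):
    """Rebuild the card view for one model from the triples that describe it."""
    card = {}
    for subject, predicate, obj in triples:
        if subject != model_id or not predicate.startswith("ex:"):
            continue  # skip triples about other resources or with other vocabularies
        section, field = predicate[len("ex:"):].split(".", 1)
        card.setdefault(section, {})[field] = obj
    return card


# Made-up card content standing in for output of a semi-automated tool.
card = {
    "model_details": {"name": "credit-scoring-model", "version": "1.0"},
    "considerations": {"limitations": "Not evaluated on applicants under 21."},
}

graph = model_card_to_triples("ex:model1", card)
# Information entered manually by a different source can sit in the same graph.
graph.append(("ex:model1", "prov:wasAttributedTo", "ex:Bob"))

rebuilt = triples_to_model_card("ex:model1", graph)
print(rebuilt == card)
```

The card view ignores the attribution triple, while a different interface (for example, an audit view) could query exactly that triple from the same graph.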

C. THE NEED FOR SMARTER INFORMATION COLLECTION AND AUDIT MECHANISMS
There is no doubt that gathering good quality accountability information requires non-trivial effort due to the number of stakeholders involved in the AI system life cycle, and it is further complicated by various (sometimes incompatible) organizational structures and procedures including those preventing the disclosure of sensitive information. While we do not provide a solution to these organizational challenges, our approach does remove some of the technological barriers associated with gathering accountability information. For example, Hind et al. [54] identified several barriers relating to the adoption of FactSheets during evaluations with potential users of this framework. These included the need to reduce the time and effort during data collection; the need to address the difference between facts that can be captured automatically and those requiring "human authoring" which sometimes leads to incomplete and poor-quality results; the need to enable data capture at the time of creation of system components rather than having to recall it during later stages; the need for an open collection API that would support an unbounded set of tools that AI developers use to build their systems; and also the ability for the information captured by such systems to meet the different perspectives of stakeholders (e.g., some users would benefit from simplified views).
As we have outlined previously, our semantic approach overcomes all of these barriers as it offers a standard and domain-extendable vocabulary that can be used by external tools to record the collected accountability information as parts of the accountability KG, which can then be submitted to a central semantic middleware layer. In addition, such tools can provide instant feedback to the users who generate accountability traces (e.g., when a required piece of information is missing) based on the constraints defined in the accountability plans as demonstrated in [46]. The KG is also capable of supporting a range of visualizations of the accountability information to users; this is enabled by the ability to perform structured queries over the types of information, chronology of the events, and the identification of missing information (i.e., where a plan expects a piece of information to be provided, but it is absent from the accountability trace). Moreover, our accountability KG associates the information with individual human agents who can be held accountable for providing false or insufficient information or simply contacted for further clarification. This critical aspect is often limited to organization level (e.g., the company responsible for developing an AI system) in the baseline frameworks.
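The kinds of structured query mentioned above (missing information, chronology of events, and the agent to contact) can be illustrated with a minimal sketch over a simplified in-memory plan and trace. The data structures, key names, agents, and timestamps are hypothetical; in the approach described here such questions would be asked of the accountability KG itself, for example with SPARQL.

```python
# Plan: each planned accountable action and the accountable results it should produce.
plan = {
    "rains:RealizeComponent": ["rains:DatasetComponent", "rains:ChangeLog"],
}

# Trace: what was actually recorded (made-up agents and ISO 8601 timestamps).
trace = [
    {"realizes": "rains:DatasetComponent", "agent": "ex:Bob",
     "generated_at": "2022-03-10T12:00:00"},
]


def missing_results(plan, trace):
    """List (action, result) pairs that the plan expects but the trace lacks."""
    recorded = {record["realizes"] for record in trace}
    return [(action, result)
            for action, results in plan.items()
            for result in results
            if result not in recorded]


def chronology(trace):
    """Order trace records by generation time (same-format ISO strings sort correctly)."""
    return sorted(trace, key=lambda record: record["generated_at"])


def agents_for(trace, result):
    """Return the agents associated with a recorded result, e.g. to ask for clarification."""
    return {record["agent"] for record in trace if record["realizes"] == result}


gaps = missing_results(plan, trace)
print(gaps)
print(agents_for(trace, "rains:DatasetComponent"))
```

Here the planned change log has no counterpart in the trace, which is the situation a plan-aware tool would report back to the user.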

VI. CONCLUSION AND FUTURE WORK
We have presented an approach for recording accountability information relating to the design and implementation of AI systems using accountability knowledge graphs. We also presented a comparison between the capabilities of our approach and three popular baseline frameworks and have discussed how our approach offers opportunities to integrate and extend the information provided by these frameworks. We have also described how our approach overcomes some of the technological barriers associated with collecting accountability information.
In future work, we will focus on extending the semantic vocabularies to cover the deployment and operation stages of the AI system life cycle. We will explore further opportunities to demonstrate the integration of information provided by external tools, especially for the operation stage, where we expect the greatest benefit of automated accountability information capture. As new stakeholders will be captured in the accountability KG (e.g., subjects of the final AI decisions), we will build on our initial round table discussions to conduct a series of user-based evaluations assessing the practicality and usefulness of accountability KGs from both legal and technological perspectives.

IN MEMORIAM
Dr. Caitlin Doyle Cottrill (1977-2022) passed away at the age of forty-four following a long battle with cancer. She will always be remembered for her kindness, support, gentle humour, and commitment to creating a better world for all people and for the environment in which we all live.

APPENDIX HOW THE ONTOLOGIES ANSWER THE COMPETENCY QUESTIONS
In this appendix we present the full list of competency questions (CQs) and illustrate how our approach models the corresponding accountability information. Table 8 lists the ontologies we use, along with their prefixes and namespaces. Figure 5 showcases how the entities, activities, and agents relate to each other, and where the concepts sao:AccountabilityPlan, sao:AccountableAction, sao:AccountableResult, and sao:InformationElement are substituted by their appropriate RAInS subclasses. This is done as follows: A sao:System is created and attributed to a sao:Agent. An associated sao:AccountabilityPlan (AP) is created for the system. The created system would be of type rains:AI_System. Given the current scope of RAInS, the accountability plan would be either rains:DesignStageAccountabilityPlan or rains:ImplementationStageAccountabilityPlan.
A sao:AccountabilityPlan contains some sao:AccountableActions (AA), each having at least one accountable result output. The first output would be sao:AccountableResult (AR1), and if there is a second output, this is shown in Figure 5 as sao:AccountableResult (AR2). For example, when the accountable action rains:RealizeComponent is planned to realize a dataset, it would have as its outputs the two accountable results rains:DatasetComponent (AR1) and rains:ChangeLog (AR2). Where applicable, some sao:AccountableResults can be used as inputs to subsequent sao:AccountableActions. This is indicated in Figure 5. Although not shown in Figure 5, when the accountable actions and accountable results are created in the plan, accountability information to be recorded in the accountability trace is indicated in a human-readable plain text format using the data property rdfs:comment. Also not shown in the figure are the constraints to be satisfied in the accountability trace, which would be of type rains:AutoConstraint or rains:HumanConstraint.
At the accountability trace level, an ep-plan:Activity is created to correspond to each sao:AccountableAction. This activity is associated with a sao:AccountableAgent (AG1) and generates one sao:InformationRealization for each planned sao:AccountableResult output. Thus, the sao:AccountableAgent (AG1) associated with the ep-plan:Activity is also accountable for the generated sao:InformationRealization(s). Note that any accountable agent may act on behalf of another person or organization, resulting in them being accountable as well. Accountability information about the sao:InformationRealization is recorded in plain text format using the data property rdfs:comment. These comments capture the human-readable accountability information and, where appropriate (e.g., for images), also hold it as base64 strings. Each sao:InformationRealization is a collection of sao:InformationElements (IE1, IE2, . . .). Some sao:InformationElements may be attributed to a sao:AccountableAgent (AG2) who is different from the one who performed the ep-plan:Activity and who is thus accountable for this sao:InformationElement. For example, if the agent who produces a design specification for a dataset chooses a third-party off-the-shelf dataset, then the creator of this third-party dataset is accountable for only the dataset, but the first agent who chose to use that third-party dataset is still accountable for their specified design and their decision to include the dataset. The sao:AccountableAgent (AG2) is included as a subtype sao:InformationElement in the sao:InformationRealization collection. Tables 9 to 16 show how the identified CQs can be answered using concepts from SAO and RAInS, as well as those imported from the ontologies listed in Table 8. In order to visualize how the concepts listed in the right-hand columns of Tables 9 to 18 relate to each other, the reader should take the concepts listed in the tables and populate the template in Figure 5.
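The trace-level pattern described above can be sketched as a function that emits triples for one executed activity. The sketch uses plain Python tuples rather than an RDF library, and the linking properties it chooses (PROV-style names such as prov:wasAssociatedWith, prov:wasGeneratedBy, prov:hadMember, and prov:wasAttributedTo, and EP-Plan-style names such as ep-plan:correspondsToStep and ep-plan:correspondsToVariable) are assumptions made for illustration; the properties actually used are those listed in the tables of this appendix. All `ex:` identifiers and the function name are hypothetical.

```python
def record_activity(activity, action, agent, outputs):
    """Build triples for one trace-level activity and its information realizations.

    outputs: list of dicts with keys "id", "result", "comment" and an optional
    "elements" mapping of element identifier -> attributed agent (or None).
    """
    triples = [
        (activity, "rdf:type", "ep-plan:Activity"),
        (activity, "ep-plan:correspondsToStep", action),     # assumed linking property
        (activity, "prov:wasAssociatedWith", agent),         # AG1 performed the activity
    ]
    for output in outputs:
        realization = output["id"]
        triples += [
            (realization, "rdf:type", "sao:InformationRealization"),
            (realization, "ep-plan:correspondsToVariable", output["result"]),  # assumed
            (realization, "prov:wasGeneratedBy", activity),
            (realization, "rdfs:comment", output["comment"]),
        ]
        for element, element_agent in output.get("elements", {}).items():
            triples.append((realization, "prov:hadMember", element))
            if element_agent is not None:
                # AG2: a different agent accountable for this element only.
                triples.append((element, "prov:wasAttributedTo", element_agent))
    return triples


# Made-up example: a dataset design that selects a third-party dataset.
triples = record_activity(
    activity="ex:designDatasetActivity",
    action="ex:plannedDatasetDesignAction",
    agent="ex:Bob",
    outputs=[{
        "id": "ex:datasetDesignSpec",
        "result": "ex:plannedDatasetDesignResult",
        "comment": "Design specification selecting an off-the-shelf dataset.",
        "elements": {"ex:thirdPartyDataset": "ex:DatasetCreator",
                     "ex:selectionRationale": None},
    }],
)
for triple in triples:
    print(triple)
```

In this example ex:Bob remains associated with the design activity and its realization, while the third-party dataset element is attributed to its own creator, mirroring the split of accountability described in the text.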
Note that not all sao:InformationRealizations will have sao:InformationElement members. Also, where a reference to a sao:InformationRealization corresponding to the same accountable result is repeated, it is always the same sao:InformationRealization. For example, the repeated mentions of sao:InformationRealization corresponding to rains:DatasetComponent (AR1) all refer to the same sao:InformationRealization.
Note that properties are either object properties (op) or data properties (dp). Also note that the inputs we list in Tables 9 to 18 for accountable actions are what we consider to be the minimum inputs, and the plan creator may indicate additional required inputs as they see fit for their application domain and specific AI system. Moreover, some CQs were generalized and have been included in both the design and implementation stages. In the tables, we show how the modelling is done at the design stage using rains:DesignStageAccountabilityPlan (AP) and at the implementation stage using rains:ImplementationStageAccountabilityPlan (AP). Where a CQ is to be answered for both stages, both sets of concepts are included and are separated in the tables using a dashed line. Finally, recall that the PROV ontology can intrinsically model the dates and times at which activities are performed (using the properties prov:startedAtTime and prov:endedAtTime) and at which entities are generated (using the property prov:generatedAtTime). We therefore did not create CQs about when the different accountable actions are performed or when accountable results are generated.
IMAN NAJA is a Research Fellow in computing science with the University of Aberdeen. Her research interests include accountability, audit, trustworthiness, and transparency of AI systems, and her work has been in applying Semantic Web technologies and provenance models to address those topics.
MILAN MARKOVIC is a Research Fellow in computing science with the University of Aberdeen. His research interests include enhancing transparency and accountability in complex socio-technical systems through intelligent processes based on provenance data models and Semantic Web technologies.
PETER EDWARDS is a Professor of computing science with the University of Aberdeen. His research interests include provenance, trust, and information quality, with a particular emphasis on the role of Semantic Web technologies in helping realize transparent and accountable systems.
WEI PANG is an Associate Professor in computer science with Heriot-Watt University. His research interests include machine learning, bio-inspired computing, and accountable AI. He has authored over 130 papers in the above fields.
CAITLIN COTTRILL was born in December 1977. She was a Senior Lecturer and the Director of the Centre for Transport Research with the School of Engineering, University of Aberdeen. Her research explored the use of technology in urban spaces, with particular focus on transport and data privacy, and her work had a strong focus in the support of public and active transport modes, especially for vulnerable and under-served users.
REBECCA WILLIAMS received the undergraduate degree from Worcester College, Oxford, and the Ph.D. degree from the University of Birmingham. She is a Professor of public law and criminal law, in association with the Pembroke College. She was previously a fellow of the Robinson College, Cambridge. Her principal teaching interests include criminal law and public law. Her research interests include law and computer science, criminal law, public law, and the interrelationship between public law and unjust enrichment. VOLUME 10, 2022