
Toward Trustworthy Artificial Intelligence: An Integrated Framework Approach Mitigating Threats


Abstract:

We present the Strategic Artificial Intelligence (AI) Threats Navigation matrix, an integrated approach to navigating AI threats and trustworthiness throughout the AI lifecycle. It aims to assist developers, data scientists, regulators, and users in addressing threats and bolstering AI’s reliability, encouraging an improvement of trustworthiness.
Published in: Computer ( Volume: 57, Issue: 9, September 2024)
Page(s): 57 - 67
Date of Publication: 30 August 2024



The rise of artificial intelligence (AI) has significantly boosted productivity and transformed daily life. By automating routine and monotonous tasks, it frees up employees to engage in more complex and creative endeavors. AI’s prowess in swiftly processing and analyzing vast amounts of data enables it to uncover patterns and insights that might elude human analysis, enhancing decision making in sectors, such as finance, marketing, and health care. Furthermore, AI is a catalyst for the creation of novel and innovative products and services. It has improved both efficiency and accuracy across various fields, ranging from health care to business, streamlining operations and refining customer interactions.

AI, like many other technologies, offers both promise and peril. Understanding the inner workings of AI models and their outcomes is a challenge for most people, rendering AI akin to a black box. Even experts find it difficult to elucidate the effect of the numerous parameters in large models like large language models (LLMs) on the results they produce. Moreover, AI has been criticized for being an unjust and unsafe black box,1 primarily because its outcomes are often influenced more by the input data than by the model’s own predictive capabilities.2 Overreliance on automated systems, particularly when there is a lack of understanding of their internal workings, can result in incorrect decisions. This can have serious implications for safety, exemplified by incidents such as aircraft accidents.3 These incidents underscore the potential hazards AI poses to personal safety. Additionally, in the realm of generative AI, the lack of a “standard answer” or specific constraints on outputs means that these models can produce hallucinations, outputting information that does not exist or results that are inconsistent with the user input.4

Even if an AI model operates correctly, it can still be misused, that is, used for purposes other than its original one. For instance, facial recognition models in security systems or license plate recognition models for parking lot management might be employed for surveillance of citizens in authoritarian states. Techniques like deepfakes, originally intended to augment audio or visual training data, could be misused for scam calls or for spreading false pornographic images.5

Responsible use of AI is a way out. Numerous international and governmental organizations have hence put forward ethics, frameworks, and guidelines for AI, which are increasingly being regulated by law. To the best of our knowledge, key entities, such as the IEEE,6 the Organization for Economic Cooperation and Development (OECD),7 the International Telecommunication Union (ITU),8 Microsoft Corporation,9 the United States and especially its National Institute of Standards and Technology (NIST),2 the United Kingdom,10 and the European Union,1 have already released relevant documents one after another. Many researchers have also published articles focusing on the threats of AI, such as backdoor attacks, prediction of sensitive data, and algorithmic or data bias, to name but a few. These documents aim to mitigate risks and minimize the negative impacts of AI, demonstrating a concerted global effort to guide the responsible development and deployment of AI technologies.

These frameworks and studies are useful, but they are either fragmented or use different terminologies. There is an increasing demand for a more unified and comprehensive approach to enhance confidence and trustworthiness. AI developers and businesses need to concentrate on both reducing and eradicating AI system risks, particularly during iterative development phases, and on pinpointing vulnerabilities as risks become reality.

In this article, we contribute the Strategic AI Threats Navigation (SAIT-Nav) matrix, which integrates the aforementioned ethics, frameworks, guidelines, and research. The SAIT-Nav matrix encompasses threats related to each aspect of AI trustworthiness and spans various stages of the AI lifecycle. It provides a standardized framework for describing and classifying threats at each stage of the AI lifecycle, aiding organizations in easily understanding and preventing a range of threats, and ultimately developing targeted defensive strategies. The matrix also helps stakeholders evaluate the effectiveness of security control measures and identify potential vulnerabilities. By offering a common language for diverse stakeholders to share their knowledge and experiences, the SAIT-Nav matrix aims to significantly enhance the trustworthiness of AI.

However, some challenges still need attention in future research. First, we have not examined applications within various AI system types, nor provided detailed implementation guidance. Additionally, some organizations or startups might struggle due to scarce resources or insufficient technical know-how. Discussing the compromises linked to these limitations is crucial for establishing best practices and addressing the most critical threats.

Components of Trustworthy AI

As stated above, various international and governmental organizations have provided their views on trustworthy AI. According to these documents, trustworthy AI shall mainly have the following features.

  • Respectful of human rights and well-being: AI should be founded on fundamental rights, with its primary focus being the enhancement of individual well-being, encompassing quality of life, human autonomy, and freedom. The ethical principles of AI are designed to safeguard human interests. In terms of human and environmental well-being, AI acts as a catalyst for innovation, contributing to the achievement of the United Nations Sustainable Development Goals (SDGs). Trustworthy AI is thereby expected to drive prosperity and to create value and wealth.1

  • Transparent, explainable, and accountable: Trustworthy AI hinges on accountability. Accountability in this context means adhering to standards or laws, ensuring AI makes logical and traceable decisions, and documenting processes and involved parties for auditing purposes.1,2,7 For AI systems to be accountable, providers should openly share relevant information. It is crucial for users to be aware that they are interacting with AI and to understand the rationale behind its decisions. This awareness enables users to logically challenge AI systems and take more appropriate actions.7 A higher level of understanding will further enhance confidence and trustworthiness in these technologies.2

  • Privacy preserving, secure, and safe: AI operates on vast datasets, making it relatively simple to extract, re-identify, link, infer, and act upon sensitive details concerning individuals’ identities, locations, habits, and preferences. Consequently, AI practitioners must safeguard individuals’ privacy rights through technical means, such as privacy-enhancing technologies.2 AI systems that uphold confidentiality, integrity, and availability (the CIA triad) by implementing protection mechanisms to prevent unauthorized access and use can be considered secure, which is vital to safeguard AI systems from attacks, such as data poisoning, exfiltration of models, or other forms of intellectual property leakage.2,11 Insecure AI systems could even pose risks to human and environmental safety.1 To ensure the security and safety of AI systems, agencies should assess potential risks and harms and implement robust security and safety policies. This includes conducting traceable analyses and establishing recovery plans to address accidents effectively.1,2,7,9

  • Technically robust and reliable: Robustness and reliability in AI systems entail their ability to function accurately, as expected and required (within given design constraints), without failure, regardless of the conditions of normal use, foreseeable use, or misuse. This includes scenarios where AI systems are used in ways not initially anticipated, ensuring consistent performance throughout their lifecycle.2,7 To address the uncertainties associated with AI, standardization plays a crucial role in ensuring reproducible and replicable outcomes.1,8 As environmental and societal conditions evolve over time, it is imperative for AI practitioners to continuously test, monitor, and evaluate AI systems to ensure they are operating as intended.2,9

  • Fair: Fairness in AI encompasses promoting equality and equity while working to eliminate discrimination and harmful biases. The concept of “fairness” can vary across different cultures and application contexts, requiring organizations to acknowledge and mitigate associated risks. Biases, whether intentional or not, can stem from systemic, computational, statistical, and human-cognitive sources. These biases may become embedded in AI systems and proliferate rapidly in both scope and scale.2,9 It is recommended that AI system providers shoulder the responsibility of conducting thorough due diligence on fairness and establish oversight processes, involving actors from various backgrounds, cultures, and disciplines to ensure an inclusive approach to fairness.1,7

  • Human supervisable: Human involvement, including oversight and control by regulators and sociologists, is essential to support trustworthy AI and to ensure “values-alignment” in AI systems. Given AI’s potential for diverse future applications, there is a risk of improper design, misuse, or unforeseen incidents. To increase confidence, legally regulating and auditing AI systems to guarantee they foster human well-being and rights as expected is also indispensable. In instances of accidents, such as breaches of information security or other value violations (for example, discrimination), those involved in the AI system’s design, development, and operation must be traced and held accountable. This accountability ensures that all personnel engaged in every stage of the AI system’s life take responsibility for its decisions.1,2,7,9

The contexts of these principles and frameworks vary with the concrete backgrounds and goals of the various entities. For instance, the focus of the European Union AI Act is to ensure and improve the safety of AI, while private corporations, such as Google and Microsoft, concentrate more on integrating these principles into their products. However, their shared objective is to promote the responsible, lawful, and ethical development and application of AI systems. The comparison among the points of view of diverse entities is shown in Table 1.

Table 1. Various views on trustworthy AI.

Threats of AI

Although these documents indicate a path toward trustworthiness, AI still faces a number of threats. The reasons are as follows:

  • Technical complexity: AI is inherently complicated, particularly for deep neural networks and LLMs. This complexity makes it challenging even for experts to fully understand some models, and therefore difficult to detect and correct biases, errors, or unethical behaviors.1

  • Rapid development of technologies: AI evolves at a breakneck pace, especially recently. Current documents struggle to keep up with this rapid advancement, while new technologies continually introduce novel risks and threats.12

  • Challenges in supply chain and lifecycle: Building AI systems involves a long supply chain, including data collection and filtering, model designing and development, system integration, testing and validation, deployment, monitoring, and maintenance. Every step in such a complex supply chain could cause vulnerabilities.10

  • Diverse application scenarios: AI’s application spans a wide array of sectors, including medicine, finance, entertainment, agriculture, industry, and transportation. This diversity leads to varied requirements, complicating the creation of a single, unified framework to address every scenario.13

  • Human factors: The design, development, and use of AI are significantly influenced by human decisions. Biases and prejudices of developers and managers, lack of ethical considerations, insufficient validation, actors’ mistakes, and adverse actions can all lead to misbehavior of AI systems.

  • Regulatory challenges: Effective regulation requires clear standards and guidelines, along with strong enforcement. Authorities may lack the necessary expertise in this rapidly evolving field, as well as the resources and authority to enforce regulations. Differences in regulations between countries and ethical standards across cultures make applying a unified standard challenging and complex. Moreover, AI system providers are often multinational companies, further complicating regulation.3

The threats of AI are diverse. As an information system, AI not only inherits the threats of general information systems but also carries its own unique threats. Figure 1 shows the schematic.

Figure 1. The architecture of the SAIT-Nav matrix. The green indicators on the chart are less risky or general threats, while the darker, red indicators are more hazardous and AI-related.

We have broadly investigated the threats to AI systems with respect to the NIST AI Risk Management Framework (RMF), one of the most widely accepted official documents. The Open Web Application Security Project (OWASP) has also listed the top 10 threats for machine learning (ML) and LLMs, respectively.14,15 Additionally, we have delved into survey articles, employing both forward and backward searching techniques and inductive reasoning. While many commonly encountered threats are sourced from OWASP and academic surveys, newly discovered threats can also be identified and incorporated into the SAIT-Nav matrix.

Common threats consist of:

  • Manipulation: This attack involves users deliberately altering input data, parameters, or system outputs to deceive ML models, impacting user judgment. These attacks prompt computational errors in the model through minor, strategic modifications, including adversarial attacks; a minimal illustrative sketch of such a perturbation appears after this list. For instance, a doctor using a manipulated system could prescribe incorrect treatments.11,14,15 This threat also pertains to risks associated with general information systems.

  • Data poisoning: This threat occurs when an attacker “poisons” the training data, such as by altering labels. It causes models to malfunction in specific patterns. This can happen when attackers exploit vulnerabilities in AI servers, representing a risk common to general information systems.11,14,15 This category includes model skewing attacks related to feedback loops.

  • Model inversion and stealing: Attackers may conduct reverse engineering on models in an attempt to extract information, for instance, personal identifying information, posing significant privacy, security, and safety risks.14,15

  • Prediction of sensitive information: Attackers could query sensitive information from models, such as emotions, locations, and financial history.14,15

  • Supply chain attacks: In these attacks, an aggressor alters the libraries or models on which an AI system depends.14,15 This can result in data breaches, model malfunctions, or the creation of backdoors.

  • Prompt injection: With an LLM, crafty manipulation of inputs can lead to unexpected outputs.15 For example, attackers might use an LLM to obtain users’ passwords or even to plan a biochemical attack.

  • Hallucination: AI models sometimes generate nonexistent or unreal data. These virtual or fabricated outputs, though seemingly authentic, can mislead users and have implications for safety, security, and compliance, depending on the application scenario.4

  • Black box problem: AI models, particularly LLMs, are often large and complex, making them difficult for users and experts to comprehend and interpret.1 This opacity presents challenges in accountability and tracing responsibility in the event of accidents.16

  • Inadequate explanation: Stemming from the black box problem, this issue arises when the inner workings of AI systems are not clearly understood. This lack of understanding makes it challenging to address fairness, privacy, safety, and security aspects of AI systems.16

  • Tradeoff among values: Implementing all aspects of trustworthiness can raise tensions and contradictions among them. While the complexity of ML models can hinder their explainability, for example, it is not always true that more complex models perform better. In fact, simpler models often offer greater explainability. This tradeoff becomes more pronounced when developers work with limited data. Consequently, stakeholders must balance the values of trustworthy AI.1

  • Misinterpretation: There is a risk of users misconstruing explanations provided by AI systems.1 Such misunderstandings can lead to poor decision making, potentially resulting in significant negative consequences.

  • Evasion attacks: Adversaries may bypass detection mechanisms to attack systems during the operation phase.17 This type of attack is akin to data poisoning, with the key difference being that evasion attacks occur at test time, whereas data poisoning targets the training data.

  • Model drift: In a continuous training–feedback cycle, the performance of models can vary with environmental shifts and new training data.2 Constant monitoring and evaluation of the models are crucial to maintain their effectiveness.

  • Overfitting and underfitting: Overfitting and underfitting are prevalent issues in ML that affect the performance and reliability of AI systems.18

  • Data-related issues: These threats encompass data quality, persistence, repurposing, spillovers, and exploitation. The quality of the big data on which AI systems rely significantly affects their performance. Concerning privacy, data persistence and repurposing can pose risks. The extended duration of data storage, often beyond what users anticipate or consent to, can lead to legal challenges. Data spillovers refer to privacy issues arising when data is collected from unintended individuals. Data exploitation, or the illegal use of personal identifying information, presents privacy threats.18,19

  • Misuse: There have been instances of AI being misused for fraudulent activities, creating harmful fake news and images, and even developing weapons.5 It is crucial for AI system providers to maintain vigilant oversight against such hostile uses.

  • Overreliance: In human-in-the-loop processes, overreliance on AI can lead to significant harm. Humans often play a critical role in such processes to provide feedback and thereby improve the performance and avoid errors.2 Overreliance undermines this human role, potentially compromising safety or fairness.

  • Lack of human oversight: Aligning AI with human values can be challenging. Human oversight and regulation are essential in this context.2 Without human involvement, there are no safeguards against misuse or hallucination, leading to security and safety risks and hindering accountability and transparency.

  • Automation risks: While automation enhances efficiency, malfunctions in automated systems can pose safety threats, such as aircraft crashes.3

  • Lack of clarity: The practices, measures, and risks of AI systems, or the roles, responsibilities, and values of stakeholders and vendors, may not be concrete enough.2 This ambiguity can make it challenging for organizations to adhere to accountability regulations or to avoid safety incidents.

  • Vulnerability to attacks: Increased transparency in AI systems can paradoxically heighten risks. As AI becomes more transparent, attackers gain access to more information, potentially making the systems more susceptible to attacks. Furthermore, the disclosure of system details might expose providers to legal action or stricter regulations.20

  • Reidentification and deanonymization: Adversaries can use AI to identify and track individuals across various environments, including their homes, workplaces, and public areas. For instance, facial recognition modules in security systems could be exploited for tracking and identifying targeted individuals, raising safety and privacy concerns.19

  • Bias: AI models may perpetuate biases present in their training data or inherent in their algorithms. Additionally, societal and institutional biases can infiltrate the collected data. Even though these biases can arise without the actors’ intent, it is essential for stakeholders to be vigilant about them, implementing rigorous oversight and control processes.2
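
As a concrete illustration of the manipulation threat listed above, the following minimal sketch perturbs an input in the style of the fast gradient sign method. The model, the tensors, and the epsilon value are hypothetical placeholders introduced for illustration only; this is a sketch of the general technique, not part of the SAIT-Nav matrix itself.

# Minimal adversarial-perturbation sketch (FGSM style); illustrative only.
# The model, input x, label y, and epsilon are hypothetical placeholders.
import torch
import torch.nn as nn

def fgsm_perturb(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                 epsilon: float = 0.03) -> torch.Tensor:
    """Nudge the input in the direction that increases the model's loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # A minor, strategically signed modification is often enough to flip a prediction.
    return (x_adv + epsilon * x_adv.grad.sign()).detach().clamp(0.0, 1.0)

Defenses such as adversarial training or strict input validation aim to blunt exactly this kind of minor, strategic modification.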

The SAIT-Nav Matrix

These threats will not be realized concurrently. Like risks in many systems, the threats of AI, which keep people from trusting it, may arise at every stage of its lifecycle. The U.K. National Cyber Security Center, the U.S. Cybersecurity and Infrastructure Security Agency, and 21 other governmental entities from 17 countries have issued Guidelines for Secure AI System Development, in which precautions for each stage of the AI development lifecycle are introduced.10

These stages are defined as:

  • Design: This phase encompasses identifying relevant risks and threats and balancing threat mitigation against the purposes and performance of systems. To provide secure AI systems, system owners and senior leaders shall also raise staff awareness of security and foster a security culture.

  • Development: This stage involves ensuring supply chain and asset security, maintaining thorough development documentation, and managing technical debt. Developers are asked to use verified and well-documented data, models, libraries, and other software and hardware components. They also have to identify and protect the valuable assets of their systems, prepare detailed development documentation for audit, and stay updated with best practices.

  • Deployment: At this stage, organizations shall focus on securing their infrastructure, particularly protecting their models continuously against compromise. They are also responsible for establishing incident response and recovery plans. Last but not least, organizations shall release well-evaluated, secure AI systems, configure secure default parameters, and guard against adversarial user behaviors.

  • Operation and maintenance: Stakeholders must continuously log and monitor the behaviors and inputs of their systems in the postdeployment stage. It is critical for accountability to document systems’ behavioral shifts, especially in cases of compromise or misuse. Agencies shall also implement an automated secure update process and record the parameters changed in each update. They shall also share and accumulate experiences to mitigate and remediate evolving risks.

This document presents best practices for AI system development, offering advice on building secure AI. Yet, it falls short of discussing the risks associated with not adhering to these practices. This gap could prevent organizations from efficiently handling incidents or breaches, possibly resulting in repeated errors. Additionally, the absence of a shared vocabulary may obstruct the exchange of experiences across the industry.

To address these gaps, we contribute the SAIT-Nav matrix. Inspired by the MITRE ATT&CK (Adversarial Tactics, Techniques, and Common Knowledge) framework, the SAIT-Nav matrix aims to establish a knowledge base detailing adversarial tactics and techniques in AI system development.

The SAIT-Nav matrix offers insights into potential threats at every phase of an AI system’s lifecycle and across all aspects of AI trustworthiness. In addition to such guidelines, we have incorporated considerations from the AI RMF released by NIST. Unlike other documents that focus on high-level ethics or values, the AI RMF concentrates on the development process of AI systems, offering a clearer method to categorize potential risks in assessing trustworthiness.

The SAIT-Nav matrix is demonstrated in Table 2. It serves as a standardized framework for stakeholders. It aids in identifying and addressing AI system risks and evaluating the efficacy of security controls. The matrix also assists in identifying potential vulnerabilities, enabling organizations to prepare resilience and recovery strategies. By sharing defensive tactics against emerging threats and enhancing sector-wide security, the SAIT-Nav matrix ultimately contributes to raising the overall security level of various sectors.

Table 2. The SAIT-Nav matrix.

Evaluation

As the SAIT-Nav matrix shows, each aspect of trustworthy AI affects systems at different stages of the lifecycle. Table 2 shows that AI system providers shall focus on the aspects of explainability, privacy, and fairness during the design stage. This is because trustworthiness in these three aspects is almost impossible to remediate in later stages and should be integrated into the model’s design. At this stage, sociologists, legal scholars, and regulatory authorities should participate with AI providers to achieve nondiscriminatory, privacy-preserving AI.1,7,9 This stage also involves considering how to present interpretations of model outputs so that users can understand them.

During the development phase, the main concerns are privacy and security. These aspects involve both the training data and the supply chain. Organizations should carefully assess whether the sources of training data are legal, whether sensitive data could be inferred from them, and whether they have been poisoned.2 Meanwhile, the security of the supply chain should be carefully evaluated, ensuring there is comprehensive support documentation, so organizations can securely develop AI systems. This helps avoid leaving hidden vulnerabilities to be exploited. In the deployment phase, entities should protect their infrastructure to prevent unsafe configurations.10
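
As a sketch of the kind of development-phase check described above, the following heuristic flags training samples whose labels disagree with most of their nearest neighbors, which can surface label-flipping poisoning for manual review. The feature matrix, neighborhood size, and agreement threshold are assumptions made for illustration; real poisoning detection would require considerably more than this.

# Heuristic label-consistency check for possibly poisoned (label-flipped) samples.
# X, y, k, and the agreement threshold are illustrative assumptions.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def flag_suspicious_labels(X: np.ndarray, y: np.ndarray,
                           k: int = 10, agreement: float = 0.3) -> np.ndarray:
    """Return indices of samples whose own label gets little support from neighbors."""
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    proba = knn.predict_proba(X)             # neighbor vote shares per class
    cols = np.searchsorted(knn.classes_, y)  # column of each sample's own label
    support = proba[np.arange(len(y)), cols]
    return np.where(support < agreement)[0]  # candidates for manual review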

Last, security and safety are of utmost importance in the maintenance phase. AI system providers should continually track whether their products may cause concerns regarding security and safety, or if there are any changes in system performance; some small changes could be precursors to an attack.
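
A minimal sketch of such postdeployment tracking is given below: it compares a rolling window of a live performance metric against a release-time baseline and raises a flag when the drop exceeds a tolerance. The window size, tolerance, and choice of metric are assumptions; the idea is only that small, sustained shifts get noticed early.

# Minimal postdeployment drift monitor; window size and tolerance are assumptions.
from collections import deque

class DriftMonitor:
    def __init__(self, baseline: float, window: int = 200, tolerance: float = 0.05):
        self.baseline = baseline            # e.g., validation accuracy at release time
        self.recent = deque(maxlen=window)  # rolling window of live metric values
        self.tolerance = tolerance          # allowed absolute drop before flagging

    def update(self, metric_value: float) -> bool:
        """Record one observation; return True if a drift alert should be raised."""
        self.recent.append(metric_value)
        if len(self.recent) < self.recent.maxlen:
            return False                    # not enough evidence yet
        drop = self.baseline - sum(self.recent) / len(self.recent)
        return drop > self.tolerance        # small shifts may precede an attack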

The SAIT-Nav matrix can navigate stakeholders in a trustworthy direction. AI system providers can use it to map threats to the relevant lifecycle stages and quickly develop targeted strategies. Conversely, they can pay special attention to certain aspects of trustworthiness at each lifecycle stage.

Here are two simple cases for guidance on applying the SAIT-Nav matrix.

  1. An attacker applies indirect prompt injection to exploit vulnerable plugins that lack input validation and have poor access control, seeking to poison data and invoke hallucinations.

  2. An attacker asks the AI to reveal intellectual property, such as software license validation numbers (say, numbers that came from legal sources).

The SAIT-Nav matrix can enhance stakeholders’ efficiency in vulnerability discovery and threat detection and reduce response time. In case 1, stakeholders shall first identify prompt injection as the cause of the hallucination. According to the SAIT-Nav matrix, stakeholders shall enforce privilege control or segregate external content in their system, instead of trying to solve the hallucination problem during development. Case 2 is clearly a case of misuse. Stakeholders are not expected to clean the training data, but rather to add a censor layer that filters problematic output.
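
For case 2, the censor layer could be as simple as screening model output for license-key-like strings before it reaches the user. The sketch below is a hypothetical illustration: the regular expression and the generate() callable are placeholders, not any particular product’s API.

# Sketch of an output censor layer for case 2; the pattern and generate() are placeholders.
import re

LICENSE_KEY_PATTERN = re.compile(r"\b(?:[A-Z0-9]{5}-){4}[A-Z0-9]{5}\b")

def censor_output(text: str) -> str:
    """Redact strings that look like software license validation numbers."""
    return LICENSE_KEY_PATTERN.sub("[REDACTED]", text)

def answer(prompt: str, generate) -> str:
    # generate(prompt) stands in for any LLM call; the filter runs on its output only.
    return censor_output(generate(prompt))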

During system development, moreover, the SAIT-Nav matrix can also make employees aware of threats and foster a good organizational culture, which ultimately improves the industry’s security performance.

AI offers both promise and peril. While AI significantly enhances efficiency in analyzing big data trends and improving applications, such as smart factories and smart cities, it also raises concerns about its trustworthiness. Past research has identified numerous threats but often lacked a comprehensive and holistic perspective. We contribute the SAIT-Nav matrix, integrating various aspects of AI trustworthiness and its lifecycle. The SAIT-Nav matrix enables stakeholders to investigate and mitigate threats, thereby enhancing the trustworthiness of AI systems, consolidating insights about potential risks, and ultimately elevating security standards across sectors. Among the myriad threats identified throughout the complete lifecycle, the absence of human oversight emerges as particularly critical. We also recognize the fundamental importance of each aspect of trustworthiness in lifting the security level of various sectors.

ACKNOWLEDGMENT

This research is sponsored in part by the National Science and Technology Council, Taiwan, with the project NSTC 112-2634-F-002-002-MBK, and in part by the Chunghwa Telecom with the project “A general speech enhancement model for cross-domain robust automatic speech recognition and knowledge graph question answering system” (TL-112-Q101). Tsungnan Lin is the corresponding author.
