Building Trustworthy AI Solutions: A Case for Practical Solutions for Small Businesses

Building trustworthy artificial intelligence (AI) solutions, whether in academia or industry, must take into consideration a number of dimensions including legal, social, ethical, public opinion, and environmental aspects. A plethora of guidelines, principles, and toolkits have been published globally, but have seen limited grassroots implementation, especially among small- and medium-sized enterprises (SMEs), mainly due to the lack of knowledge, skills, and resources. In this article, we report on qualitative SME consultations over two events to establish their understanding of both data and AI ethical principles and to identify the key barriers SMEs face in their adoption of ethical AI approaches. We then use independent experts to review and code 77 published toolkits designed to build and support ethical and responsible AI practices, based on 33 evaluation criteria. The toolkits were evaluated considering their scope to address the identified SME barriers to adoption, human-centric AI principles, AI life cycle stages, and key themes around responsible AI and practical usability. Toolkits were ranked on the basis of criteria coverage and expert intercoder agreement. Results show that there is not a one-size-fits-all toolkit that addresses all criteria suitable for SMEs. Our findings show few exemplars of practical application, little guidance on how to use/apply the toolkits, and very low uptake by SMEs. Our analysis provides a mechanism for SMEs to select their own toolkits based on their current capacity, resources, and ethical awareness levels – focusing initially at the conceptualization stage of the AI life cycle and then extending throughout.


I. INTRODUCTION
THE ethical, social, and legal landscape of artificial intelligence (AI) driven systems is rapidly changing. Since the General Data Protection Regulation 2018 [1], stakeholders developing AI systems have faced numerous challenges in the interpretation and implementation of Article 22, specifically concerning an individual's rights in the context of automated decision-making, the ability to explain AI decisions, explanation of the logic involved, and the development of models using only "correct" data. This has caused major challenges because of the lack of legal guidance, case law, and ethical principles about the use of AI in different contexts. For small- and medium-sized enterprises (SMEs), these challenges are even greater due to a lack of specific skills, budget, and human resources. The international policy and impact landscape of AI is still fragmented in approaches to regulation, frameworks, guidelines, and standards (e.g., P7000), with numerous ethical principles being circulated which all convey broadly similar messages [2]-[15].
These "guidelines" often focus on the AI technology or service rather than organizational processes and human behaviors, provide little to no mechanism for accountability and compliance (audit), and ignore the benefits of coproduction and public scrutiny [16]. From an SME perspective, practical implementation is difficult if not impossible. There has been significant "bad press" around poor design, poor rationale, and unethical applications of AI, which has fueled public mistrust. Pownall [17] provides an excellent, regularly updated repository of news stories that challenge whether the use of AI is ethical, for example, the use of face tracking tablets which profile customers and deliver relevant advertisements in UBER. As the public gains knowledge and understanding of issues around the use and application of AI (including bias, fairness, accountability, responsibility, etc.), coupled with an increased awareness of data privacy, both public services and the private sector will have to become more accountable if they are to win public trust and secure the vital public "license to operate." Reputational damage as a result of insufficient or ineffective data and AI governance can cause significant harm to a business, with greater impact on SMEs [17]. There is still a significant gap between top-down theory and practical adoption of robust ethical practices across the entire AI value chain [15], [18], [19], and our research suggests that this gap is more prevalent in SMEs.
In this article, we adopt the European Commission's definition of an SME, which is an enterprise with fewer than 250 employees and either a turnover below €50 million or a balance sheet total below €43 million [20]. A small business has fewer than 50 employees and a micro business fewer than ten employees [20]. Definitions of SME size differ around the world; for example, in the USA, an SME may have up to 500 employees depending on the sector [21]. The World Bank states that globally, SMEs represent 90% of businesses and account for over 50% of employment, and in emerging markets, seven out of ten jobs are created by SMEs [22]. In many countries, SMEs are able to access competitive public funding to support growth acceleration and drive innovation in the AI space, but to date there has been little to no focus on responsible innovation. These programs have generally ignored the need for strong AI and data governance, and have not provided training and upskilling in these domains. Fortunately, over the past few years numerous organizations and academics have published "ethical toolkits" to help organizations adopt and embed processes and practices that mitigate risks and "do AI ethically." These toolkits help organizations ensure their innovative systems adhere to the key pillars of "ethical tech" around beneficence, nonmaleficence, autonomy, justice, and explicability [19].
The overall aim of this article is to evaluate the thematic and AI life cycle coverage of these toolkits. We also assess the usability of the toolkits from an SME perspective and identify which toolkits are least onerous to adopt and address the barriers to adoption highlighted by SMEs. By categorizing the toolkits against ethical AI themes and adoption/usability, we provide organizations of all sizes, but especially SMEs, with an easy way to identify the most suitable tools, methods, and processes to implement. Our study is divided into two parts. First, we conducted qualitative SME consultations over two events to establish their understanding of both data and AI ethical principles and to identify the key barriers SMEs face in their adoption. As the collaboration between business and universities is a highly important mechanism for R&D activities and for stimulating innovation, it is important that academics make the good ethical research practices from within their institutions integral to contract research and knowledge exchange activities. Second, we conducted a review of available toolkits (published in academic, organizational, government, and gray literature) that support ethical and responsible AI practices. We evaluated these toolkits using criteria partly informed by our SME consultations across four aspects of ethical AI: 1) human-centric ethical principles; 2) applicability across the AI development life cycle; 3) barriers to adoption; and 4) key ethics themes covered.
In this article, we define a toolkit as a document or resource including guidelines (provided they describe methods, techniques, or instructions for implementation), checklists, methodologies, activities, processes, frameworks, workflows, or approaches where the content focus is on responsible or ethical data (data ethics) or AI (ethical/responsible/trustworthy/trusted AI). We expand the definition of toolkit given by Morley et al. [23], which focuses only on technical toolkits designed for data scientists and developers up to 2018.
This research aims to address the following research questions.
1) What are the barriers to ethical AI adoption by SMEs?
2) What is the current state of the market in practical toolkits for embedding AI ethical frameworks and governance into an SME culture?
The main contributions of this article are as follows.
1) An analysis of SME viewpoints on ethical data and AI practices, established through two engagement events, which is useful to organizations that are developing toolkits.
2) Identification of barriers to adoption of ethical principles, practices, and toolkits for SMEs.
3) A review and evaluation of recent toolkits against four groups of criteria (common ethical principles, stages of the AI product life cycle, responsible AI aspects, and practical application aspects) designed to facilitate practical application of data and AI ethical practices.
4) An easy-to-use lookup table of ranked toolkits based on expert intercoder agreements of criteria coverage, suitable for SMEs to use.
5) Recommendations to the research community on the role of data and AI ethics in business knowledge exchange.
The rest of this article is organized as follows. Section II presents a summary of the core risk factors associated with AI and an overview of the latest legal frameworks and current ethical guidelines and principles. In Section III, we present our two-part methodology: first, describing two SME events leading to the identification of barriers to adoption of ethical toolkits and second, our method for conducting a review and coding of the state-of-the-art toolkits against a range of criteria. We perform an analysis of these toolkits and SME events in Section IV, which leads to a series of recommendations, conclusions, and the wider implications of findings in Section V.

A. Risk Factors in AI
When conceptualizing, creating, and implementing an AI system, it is important to consider the risk factors associated with the data used, the model(s) built, and the life span of the model [18], [19]. Furthermore, the societal outcomes and impacts (negative or positive; helpful or harmful) arising during the life span of application should also be considered. From a business perspective, there is a clear relationship between perceived risk in an AI system in a given context and how much trust users have in the decisions it makes [24], [25]. The majority of risk factors are well documented. Bias is one of the most complex factors, as consideration must be given to bias that is embedded into organizational or industrial cultures; personal, unconscious, and human bias; and data representation bias [26], [27]. For example, data that have been labeled by humans for training a model may be subjective, even among experts. Different models may need to be developed for different genders, cultures, etc., as it is rarely possible to generalize models to an entire human population based on limited training data. Fairness is about treating people equally through developing models that encapsulate moral standards in the decision-making process. Explainability is required so that all stakeholders, including people impacted by the decisions of automated systems, can understand how a decision is made and the user knows why a system has made a decision [28], [29]. Societal impacts (potential benefits and harms) must be considered by a business, not only to mitigate reputational damage in case of legal complaints but also to meet or exceed minimum standards of business ethics. Businesses must question where responsibility (tasks and obligations) lies within their AI governance framework and assign accountability (oversight and liability) to roles across the design/development/deployment life cycle.
With AI legislation changes on the horizon, deep thinking and consensus surrounding these risk factors is required by both academics and industry regardless of size to assess the risk of an AI solution to both individuals and society. The problem is now bridging the gap between principles and practice, so there is some assurance that AI systems comply with the agreed principles.

B. Principles and Guidelines
Over the past five years, governments, corporations, and international bodies have produced a significant amount of guidance on the ethical dimensions of AI and data driven technologies. To understand how crowded this space is and the difficulty of choice for SMEs with regard to which guidelines to follow, this section provides a brief overview. In 2019, Jobin et al. [4] conducted a survey of global ethical guidelines comprising 84 documents and analyzed their thematic coverage over 11 ethical principles identified by keywords. This work provides a good understanding of the coverage of ethical AI principles and guidelines between 2011 and April 2019. However, the landscape is very dynamic. In 2019, the Beijing Academy of Artificial Intelligence published the Beijing AI Principles advocating ethical AI [5], OECD proposed five value-based principles for the responsible stewardship of trustworthy AI [7], and the European Commission issued ethical guidelines for Trustworthy AI [2]. In 2020, the U.S. Office of Management and Budget issued Guidance for Regulation of Artificial Intelligence Applications [11]. In June 2021, the General Conference of the United Nations Educational, Scientific and Cultural Organization (UNESCO) presented the Draft Text of the Recommendation on the Ethics of Artificial Intelligence, which focuses on a human-centered approach to AI, recommending that "AI must be for the greater interest of the people, not the other way around" [8]. The U.K. government provided an updated summary of data and AI ethical principles developed by both the public sector and the government in 2020 [9], which included a joint publication on AI procurement guidelines developed with the World Economic Forum [30], and specific guidelines and a checklist for using AI in health care [31]. In 2021, the U.K. AI Council published an AI road map [32], further "guidance" on procurement [33], and its national data strategy [34].
A brief analysis of the commonality of ethical principles is provided by Crockett [35], from which a subset of our toolkit evaluation criteria is derived.

C. Legal Frameworks
Legal frameworks in the space of AI and data driven technologies are relatively new and rapidly emerging. The GDPR 2018 [1] introduced, in Article 22, a series of safeguards and information obligations in relation to automated decision-making. These included empowering the data subject, as stated in Recital 71, "not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her" [1], together with the right to ask for human intervention and for an explanation of how the automated decision was made ("the logic involved"). Recital 71 states that the data controller should use appropriate mathematical and statistical procedures for profiling and that data should be accurate in order to minimize the risk of errors [1]. In 2018, the EU also published its AI strategy, which promoted a human-centric approach focused on respecting European values and human rights. Recently, the EU has published the proposed Regulatory Framework on AI [36], which contains a framework to assess the risk of any AI product, service, or system. Four risk levels are defined as follows.
1) Unacceptable risk: AI systems considered a clear threat to the safety, livelihoods, and rights of people will be banned.
2) High risk: AI systems identified as high risk (including law enforcement, credit scoring, and border control management) are subject to a deep risk assessment, mitigation strategy, high quality datasets, traceability, documentation, clear explainability protocols to the user, and a high level of robustness, security, and accuracy.
3) Limited risk: This includes chatbots where human-machine transparency is a requirement.
4) Minimal risk: This includes applications such as AI-enabled video games or spam filters [36].
An excellent primer on the principles and priorities required for a legal framework can be found in [37], produced by the Council of Europe's Ad Hoc Committee on Artificial Intelligence. Leslie et al. [37] also provide suggestions on options for a legal framework and a mapping between substantive human and legal rights and key obligations of AI developers when building AI systems and services.

III. METHODOLOGY
This article comprises a two-part methodology. The first part is an analysis of a series of practical SME engagement events. These events took place between July 2020 and June 2021 and were designed to capture the "SME voice" on their understanding of ethical AI, its practical implementation, awareness of ethical toolkits, and the barriers to adopting good ethical practices. The aim of the analysis was to establish which themes associated with ethical AI SMEs are most aware of, and the perceived barriers to ethical AI adoption. Part two is a review of a range of practical toolkits designed to support putting ethical AI principles into practice. These toolkits were evaluated and coded against the common themes and barriers from the SME events and against a range of criteria relating to coverage of the AI life cycle and general ethics themes.

A. Part 1: SME Engagement and Consultation Study
This section outlines the methodologies for two distinct SME engagement events which explored the need for and barriers to ethical AI.

1) Event 1: Our Place Our Data:
To understand the landscape for local businesses and local authorities in ethical AI understanding and practice, a qualitative research study took place in June and July 2020, comprising two roundtables and follow-up interviews. The study was initiated by Manchester Metropolitan University (MMU), designed in collaboration with an independent think tank and with the support of the U.K.'s All-Party Parliamentary Group on Data Analytics (APPGDA). During the roundtables, participants were provided with an overview of a proposed model for place-based support for ethical AI to build a local ecosystem in which ethical and responsible AI development could be nurtured and thrive. The theme for the first roundtable (n = 20) was "Data and Public: Creating a data-driven future for Greater Manchester" and sought to capture responses to a series of key questions, which included the following.
1) How can the public be better engaged with policies around ethical data use?
2) What are the current challenges and shortcomings associated with ethical guidelines and principles for the use of data by public and private-sector bodies?
3) What does an effective local data ecosystem look like?
The second roundtable was at U.K. national level, featuring not only local SMEs and policy makers but also Members of Parliament and the House of Lords, and key national stakeholders such as the Centre for Data Ethics and Innovation (CDEI), Visa, British Standards Institute, and the Greater Manchester Combined Authority (GMCA). The second roundtable (n = 18) focused on how parliament and government could work to develop local data strategies as part of a wider effort to make the U.K. a world leader in ethical, data-driven technologies. It also analyzed current links between central government, regulators, local and combined authorities, and industry, and considered how those links could be developed over the coming years. The discussion focused on how to develop place-based approaches to data ethics; the role for regulators and government bodies; the feasibility of an "Ethical AI kitemark"; which organizations should lead on ethical AI policies at the national and regional level; and what challenges exist with regard to bringing these bodies together.
Following the roundtables (between August 2020 and March 2021), a series of supplementary follow-up interviews were conducted by Policy Connect with selected participants to explore some of the emergent themes in greater depth. Summary reports from both roundtable events and the interviews were produced by Policy Connect and cross-checked by this study's authors (Crockett and Colyer) for accuracy, identified emergent themes, and indicators of agreement, disagreement, and consensus among participants.
2) Event 2. Greater Manchester AI Foundry: The Greater Manchester AI Foundry [41], with £3 million ERDF funding, is a three-year research and innovation project which commenced in July 2020. The aim of the Foundry is to increase SME performance by placing AI research and innovation at the center of business growth through practical knowledge transfer from AI academic research into industry. SMEs go through two phases: 1) Phase 1 is a series of workshops on AI development from a business perspective and 2) Phase 2 is a technical assist to develop a prototype AI solution. The objective is that research acts as a technology accelerator for new products and services based on AI. Given the importance of the development of ethical technology, a pilot workshop was given in early 2021 to the first cohort of SME participants (n = 20) to enable SMEs to gain an understanding of ethical, social, and legal perspectives of AI and data privacy, and also to facilitate practical ethics into the technical assists. The workshop was not intended to provide any legal advice, rather it was designed to showcase best practice in ethics and regulatory compliance. The first workshop was positively received and a full workshop was developed and embedded with a second cohort in June 2021. In the full workshop, SMEs were actively encouraged to look at the impact and assess the risks of their AI product or service in light of the newly proposed EU regulation [36]. The workshops introduced a variety of ethical toolkits and activities with SMEs including datasheets for datasets [42], consequence scanning [43], conducting a data privacy impact assessment [44], and examining the risk to stakeholders of an AI recruitment tool using padlet [45]. Feedback on adoption of potential tools and barriers to use was obtained through Q and A and discussion during and after the workshop. Workshop members were also asked to complete a longitudinal ethical AI practice survey [46]. 
Feedback was anonymized and collated and thematic coding was undertaken to identify ethical concerns and barriers.

B. Part 2: Review of Practical "Ethical" Toolkits
Our review of toolkits covers academic, organizational, government, and gray literature sources. The search strategy employed the following primary keywords: (toolkit, resource, guidelines, guidance, checklist, methodology, method, activity, process, framework, workflow, approach); (ethical, responsible, trustworthy, trusted, data, data ethics, tech ethics); and [artificial intelligence (AI), machine learning (ML)]. Our toolkit dataset was created by using the primary keywords to perform searches on Google Scholar and Scopus and of gray online literature on Google from 2017 to July 5th, 2021. Our toolkit dataset was also cross-checked with work published by Morley et al. [23] and Moltzau [38], who produced a full typology of identified methods and tools (up to mid-July 2019) which were limited to helping developers, engineers, and designers of ML apply ethics within their roles. In comparison, our review takes a more holistic view by also analyzing toolkits that are used to initiate engagement with wider public stakeholders to explain decisions and build trust. Inclusion criteria were documents (checklists, guidelines, activities), including those published by public and private sectors, governments, and international bodies, written in English. Exclusion criteria were legal frameworks, opinion articles, and speeches. Once a list of toolkits that met the inclusion criteria was obtained (referred to as the EAI toolkit dataset), each toolkit was evaluated and coded independently by expert researchers in the field of AI and ethics against the criteria in Tables I-IV. For each toolkit, its source (academic, organizational, business, or gray) was recorded, along with publication year, whether it was open source, and the country of origin.
Criteria in Group E were determined on the basis of the findings of the two SME engagement events reported in Section IV-A (Analysis of SME Engagement Events).
A modified nominal group approach to coding was adopted [39], [40]. The first round of coding involved three experts in the fields of AI, ethics, and business engagement, independently evaluating two-thirds of the EAI toolkit dataset with each toolkit being evaluated by two experts initially. A structured spreadsheet containing links to the toolkits and the 33 criteria for coding was given to each expert to evaluate and code independently. Each criterion was coded according to a three-point Likert scale with values in (0, 1, 2) indicating, respectively, weak, moderate, and strong levels of support by a toolkit for a given criterion. For example, if a toolkit strongly addressed B10 (AI systems should be sustainable and work to benefit humans, the society, and the environment), then it was scored as 2; if it moderately or partially addressed that criterion, it was scored 1; and if support for the criterion was largely or completely absent, then it was scored as 0.
The first round of independent coding revealed 72% agreement across the 33 criteria; for 18% of criteria there was a disagreement, with one expert scoring 0 and the other scoring 1 or 2; in 10% of cases, both experts agreed that the toolkit contained at least some evidence of the criterion but disagreed on how much (scoring 1 or 2). When adopting a percentage agreement approach [39], there is no agreed threshold for consensus, and it is up to the researchers to judge what represents acceptable agreement for a particular study. A second round of independent expert coding was then instigated for all toolkits where there was significant disagreement for any criterion, defined as when one expert scored 0 and the other expert either 1 or 2; these toolkits were fully coded by a third expert in an attempt to establish majority agreement. The level of agreement between the three experts was then recorded in a structured spreadsheet for 77 toolkits. There was good majority agreement among the experts for 89% of the 33 criteria scored across the 77 toolkits. Experts were unable to reach a majority agreement in only 1% of cases. The most common disagreement between the coders was on the interpretation of B10 (AI systems should be sustainable and work to benefit humans, the society, and the environment; 6 out of 77 toolkits) and on the toolkit coverage of C4 (deployment and monitoring; 6 out of 77 toolkits).
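The agreement categories described above can be illustrated with a short sketch. It assumes each criterion is scored 0, 1, or 2 by each coder; the function names and the example scores are hypothetical and are not taken from the study's spreadsheet, and the tie-handling in the majority rule is our assumption.

```python
from collections import Counter

def classify_pair(a: int, b: int) -> str:
    """Classify two coders' Likert scores (0, 1, 2) for one criterion."""
    if a == b:
        return "agree"
    if 0 in (a, b):
        # One coder saw no support and the other saw some: significant disagreement.
        return "significant"
    # Both saw some support but differ on how much (1 versus 2).
    return "degree"

def agreement_profile(pairs):
    """Return the share of each agreement category over (a, b) score pairs."""
    counts = Counter(classify_pair(a, b) for a, b in pairs)
    total = len(pairs)
    return {k: counts.get(k, 0) / total for k in ("agree", "significant", "degree")}

def majority_score(scores):
    """Return the score given by at least two of three coders, else 'D'."""
    value, votes = Counter(scores).most_common(1)[0]
    return value if votes >= 2 else "D"

# Hypothetical scores for illustration only, not data from the study.
pairs = [(2, 2), (0, 1), (1, 2), (0, 0), (1, 1)]
profile = agreement_profile(pairs)
print(profile)
print(majority_score([0, 1, 1]), majority_score([0, 1, 2]))
```

Under this sketch, only pairs classified as "significant" would trigger coding by a third expert, and a cell with no majority is marked "D" for disagreement.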

IV. ANALYSIS AND DISCUSSION
A. Analysis of SME Engagement Events
Event 1: For Event 1, analysis of the first roundtable revealed that ethical and legal issues surrounding "data" and not "AI" needed to be resolved first before the wider ethical aspects of AI could be addressed. This was true for both public and private sector organizations. The key themes emerging from the roundtables were as follows: 1) ethical guidelines and principles should be simple and flexible and should be much more than a checklist; 2) practical guidance on how to apply data and ethical AI principles should be usable; 3) mechanisms were needed to support practical guidance (training, resource support) in partnership with local authorities; 4) data-driven technology strategies should be developed in partnership with all stakeholders; 5) SMEs should have access to "resource knowledge sharing" to make effective and ethical use of AI and ML. The main output of the Event 1 study was a report Our Place, Our Data: Involving Local People in Data and AI-Based Recovery [47], which made five recommendations to the U.K. government, including that local authorities should work in partnership with businesses (including SMEs) and academic institutions to develop data-driven technology strategies to develop innovative AI services and products which have citizen engagement at the heart of the creation process.
Event 2: The analysis of Event 2 was based on Q and A during the two cohort sessions and follow-ups in 1:1 virtual meetings. SMEs referred to the following Information Commissioner's Office (ICO) guidance: What are the accountability and governance implications of AI? [48], guidance on AI and data protection [44], data protection impact assessments [44], what do we need to do to ensure lawfulness, fairness, and transparency in AI systems? [45], and how do we ensure individual rights in our AI systems? [49]. They described these documents as long and complicated and as providing no practical advice or methods on how to apply them. The key message was that toolkits/guidance needed to be simpler. One SME data scientist stated that they "did not know some of this existed," emphasizing the general lack of awareness. SMEs thought that training or free consultancy was required to help them understand and apply legal guidance in relation to AI and data. Three SMEs also thought that, in general, ICO guidance was "subject to interpretation." Positive feedback was received about the use of consequence scanning [43] as a useful way to think about harms and risks of a product at conceptualization, but in general SMEs said that whether such tools would be used in practice depended on whether they had resources available. They had no strong opinion about the benefits of involving the public, for example, as a stakeholder in an activity such as consequence scanning. Despite growing consensus on the benefits of public involvement to build trust in AI tech [50], [51], SMEs indicated that they were not sure how to involve the public and that the real benefits of consulting with the public were not clear. Two SMEs suggested that successful case studies would benefit them. The SMEs thought that the toolkits presented were useful, but they needed time to learn how to use them: not just one-off training, but also support in applying them practically in their own business.
Summary: From these two events, the barriers to SMEs adopting toolkits were identified as follows.
1) Availability of resources to SMEs (people and time), current skills, and training requirements.
2) Skepticism about the benefits of public stakeholder involvement in the design of new products and services.
3) Lack of understanding around governance of responsibility and accountability regarding AI development and implementation outcomes.
4) The lack of audit and compliance and legal frameworks.
5) Need for practical training and upskilling regarding ethics, data and legal frameworks, and managing liabilities.
6) Challenges associated with communication with users: different language for different stakeholders.
7) Serious implications for a business in terms of liability. What are the consequences of noncompliance?

B. Toolkit Analysis
Following the methodology described in Section III, a total of 77 toolkits were identified which met the inclusion criteria. 30 of these toolkits were from 2021, while the earliest was from 2017. A total of 51% of toolkits were from the US, 23% were from the U.K. and there was representation from South America, China, Denmark, Saudi Arabia, Germany, and Ireland, in addition to three toolkits which were classed as global. The process for analyzing toolkits can be defined as follows.
1) All toolkits were scored using the groups of criteria B to E (see Tables I to IV) according to a three-point Likert scale with values in (0, 1, 2) indicating, respectively, weak, moderate, and strong level of support by a toolkit for a given criterion. As explained in Section III, these are the combined scores from the interannotator coding and agreement process.
2) For the analysis of the criteria, we derived an $n \times m$ matrix $R$ (see supplementary material), where $n$ is the number of toolkits ($n = 77$) and $m$ is the number of criteria considered ($m = 33$).
3) Each cell in $R$ contains one of (0, 1, 2, D), with D standing for a disagreement among coders.
4) From $R$, we derive a mean score for a toolkit (i.e., a row) or a criterion (i.e., a column) by taking the mean of its empirical probability distribution (epdf), excluding disagreements. More specifically, let $X$ be either a row or a column in $R$, which is assumed to be a discrete random variable. Then, $\mathrm{epdf}(X) = (p_0, p_1, p_2)$, where $p_i$ is the probability of the score $i \in \{0, 1, 2\}$.
Table V, located in the appendix, displays the statistical summary of scores across the 77 toolkits, ranked on the basis of their coverage of criteria groups C, D, and E, where $p_0$, $p_1$, and $p_2$ are the values of the epdf, shown as percentages, of the Likert scores on the criteria, and $m$ is the number of criteria assessed. Group B is not included in Table V as it considerably overlaps with responsible AI themes in Group D. We opted for the latter, given that it provides a more fine-grained analysis of tool coverage. For example, B2 (AI must always be fair, unbiased, and transparent in the decision-making process) is covered by D2 (fairness, including bias) and D3 (transparency).
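As a minimal sketch of the scoring steps above, the following computes the epdf of a row of R and its mean, which under the definition given reduces to p_1 + 2 p_2. The toolkit names and cell values are hypothetical, and ranking by the mean score alone is a simplifying assumption; the ranking in Table V also takes intercoder agreement into account.

```python
def epdf(cells):
    """Empirical distribution (p0, p1, p2) of the scores, ignoring 'D' cells."""
    scored = [c for c in cells if c != "D"]
    if not scored:
        raise ValueError("no agreed scores to summarize")
    return tuple(scored.count(s) / len(scored) for s in (0, 1, 2))

def mean_score(cells):
    """Mean of the epdf: 0*p0 + 1*p1 + 2*p2."""
    p0, p1, p2 = epdf(cells)
    return 0 * p0 + 1 * p1 + 2 * p2

# Hypothetical rows of R (one list of criterion scores per toolkit).
R = {
    "toolkit_a": [2, 2, 1, "D"],
    "toolkit_b": [0, 1, 1, 2],
    "toolkit_c": [0, 0, "D", 1],
}

# Rank toolkits from highest to lowest mean score (assumed ranking rule).
ranking = sorted(R, key=lambda name: mean_score(R[name]), reverse=True)
for name in ranking:
    print(name, epdf(R[name]), round(mean_score(R[name]), 3))
```

The same two functions apply unchanged to a column of R, giving the mean coverage of a single criterion across toolkits.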
The top-ranking toolkit was Microsoft's Responsible Innovation: A Best Practices Toolkit [111]. While this toolkit was targeted at developers, it had a strong focus on identifying potential negative consequences of technology on humans. The toolkit features three elements. The first, Judgment Call, is a game and team-based activity that explores all of Microsoft's AI principles [128] through scenario imagining, where the aim is for participants to write product reviews for different stakeholders, assessing the impact and harms. The second, Harms Modeling, is a framework for product teams based on the four pillars of responsible innovation ("injuries, denial of consequential services, infringement on human rights, and erosion of democratic and societal structures" [111]) and is designed for teams to look at real-world impacts of technology. Finally, Community Jury, defined as an adaptation of the citizen jury [111], brings together the product team and user stakeholders to discuss various product artifacts, deliberate, and cocreate new technologies over a 2-3 h session. This toolkit had moderate to strong coverage across all criteria B, C, and D. However, it did not contain any exemplars E 1 and had no training guides E 7 , which is a key requirement for SMEs. That said, its uniqueness is its ability to engage the public and seek consensus and opinion, and it is forward-thinking in terms of providing practical guidance that is applicable to a wide range of businesses/organizations. Ranked second was the U.K. government's Data Ethics Framework Guidance, published in 2020, which focuses on responsible and ethical use of data in the public sector [114]. While the emphasis is on the public sector, the guidance is targeted at all stakeholders who use or interact with data, including policy makers and data scientists.
Similar to [111], the emphasis is on defining and understanding the public benefit of any "data project" including human rights, understanding potential consequences, compliance with law and diversity in the development team. The toolkit provides a set of questions which are scored on a Likert scale based on clarity and understanding with respect to a specific project. The framework also covers algorithms and outputs in relation to AI and is applicable to all stages of the AI life cycle. This toolkit also did not provide any examples of practical application E 1 and is less inclusive in its approach by not involving wider publics as stakeholders E 3. The toolkit did not offer any specific training E 8 .
Table V also highlights the lowest ranking toolkits [70], [97], and [125], none of which provided strong evidence of coverage across any of the criteria. For example, Covington is a global law firm based in the USA. Its toolkit [125] claims to provide practical guidance for "the evolving regulatory landscape" with an emphasis on the USA, U.K., and EU. The guidance is in the form of overviews, summaries of news articles, and a white paper with links to recent AI legislation articles and to the ICO/Alan Turing Explaining AI Decisions toolkit [83]. On the basis of our findings across the two SME engagement events, SMEs requested more training in order to understand the implications of legal frameworks, and this toolkit would be difficult for them to practically apply as it is more a means of monitoring evolving regulation and legislation. Fig. 1 shows the distribution of mean scores by groups of criteria, with each toolkit represented as a data point. For example, one can see that criteria group E (the practical application aspects for SMEs) has the lowest median and overall coverage by the toolkits. This confirms the largely consensual view arising from our two events that, in spite of the existence of toolkits to support responsible and ethical AI, most still lack adequate instructions and training to facilitate adoption. Many require significant time and specialist skills for implementation due to their length. Analysis has shown that no single toolkit covers all criteria, as indicated in Table V (p_0 > 0 in all columns). Consequently, each set of criteria will now be analyzed independently to assess criterion coverage and highlight those toolkits with the highest ranked coverage. This will help SMEs to select toolkits that best align with their business culture and values, and the stage they are at in developing their own ethical policies and procedures.
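As a rough illustration of how per-group mean scores (as plotted in Fig. 1) and a coverage-based ranking could be derived, the sketch below works on a tiny, invented score table. The criterion labels, toolkit names, and scores are hypothetical, and the ranking rule (mean of agreed scores over groups C, D, and E, sorted in descending order) is an assumption; the published ranking also takes intercoder agreement into account, which this sketch does not model.

```python
DISAGREE = "D"  # assumed marker for a coder disagreement

# Hypothetical criteria groups, reduced to two criteria each for brevity.
GROUPS = {"C": ["C1", "C2"], "D": ["D1", "D2"], "E": ["E1", "E2"]}

# Hypothetical excerpt of the score matrix R: toolkit -> criterion -> score.
R = {
    "toolkit_a": {"C1": 2, "C2": 2, "D1": 1, "D2": 2, "E1": 0, "E2": "D"},
    "toolkit_b": {"C1": 1, "C2": 0, "D1": 2, "D2": 1, "E1": 0, "E2": 0},
    "toolkit_c": {"C1": 0, "C2": "D", "D1": 1, "D2": 0, "E1": 0, "E2": 1},
}


def mean_agreed(cells):
    """Mean of the agreed scores; NaN if every cell is a disagreement."""
    agreed = [c for c in cells if c != DISAGREE]
    return sum(agreed) / len(agreed) if agreed else float("nan")


def group_means(row, groups):
    """Mean score of one toolkit within each criteria group."""
    return {g: mean_agreed([row[c] for c in cols]) for g, cols in groups.items()}


def rank_toolkits(matrix, groups, ranked_groups=("C", "D", "E")):
    """Order toolkits by mean agreed score over the selected groups (assumed rule)."""
    cols = [c for g in ranked_groups for c in groups[g]]
    overall = {name: mean_agreed([row[c] for c in cols]) for name, row in matrix.items()}
    return sorted(overall, key=overall.get, reverse=True)


def has_uncovered_criterion(row):
    """True if at least one criterion was scored 0, i.e., p_0 > 0 for this toolkit."""
    return any(c == 0 for c in row.values())


print(rank_toolkits(R, GROUPS))
print(group_means(R["toolkit_a"], GROUPS))
print(all(has_uncovered_criterion(row) for row in R.values()))
```

The last line mirrors the observation that no single toolkit covers all criteria: every toolkit in the invented table has at least one criterion scored as weak.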
1) Common Ethical Principles (Group B): Fig. 2 shows the toolkit coverage of the ethical principles B 1 , …, B 11 . Clearly, B 2 (AI must always be fair, unbiased, and transparent in the decision-making process) receives the highest coverage across all toolkits. This is closely followed by B 3 (AI systems should always operate within the law and have human accountability) and B 4 (data governance and data privacy should be incorporated into the AI life cycle). These findings align with predominant global ethical principles [4]. Of least coverage were B 5 (humans should always know when they have interactions with an AI system), which is only highlighted by toolkits [74], [116], [118], [120], and [126], and B 8 (a human should always be in the loop for automated decision-making), covered by [101], [112], and [126]. Toolkit [126] (ranked 33rd overall) stands out in this group. Titled "Application Guide for the Ethical Assessment of AI for Actors within the Entrepreneurial Ecosystem," the toolkit is an open source guide published by the Inter-American Development Bank in May 2021. Its interdisciplinary approach to ethical self-assessment covers all stages of the AI life cycle, governance, and security with a focus on human involvement in AI systems. The guide has a three-stage assessment to determine the level of human involvement based on the impact that the system has on a human's life. The toolkit helps organizations define associated key performance indicators, risk mitigation, and even develop emergency responses following analysis of all conceivable scenarios.
2) Stages of AI Product Life Cycle (Group C): Fig. 3 shows the toolkit coverage for the four stages of the AI life cycle: 1) conceptualization C 1 ; 2) data preparation C 2 ; 3) exploration, model building, and evaluation C 3 ; and 4) deployment and monitoring C 4 . Analysis showed that toolkits were less likely to cover the audit and compliance stage of the life cycle, compared to the other stages, presumably because few regulatory frameworks or standards are yet approved. For example, to date, out of the IEEE P7000 standards in development, only IEEE 7010-2020, the IEEE Recommended Practice for Assessing the Impact of Autonomous and Intelligent Systems on Human Well-Being [14], is available, and on subscription only. Only toolkits [55], [56], [65], [70], [83], [85], [95], [101], [104], and [107] covered the whole life cycle, but to varying degrees. Toolkits [56] and [107] ranked, respectively, third and fifth overall against all criteria (see Table II). Agile Ethics for AI (HAI) [56] is a Trello board which contains a series of boards covering scope, data audit, training, analysis, feedback, calibrate (optimal AI for increased uptake), augmentation (e.g., upskilling and training), and "people and the environment," which addresses accountability in AI deployment. Each board contains a series of "TO DOs" with specific resources, all available as open source. The World Economic Forum's AI Procurement in a Box: Workbook [107] is a lengthy toolkit (54 pages) that features a series of questions, risk matrices, and mapping tools covering the full AI life cycle. It is intended for businesses seeking to procure AI solutions. It also features a user manual with a strong emphasis on how to define the public benefit of AI while assessing risks in the early stages of conceptualization. The toolkit provides guidance on how to address both the technical and ethical limitations of data, clearly addressing the impact of bias.
3) Responsible AI Themes (Group D): Fig. 4 shows the toolkit coverage for the responsible AI themes: robustness D 1 , fairness D 2 , transparency D 3 , accountability D 4 , explainability D 5 , privacy D 6 , safety D 7 , impact D 8 , inclusivity of the toolkit (in general) D 9 , and inclusivity w.r.t. general public inclusion as a stakeholder D 10 . Examination of Group D criteria allows for more fine-grained analysis than within the more general ethical principles (see Fig. 2), and we expected to see similarity between ethical principle B 2 and fairness D 2 with regard to coverage. Ninety-five percent of all toolkits moderately or strongly addressed the issue of fairness, with 88% also addressing the impact of AI technology on society D 8 . Accountability D 4 , both in terms of the processes of developing responsible technology and the decision outcome, quality of the data, and the model produced, also had moderate to strong coverage in 89% of toolkits. More than half (53%) of the toolkits failed to include the public voice in any codesign or coproduction process to seek their opinions (D 10 ), and only 62% of toolkits were moderately inclusive of the requirements and needs of a wide range of stakeholders (i.e., data scientists, software developers, managers, CEOs) (D 9 ). The Action-Oriented AI Policy Toolkit for Technology Audits by Community Advocates and Activists [122], Agile Ethics for AI (HAI) [56], the JUST AI reflection prototype [82], Microsoft's Responsible Innovation: A Best Practices Toolkit [111], the U.K. government's Data Ethics Framework Guidance [114], and the Royal Society's Democratizing decisions about technology toolkit [120] were the only toolkits to have strong coverage of public inclusivity embedded within the toolkit objectives. As reported in Ouchchy et al. [129], public opinion is critical in the acceptance and adoption of new technology.
Other work [130] has recommended that businesses include ethical value statements on trusted webpages; the inclusion of both ethicists and the public in new technology discussions could avert negative media responses and reputational damage to businesses. The importance of the role of the public stakeholder is also highlighted in policy road maps [32] and proposed regulation [36].
4) Practical Application Aspects (Group E): Fig. 5 displays the ranked criteria in relation to different aspects regarding the practical application of the toolkits. Only 27% of the toolkits were coded as being equivalent to "quick start" guidance E 2 . Sixty-nine percent of toolkits and their associated websites provided no exemplars or case studies of how to practically apply the toolkit; only 6% provided at least one example of adoption E 1 . Coverage of stakeholders' inclusivity E 4 within the toolkit was scored as weak (27%), moderate (56%), and strong (17%). Analysis showed that toolkits were designed with specific audiences in mind, for example, the technical community (data scientists, programmers, and data analysts), where the focus was on criteria such as bias and fairness in both data quality and model generation. There were few toolkits that had end users and public inclusivity in mind, suggesting that the trajectory of practical application of toolkits is behind emerging legislation and wider discourse around building trust through public involvement [120]. The feasibility of practical application of toolkits w.r.t. SME resources (workload, personnel, and budgets) E 5 was ranked similarly to E 4 . This indicated that SMEs would have to make a moderate to high investment to apply toolkits and embed ethical values and processes into business operations. Eighty-three percent of toolkits provided no training opportunities such as step-by-step instructions, user guides, or checklists on how to practically use the toolkit.
A strong emphasis on training E 7 could only be found in IEEE Ethically Aligned Design [65] and the Royal Society's Democratizing decisions about technology toolkit [120]. The following toolkits covered some aspects of training: [56], [60], [70], [88], [99], [102], [104], [107], [108], [114], and [120]. An observation was that toolkits that were focused on the conceptualization stage of the AI life cycle and/or had more stakeholder inclusivity included some form of training.
Finally, evidence of adoption of a specific toolkit by SMEs (E 8 ) was barely evident to nonexistent in 90% of toolkits. This suggests that toolkits have not been designed with SMEs in mind, that the barriers to practical application are too high, or that toolkits are simply not being evaluated and publicized through practical use cases. Digital Catapult's Machine Intelligence for Business [88] (ranked 24th in Table II) has published a short case study on Loomi, an AI assistant which builds trust through ethical, transparent design [129]. Loomi, also the name of the SME featured in the case study, utilized Digital Catapult's ethics framework to reposition "the product using ethics as a key differentiator." IDEO's toolkit (ranked 16th in Table II) highlights the benefits of human-centered design using its Design Kit [64] in a series of humanitarian case studies.
Across the criteria in this category (E 1 , …, E 8 ), DotEveryone's Consequence Scanning toolkit [43], ranked 21st (Table II), exhibited moderate to strong coverage of all criteria. This open-source toolkit, developed in the U.K., allows businesses and organizations (regardless of size) to examine, debate, risk assess, and mitigate the potential consequences of their product/service on society, communities, and the environment. A manual is provided (27 pages), with minimal resources required. The tool is employed at the conceptualization stage, with all stakeholders taking part, although public stakeholders are not specifically mentioned (D 10 ). A strong facilitator is needed, which may be a barrier for SMEs, but a session can last as little as 90 min. The tool has reportedly been adopted by SalesforceUX [130] as a way to bring design risks out into the open.
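The per-criterion percentages reported in this section (for example, the weak/moderate/strong split for E 4 , or the share of toolkits that moderately or strongly address a theme) can be computed from a column of R along the following lines. This is a sketch under the assumption that disagreements are excluded from the denominator, which the text does not state explicitly; the example column is invented.

```python
DISAGREE = "D"  # assumed marker for a coder disagreement
LABELS = ("weak", "moderate", "strong")  # Likert scores 0, 1, 2


def coverage_shares(column):
    """Percentage of toolkits scored weak, moderate, and strong on one criterion."""
    agreed = [c for c in column if c != DISAGREE]
    n = len(agreed)
    return {
        label: 100 * sum(1 for c in agreed if c == score) / n
        for score, label in enumerate(LABELS)
    }


def share_at_least(column, threshold):
    """Percentage of toolkits whose agreed score is at least `threshold`."""
    agreed = [c for c in column if c != DISAGREE]
    return 100 * sum(1 for c in agreed if c >= threshold) / len(agreed)


# Hypothetical column of R for one criterion (ten agreed scores, one disagreement).
column = [0, 0, 1, 1, 1, 1, 1, 2, 2, 1, "D"]
print(coverage_shares(column))    # shares of weak, moderate, strong
print(share_at_least(column, 1))  # "moderately or strongly addressed"
```

If disagreements were instead counted in the denominator, the three shares would no longer sum to 100%, so the choice of denominator should be stated alongside any reported percentage.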

C. Discussion
This article has evaluated and analyzed 77 toolkits that cover different aspects of the ML/AI life cycle and common ethical principles, responsible AI themes, such as bias and fairness, and degrees of practical application. Consequently, every organization should be able to find one or more toolkits that fit with their working practices and culture and complement their organizational values. Although Table II ranks [111] as the number one toolkit with regard to our criteria (C, D, and E), it still has limitations in its practical application by SMEs. Therefore, this research concludes that there is not a toolkit currently in existence that overcomes all the barriers and fully meets all the needs of SMEs identified in the analysis of the two SME engagement events. SMEs struggle with long, wordy, and technical documents. They require case studies, clear compelling stories of benefits, and step-by-step instruction manuals on how to use and embed toolkits into operations (and how much time/cash it will cost).
There was a good distribution across the toolkits of all the ethical principles (criteria B). The greatest coverage (mean of 1.64) was achieved by the Data Ethics Impact Assessment (ranked 17th in Table II) [91], which comprises a 16-page questionnaire designed for organizations to integrate the assessment of data ethics and the impacts of their AI on humans and society within their development and operational processes. The 56 questions cover aspects of transparency, equality, data governance, sustainability, accountability, and human-centered design, drawing on DataEthics.eu's principles of data ethics. In contrast, Nesta's Civic AI Toolkit [121], which focused on using AI and data to address the climate crisis, and the Online Ethics Canvas [127] had little to no coverage. Results showed that few toolkits addressed all 11 principles, and none were considered to fully address all 11 by any expert coder. Therefore, organizations will probably need to use more than one toolkit to get comprehensive coverage.
Detailed analysis in Section IV revealed that toolkits [55], [56], [65], [70], [83], [85], [95], [101], [104], and [107] covered the whole AI life cycle, but to varying degrees. Experts agreed that 24% of toolkits did not cover audit and compliance and this may be due to the current lack of AI legislation, regulation, and ethics standards. However, the proposed EU Regulation on AI [132] is likely to have a significant impact on future toolkit development, as it is being described by the Global Centre for Data Innovation as the "most restrictive regulation of AI" in the world. The expert coders agreed that 80% of toolkits analyzed in this study placed emphasis on getting things right the first time, i.e., at the point of AI product or service conceptualization, and can be seen as proactive in determining the consequences and harms a potential product could have on humans and society.
Analysis across the responsible AI themes (criteria D) indicates that the vast majority of toolkits covered aspects of fairness and the impact of AI. While these are core values in developing ethical and responsible AI, SMEs do need to ensure that they address all themes across the AI life cycle through culture change, rather than becoming fixated on bias and fairness to the detriment of other themes. It is unsurprising that so few toolkits strongly emphasize the importance of citizen representation in their toolkit application. Only 8% of all toolkits strongly advocated the participation of citizens, with 53% relying only on internal stakeholders to take part. An absence of public involvement, especially in the new AI product/service conceptualization phase, leads to flaws in design thinking due to a lack of diversity and inclusivity, which leads to narrower perspectives. Consequently, a great business idea, with no public license to operate, can ultimately lead to reputational damage and loss of revenue. For example, Deloitte reported that a lack of inclusivity in the conceptualization stage of a smart city design resulted in a negative impact as people in wheelchairs were unable to access eye-level retina scanners that require the person to be standing [133]. Section IV highlighted only six toolkits featuring citizen inclusivity. SMEs urgently need to find ways to engage and involve more diverse teams including people outside of their organizations, such as the general public. Our SME engagement events found that this activity is typically beyond their resources and skillset; they also raised concerns about intellectual property rights and trade secrets being disclosed. Put simply, SMEs need support and advice on how to engage effectively. The Community Jury proposed within Microsoft's Responsible innovation: A best practices toolkit [111] is a good example of citizen engagement in the AI life cycle. 
The caveat is that it was designed by and for a large corporation and not an SME. Setting up such a jury may be daunting and resource intensive for an SME; we propose setting up city or regional juries, focused on ethical AI tech, as part of a collective approach, where SMEs could present novel ideas and seek public opinion on design solutions. Ultimately, SMEs should seek to cocreate and codesign with citizens to build trust and obtain the public license to operate, but this is a significant step change from current operations.
Our analysis also highlighted the lack of exemplars or case studies by those organizations who have developed the toolkits. There was little evidence of adoption and virtually none involving SMEs. This is not to say they have not been involved, but stories, analyses, benefits, and outcomes are not in the public domain. This is a key knowledge gap that should be addressed to close the gap between ethical principles and practice. Toolkit developers could produce publicly accessible case studies to thoroughly document the journey and the impacts of adopting ethical practices. This is crucial to lower resistance, leverage investment, and gain the trust and attention of SMEs to invest their limited resources in upskilling and training their employees on AI ethics.
Guidance on how to train people to use the toolkits is another significant challenge. Our analysis indicated that 83% of toolkits did not provide any training material on how to practically implement the tool within the organization. While the overall majority of toolkits are open source and in the public domain, some organizations did offer consultation opportunities for a fee [113], [124], [125]. However, this is not enough, particularly for SMEs, if they do not come with comprehensive training and support materials.
It is important to note that many of these toolkits have been designed for specific and narrow purposes, with no intention to support all possible dimensions of ethical AI, not least because many were produced while ethical frameworks were still under development. For example, IBM's AI Fairness 360 tool [78] was conceived to focus on evaluating bias and the fairness of algorithms, with no explicit regard for any assessment of eventual outcomes from decisions supported by said algorithms. At the other end of the spectrum, AINow's Algorithmic Impact Assessment toolkit [53] is "designed to support affected communities and stakeholders as they seek to assess the claims made about these systems, and to determine where -or if -their use is acceptable." It is therefore good to bear in mind that SMEs may need to deploy two or more toolkits to fully capture all dimensions of ethical operations.
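To make the point about combining toolkits concrete, the sketch below picks a small set of toolkits that jointly cover a required list of criteria. It uses a standard greedy set-cover heuristic, which is our illustrative choice and not a method used in this study; the toolkit names, criterion labels, and the rule that a criterion counts as "covered" are all hypothetical.

```python
def greedy_cover(coverage, required):
    """Greedily choose toolkits until every required criterion is covered.

    coverage: dict mapping a toolkit name to the set of criteria it covers
              (e.g., criteria scored at least 'moderate'; an assumed rule).
    required: iterable of criteria that the organization wants covered.
    Returns the chosen toolkits and any criteria that no toolkit covers.
    """
    remaining = set(required)
    chosen = []
    while remaining:
        # Pick the toolkit that covers the most still-uncovered criteria.
        best = max(coverage, key=lambda name: len(coverage[name] & remaining))
        gained = coverage[best] & remaining
        if not gained:
            break  # nothing left can be covered by any toolkit
        chosen.append(best)
        remaining -= gained
    return chosen, remaining


# Hypothetical coverage sets for three toolkits.
coverage = {
    "toolkit_a": {"D1", "D2", "D3"},
    "toolkit_b": {"D3", "D4"},
    "toolkit_c": {"D4", "D5"},
}
chosen, uncovered = greedy_cover(coverage, ["D1", "D2", "D3", "D4", "D5"])
print(chosen)     # toolkits selected, in order of selection
print(uncovered)  # criteria left uncovered, if any
```

The greedy heuristic does not guarantee the smallest possible set, and in practice an SME would also weigh factors discussed above, such as training material, workload, and budget.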

V. CONCLUSION
This research aimed to address two research questions as follows: 1) first, to understand the AI ethics landscape from the SME perspective (and uncover any existing barriers to adoption); 2) second, to evaluate and identify existing toolkits that are suitable for practical application by SMEs. Two SME engagement events were conducted that identified a number of common barriers to ethical AI adoption by SMEs on the themes of: 1) resources (people and time); 2) practical business-focused training and upskilling on ethical and responsible AI; 3) data and AI governance infrastructures; 4) citizen engagement; 5) applicability of legal frameworks (data and AI) and how to apply them; and 6) audit, compliance, and liability. Next, a comprehensive review provided a picture of the current state of the market in availability of toolkits for embedding AI ethical frameworks and governance into an SME culture. Our key findings are summarized as recommendations to both the SME and academic communities.
There is no one-size-fits-all toolkit that provides guidance sufficient to cover all ethical principles and themes around responsible and ethical AI. Toolkits vary in their feasibility to implement. It is recommended that SMEs select toolkits based on their current capacity, resources, and ethical awareness levels -focusing initially at the conceptualization stage of the AI life cycle and then extending throughout.
Academics engaged in knowledge transfer projects with businesses should also share good ethical practices, policies, procedures, and approval templates from their universities. While established processes governing research ethics are different, for example, in terms of the data processed and controlled, and differences in legal basis according to GDPR, they can help inform the private sector and provide cross pollination of good ethical practices. In this article, ethical AI toolkits have been analyzed from an SME perspective; however, evaluation of criteria B, C, and D provides a useful reference to the academic community, who may wish to embed the use of toolkits into their ethics approvals and evaluations of research projects. Finally, this analysis contributes a useful teaching resource for courses that include AI ethics and/or data and AI governance, to enable future data scientists and analysts to operationalize practical data and AI ethics within their future employment settings. Our next step is to produce an easy-to-use online tool to help SMEs select the best toolkits to implement/inform practice based on coverage, ease of implementation, and stage in their ethical AI evolution as a company. Our proposed online selection tool will be a curated database that will allow SMEs to provide their own rating across different categories following a similar methodology to ours in this article. They will also be able to propose and categorize new toolkits to add to the database as and when they become available, given the high level of activity in this domain. The tool will be cocreated with SMEs and citizen stakeholders and be flexible to incorporate legislation changes and provide a go-to resource kit.

ACKNOWLEDGMENT
The authors would like to thank Policy Connect and the APPGDA for their work in the inquiry that led to the Our Place Our Data Report [47] and the open source communities whose tools we used to conduct the data processing, analysis, and visualization, such as Seaborn [133], Matplotlib [134], Pandas [135], and Jupyter Lab.