Developing Responsible Chatbots for Financial Services: A Pattern-Oriented Responsible Artificial Intelligence Engineering Approach

The recent release of ChatGPT has gained huge attention and discussion worldwide, with responsible artificial intelligence (RAI) being a crucial topic of discussion. One key question is, "How can we ensure that AI systems, like ChatGPT, are developed and adopted in a responsible way?" To tackle RAI challenges, various ethical principles have been released by governments, organizations, and companies. However, those principles are highly abstract and difficult to apply in practice. Further, significant effort has been put into algorithm-level solutions that address only a narrow set of principles, such as fairness and privacy. To fill the gap, we adopt a pattern-oriented RAI engineering approach and build an RAI pattern catalog to operationalize RAI from a system perspective. In this article, we first summarize the major challenges in operationalizing RAI at scale and introduce how we use the RAI pattern catalog to address those challenges. We then examine the risks at each stage of the chatbot development process and recommend pattern-driven mitigations to evaluate the usefulness of the RAI pattern catalog in a real-world setting.


ChatGPT has gained huge attention and discussion worldwide, with responsible artificial intelligence (RAI) being a crucial topic of discussion. One key question is, "How can we ensure that AI systems, like ChatGPT, are developed and adopted in a responsible way?" RAI is the practice of developing, deploying, and maintaining AI systems in a way that benefits humans, society, and the environment, while minimizing the risk of negative consequences. To tackle the challenge of RAI, many AI ethics principles have been released recently by governments, organizations, and enterprises.1 The principle-based approach provides technology-neutral and context-independent guidance while allowing context-specific interpretations for implementing RAI. However, those principles are too abstract and high level for practitioners to use in practice. For example, operationalizing the human-centered value principle, in terms of how it can be designed for, implemented, and monitored throughout the entire lifecycle of AI systems, is a challenging and complex task. In addition, the existing work mainly focuses on algorithm-level solutions for a subset of mathematics-amenable AI ethics principles (such as privacy and fairness). However, RAI issues can happen at any stage of the development lifecycle, crosscutting various AI and non-AI components of systems beyond AI algorithms and models. To bridge this principle-algorithm gap, further guidance, such as guidebooks,a,b question banks,2 checklists,3 and documentation templates,4,5 has begun to emerge. However, those attempts tend to be ad hoc and lack systematic solutions that cover the entire lifecycle of AI systems and take into account different levels of stakeholders.
We have adopted a pattern-oriented RAI engineering approach6 and built an RAI pattern catalogc for different types and levels of stakeholders in the AI industry.7,8 In this article, we first summarize the major challenges in operationalizing RAI at scale and introduce how the RAI pattern catalog addresses those challenges. Then, we examine the risks at each stage of the chatbot development process and recommend pattern-oriented mitigations to evaluate the usefulness of the RAI pattern catalog.

RELATED WORK
The concept of chatbots can be traced back to the 1950s, when computer scientist Alan Turing proposed the Turing test, which aimed to determine whether a machine could exhibit intelligent behavior indistinguishable from that of a human. In 1966, Joseph Weizenbaum created ELIZA, the first known chatbot, which was designed to simulate a psychotherapist by responding to user inputs with preprogrammed responses.9 In the 1980s and 1990s, advances in natural language processing and machine learning led to the development of more advanced chatbots, such as PARRY and ALICE. In the early 2000s, the rise of messaging platforms and mobile devices made it easier for businesses to integrate chatbots into their customer service systems. In recent years, advances in AI, such as deep learning and natural language processing, have made it possible for chatbots (e.g., OpenAI's ChatGPTd and IBM Watson Assistante) to handle more complex and natural conversations with users, leading to the widespread use of chatbots in various industries, including finance, health care, and e-commerce. Despite the increasing popularity of chatbots, people have many ethical concerns about them. Studies on these concerns, such as human trust and emotion, have begun to emerge.10,11,12 Significant effort has been put into algorithm-level solutions, which mainly focus on a subset of ethical principles,13,14,15,16 such as privacy, fairness, and explainability. However, there is a lack of RAI governance and engineering studies that assess and mitigate the AI risks of chatbots against all the AI ethics principles.

MAJOR CHALLENGES IN OPERATIONALIZING RAI AT SCALE
Through our engagement with industry, we have summarized three major challenges in operationalizing RAI at scale. Organizations heavily rely on the project teams to perform self-assessment. Current risk assessment practices rely on checklists, conversations, and information sheets rather than formal or technical approaches. Organizations also tend to treat risk analysis as hazard/threat analysis, omitting system vulnerability, exposure risks, and response/mitigation risks.

PATTERN-ORIENTED RAI ENGINEERING APPROACH: RAI PATTERN CATALOG
We have adopted a pattern-oriented responsible engineering approach and built an RAI pattern catalog to address end-to-end, top-to-bottom RAI challenges.7,8 In software engineering, a pattern is a reusable solution to a recurring problem within a given context during software development. Patterns are documented to capture the knowledge about reusable solutions in an accessible and structured way for stakeholders to learn. A pattern catalog is a collection of patterns that are related to some extent and can be used together or independently.
Based on the results of a multivocal review,7 we analyzed successful case studies and generalized best practices into patterns. The current version of the pattern catalog contains 63 patterns, including 24 governance patterns, 17 process patterns, and 22 product patterns. To describe the patterns, we extended the traditional pattern template with additional elements, including summary, pattern type, objective, target users, impacted stakeholders, lifecycle stages, relevant AI ethics principles, context, problem, solution, consequences (i.e., benefits and drawbacks), related patterns, and known uses. Each pattern provides a meaningful analysis of its consequences, with pointers to measurement metrics and methods, residual risks, and any new risks introduced. Patterns are connected at multiple levels, from multiple angles, and across the AI system lifecycle through the related patterns. The defined fields in the pattern template help stakeholders efficiently select the patterns for mitigating a certain risk.
As illustrated in Figure 1, the RAI pattern catalog has the following characteristics to help stakeholders better navigate the landscape and achieve RAI systems more successfully:

- Across multiple angles and connected (governance, process, and product): As listed in Figure 2, the patterns are organized into three interconnected categories for easier adoption. Governance patterns are for building multilevel governance for RAI.

AI ETHICS AND TRUST: FROM PRINCIPLES TO PRACTICE
Process patterns are for establishing responsible software development processes and AI engineering, and product patterns are for building responsible-AI-by-design into AI systems.8 Stakeholders should use product patterns as product features to enforce RAI principles directly in the product and to verify/validate the product. In the meantime, stakeholders should also use process and governance patterns to complement RAI further.
- Across multiple organization levels and connected (industry/community, organization, and teams): The patterns are at different levels, so stakeholders can situate the practice areas in the bigger picture and see how the patterns fit in and how different patterns influence and reinforce each other at the industry/community, organization, and team levels.
- Across the system lifecycle and connected (requirements, design, implementation, testing, deployment, and postdeployment monitoring): Across the lifecycle of AI systems, different process patterns can be applied at different times, with the outputs of one pattern becoming the input of another.
- Across the supply-chain, system, and operation layers and connected: We connect the product patterns through a system reference architecture spanning the AI supply chain, the AI system, and the operation/deployment infrastructure layer.
- Benefiting multiple connected risks: Individual RAI risks should not be managed in silos using risk-specific solutions. The patterns often address multiple risks together, significantly raising the RAI posture of the organization.
- Acknowledging drawbacks and additional risks introduced: Adopting pattern-oriented risk mitigation may introduce additional risks and costs. We recognize them by incorporating drawbacks into the patterns and connecting them with related patterns that further address the challenges.
- Clear differentiation of trust and trustworthiness: Trustworthiness is the ability of an AI system to meet RAI principles, while trust is stakeholders' subjective estimate of the trustworthiness of the AI system. We recognize that the importance of gaining stakeholder trust goes beyond the objective trustworthiness of the systems.17 Gaining trust is about diverse and inclusive engagement, setting realistic expectations, and communicating trustworthiness evidence in a way that stakeholders can understand and meaningfully critique. We include trust and trustworthiness as pattern objectives.

Development Process of Chatbots
In this section, we introduce the development process of chatbots using IBM Watson Assistant and discuss how patterns can be used to address various RAI risks, as illustrated in Figures 3 and 4. In the RAI pattern catalog, each pattern includes the specification of target users, lifecycle stages, and context, which can be used to locate the relevant patterns.
Chatbot development starts with the planning stage. Once the planning is completed, a conversation is designed and then built using the chosen chatbot platform, such as IBM Watson Assistant. The chatbot is then tested and deployed into the target production environments. The performance of the chatbot is continuously measured, and the chatbot is improved based on the performance assessment and new requirements from the business.
The RAI risk committee pattern can be adopted by the organization to continuously assess the RAI risks of AI projects, including chatbot projects, through a dedicated RAI risk committee or an existing risk committee. The RAI risk assessment pattern can be used by the risk committee and the project team to continuously assess the potential risks at each stage of the chatbot lifecycle.

Planning
At the planning stage, stakeholders such as business representatives and chatbot owners need to discuss the following aspects:

- The overall purpose, such as customer service or an internal help desk.
- The viewpoint, which supports the purpose of the chatbot and provides a consistent experience to the user, such as a financial service chatbot providing help to branch customers.
- The tone and personality, such as a pessimist or an optimist.
- A proactive and/or reactive nature, depending on the purpose and viewpoint. Proactive means actively taking the user toward a goal, while reactive means letting the user lead the conversation by asking questions.
- The name of the chatbot, which can be the first thing a user notices in the chat box.
- The scenario, describing brief narratives of the anticipated use of the chatbot.
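The planning aspects above can be captured as a single explicit record that the RAI risk assessment can later inspect. The sketch below is illustrative only: the class, field names, and example values are our assumptions, not part of any chatbot platform.

```python
from dataclasses import dataclass

# Sketch: the planning-stage decisions captured as one record so they can
# be reviewed explicitly. All field names and values are illustrative.
@dataclass
class ChatbotPlan:
    purpose: str    # e.g., customer service or internal help desk
    viewpoint: str  # perspective that supports the purpose
    tone: str       # e.g., "pessimist" or "optimist"
    proactive: bool # leads the user toward a goal vs. reacts to questions
    name: str       # the first thing a user notices in the chat box
    scenario: str   # brief narrative of the anticipated use

plan = ChatbotPlan(
    purpose="customer service",
    viewpoint="financial service chatbot helping branch customers",
    tone="optimist",
    proactive=False,
    name="BranchBuddy",  # hypothetical name
    scenario="a customer asks about account fees and branch hours",
)
assert not plan.proactive and plan.purpose == "customer service"
```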
The RAI risk assessment pattern needs to be applied to examine whether the purpose, viewpoint, or scenario is unethical, or whether the tone or name is offensive to a group of people. The RAI user story pattern can be used to gather RAI requirements from stakeholders on the chatbot. An example user story is, "As an Indigenous person, I want the chatbot to respond to my questions in both English and an Indigenous language." RAI user stories can be used to trace the RAI requirements both backward to the stakeholders who developed them and forward into the design modules, code pieces, and test cases. The verifiable RAI requirement pattern can be adopted to specify the RAI requirements in a verifiable form, e.g., "all responses must be provided with multiple language options, including an Indigenous language option."
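A verifiable RAI requirement lends itself to an automated check. The sketch below assumes a hypothetical response payload with a `language_options` field and illustrative language codes; none of this reflects a real chatbot API.

```python
# Sketch: turning the verifiable language requirement into an automated
# check. The response structure and language codes are assumptions.
REQUIRED_LANGUAGES = {"en", "ind"}  # English plus an Indigenous language code

def satisfies_language_requirement(response: dict) -> bool:
    """Verify that a chatbot response offers all required language options."""
    offered = set(response.get("language_options", []))
    return REQUIRED_LANGUAGES.issubset(offered)

response = {
    "text": "Your branch opens at 9 a.m.",
    "language_options": ["en", "ind", "fr"],
}
assert satisfies_language_requirement(response)
assert not satisfies_language_requirement(
    {"text": "hi", "language_options": ["en"]})
```

A check like this can run as part of acceptance testing over every response template, giving the verifiable requirement a pass/fail answer rather than a subjective review.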

Conversation Design
At the conversation design stage, content writers, who are subject-matter experts, perform the following tasks to provide the best user interaction based on their understanding of user behavior:

- Further scenario development: The subject-matter experts need to further define the scenarios that the chatbot is expected to fulfill.
- Question analysis: In this step, the subject-matter experts define the key questions for the chatbot to answer within the defined scenarios. These key questions are later built as intents in the chatbot. The subject-matter experts also need to provide questions that chatbot developers can use to train the chatbot for each key question/intent. Ideally, the training data are collected from existing data sources, such as call center logs. However, these existing data are not easy to obtain, so the subject-matter experts need to script the questions. For example, super business users in the business domain are grouped together and asked to think about how they would ask the question of a colleague, or how they are asked such questions in their day-to-day job. This group of users is then asked to write down the questions for each key question.
- Content design: After defining the key questions, the subject-matter experts write the responses to be displayed to users by the chatbot.
- Conversation flow design: Once the content is completed, the subject-matter experts design the dialogues that mimic human interaction. This step also needs to consider requirements on escalation to a human agent.
- Interface design: Sometimes subject-matter experts contribute to the design of the user interface.
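The output of question analysis can be pictured as a mapping from each intent to its scripted training utterances, with a simple sanity check for intents that received too few examples. The intent names, utterances, and threshold below are illustrative assumptions.

```python
# Sketch: question-analysis output as intent -> training utterances, plus
# a check for intents with too few examples. All values are illustrative.
MIN_EXAMPLES_PER_INTENT = 5  # assumed quality threshold

intents = {
    "card_lost": [
        "I lost my credit card",
        "my card is missing",
        "someone stole my debit card",
        "I can't find my card",
        "report a lost card",
    ],
    "branch_hours": [
        "when does the branch open",
        "what are your opening hours",
        "is the branch open on Saturday",
        "branch closing time",
        "what time do you close",
    ],
}

def underrepresented_intents(data: dict) -> list:
    """Flag intents without enough training utterances to train reliably."""
    return [name for name, examples in data.items()
            if len(examples) < MIN_EXAMPLES_PER_INTENT]

assert underrepresented_intents(intents) == []
```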
The RAI risk assessment pattern needs to be adopted to check whether the defined scenarios describe any unethical narratives, whether the training data collected from existing data sources contain any sensitive data or bias issues, and whether the participating subject-matter experts are diverse enough across gender, culture, race, age, expertise, and so on. The RAI user story pattern can be used to collect the nonfunctional and RAI requirements for the chatbot interface design, e.g., the interface design should consider users who are color blind. The data requirement throughout the entire lifecycle pattern is important to ensure that the data requirements are specified explicitly throughout the data lifecycle, including the model training stage, which may involve training data collected from existing data sources or third parties. The verifiable claim pattern allows developers to verify the RAI qualities of the training data, which are associated with a verifiable claim on those qualities. The diverse team pattern can be applied at the conversation design stage; building a diverse team of subject-matter experts can effectively eliminate bias in responses and improve the design of dialogues. The RAI training pattern can be introduced to provide subject-matter experts with knowledge and instructions on how to implement RAI in practice. The human-centered interface design for explainable AI pattern is needed to provide helpful explanations to chatbot users, e.g., informing users that AI algorithms are used to generate responses.
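One concrete slice of this risk assessment, checking scripted training questions for obviously sensitive data, can be sketched as a pattern scan. The regular expressions below are illustrative and far from a complete PII detector; flagged items would still need human review.

```python
import re

# Sketch: scan training utterances for obviously sensitive data (emails,
# card-like numbers) before they enter the training set. Illustrative only.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def flag_sensitive(utterances):
    """Return (utterance, kind) pairs that need human review."""
    flags = []
    for text in utterances:
        for kind, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(text):
                flags.append((text, kind))
    return flags

samples = [
    "How do I reset my password?",
    "My card 4111 1111 1111 1111 was declined",
    "Email me at jane.doe@example.com",
]
flagged = flag_sensitive(samples)
assert len(flagged) == 2  # the card number and the email are flagged
```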

Implementation
After the conversation design is completed, these "raw materials" are handed over to a developer, who configures them in the chatbot platform (IBM Watson Assistant in this study). The developer makes the following configurations.
- Skill: A skill is created with the chatbot name.
- Intents: Intents are configured for the key questions provided by the subject-matter experts. Each intent is then trained with utterances (sample questions) accordingly.
- Entities: Entities are created to capture context. For example, card type can be an entity class, while its values can be credit card or debit card.
- Dialogue: A dialogue is built to capture the responses in dialogue nodes. The conversation flow is configured by flow control mechanisms, such as "jump to" and the settings for "disambiguation" or "digression."
- Advanced features: The developer can turn on advanced features depending on the use case and business requirements, e.g., intent recommendations, which use machine learning to uncover common topics in existing catalogs to quickly train the chatbot on the most frequent issues and questions, and auto learning, which allows the business to learn from customer choices to improve the journey.
- Integration with downstream systems: The developer integrates the responses with downstream systems.
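The relationship between intents, entities, and dialogue nodes can be sketched as a toy configuration. The structure below is a deliberately simplified stand-in and does not reproduce IBM Watson Assistant's actual workspace schema or API; all names and matching logic are our assumptions.

```python
# Illustrative stand-in for the configuration above (NOT the real Watson
# Assistant schema): intents hold utterances, entities hold value lists,
# and dialog nodes map an intent plus an optional entity value to a response.
skill = {
    "name": "branch-helper",
    "intents": {"card_lost": ["I lost my card", "my card is missing"]},
    "entities": {"card_type": ["credit card", "debit card"]},
    "dialog": [
        {"intent": "card_lost", "entity": ("card_type", "credit card"),
         "response": "I'll block your credit card right away."},
        {"intent": "card_lost", "entity": None,
         "response": "Which card did you lose, credit or debit?"},
    ],
}

def extract_entity(utterance: str, entities: dict):
    """Naive entity capture: longest entity value found in the utterance."""
    hits = [(name, value) for name, values in entities.items()
            for value in values if value in utterance.lower()]
    return max(hits, key=lambda h: len(h[1])) if hits else None

def respond(utterance: str, intent: str, skill: dict) -> str:
    """Pick the first dialog node matching the intent and captured entity."""
    entity = extract_entity(utterance, skill["entities"])
    for node in skill["dialog"]:
        if node["intent"] == intent and node["entity"] in (entity, None):
            return node["response"]
    return "Sorry, I didn't understand that."

assert respond("I lost my credit card", "card_lost", skill).startswith("I'll block")
assert respond("I lost my card", "card_lost", skill).startswith("Which card")
```

The second dialog node acts like a disambiguation fallback: when no card-type entity is captured, the chatbot asks a clarifying question instead of guessing.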
The RAI risk assessment pattern is needed to examine whether there is any misconfiguration by developers or machine learning algorithms, whether there is any vulnerability in data integration or storage, and whether there is any privacy or bias issue with the sample questions for training.
The customized agile process pattern can be used to adapt the agile development process by incorporating RAI principles. Extension points can be artifacts, roles, ceremonies, practices, and culture. Ethical principles can be implemented by modifying existing artifacts (e.g., user stories) or adding new artifacts (e.g., regulatory requirements). There is also a need to promote RAI through existing roles (e.g., product owner) or new roles (e.g., ethicist). Similarly, modification of existing ceremonies (e.g., sprint planning) or the introduction of new ceremonies (e.g., ethics-oriented meetings) can be considered. Practices (e.g., user acceptance testing) and culture (e.g., hiring) are two effective ways to address ethical concerns in the agile development process.
The tight coupling of AI and non-AI development pattern ensures that the development of the chatbot and other downstream systems is tightly coupled. The teams can share the same sprints and stand-up meetings and use a common co-versioning registry to manage the artifacts and track progress.
The AI mode switcher pattern is required for the configurations made by machine learning algorithms. All suggested configurations need to be reviewed and approved by developers before being released into production.
Data requirements throughout the entire lifecycle pattern can be used to describe the requirements on data collection/integration and storage, taking into account all the involved roles.
The co-versioning registry pattern can be used to capture the relationships and dependencies of conversation design materials and implementation configurations.
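The co-versioning registry can be pictured as a log that records, for each chatbot release, which versions of the design materials and platform configuration it was built from. The entry fields below are illustrative assumptions, not a prescribed schema.

```python
# Sketch: a co-versioning registry entry per release, so design materials
# and implementation configurations can be traced together. Illustrative.
registry = []

def register(release, design_version, config_version, approved_by):
    registry.append({
        "release": release,
        "design_version": design_version,  # conversation design materials
        "config_version": config_version,  # intents/entities/dialog config
        "approved_by": approved_by,
    })

register("1.4.0", "design-2023-02-01", "cfg-57", "chatbot-owner")

def design_for(release):
    """Trace a deployed release back to its conversation design version."""
    return next(e["design_version"] for e in registry
                if e["release"] == release)

assert design_for("1.4.0") == "design-2023-02-01"
```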
The RAI construction with reuse pattern allows the developers to reuse the previous configurations (e.g., intents) that have passed the RAI risk assessment.

Testing
This step is relatively straightforward. Conversations are tested with various testing methods, such as unit testing and regression testing. The RAI risk assessment pattern can be used to check whether there are any AI risks in the test cases; in particular, an RAI assessment is required for all test cases. The RAI acceptance testing pattern is needed to determine whether the RAI requirements are met. Subject-matter experts are involved in the testing process and perform the RAI acceptance testing.

Deployment
In this step, the tested chatbot is deployed through the pipeline and released to production for end users. Oftentimes, the final release to production needs to be approved by the chatbot owner or conversation owner. The RAI risk assessment pattern is needed to assess whether there is any risk when integrating the chatbot with other systems and deploying the chatbot at scale. The continuous deployment pattern can be used to implement different deployment strategies. For example, phased deployment allows the chatbot to be deployed only to a subset of users initially; the new version of the chatbot then rolls out incrementally and serves alongside the old version to reduce AI risk. The standardized reporting pattern is essential for governing AI systems. Organizations can set up standardized processes and templates for informing different stakeholders, such as financial service regulators and customers, of a new chatbot release.
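The phased-deployment strategy above can be sketched with deterministic user bucketing: a hash of the user id routes a fixed percentage of users to the new version while everyone else stays on the old one. The rollout percentage and version labels are illustrative assumptions.

```python
import hashlib

# Sketch: phased deployment via stable hash-based bucketing. The same user
# always lands in the same bucket, so nobody flips between versions.
def assigned_version(user_id: str, rollout_percent: int) -> str:
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "chatbot-v2" if bucket < rollout_percent else "chatbot-v1"

# Deterministic: repeated calls give the same assignment.
assert assigned_version("user-42", 10) == assigned_version("user-42", 10)

# Roughly 10% of a large user population sees the new version.
share = sum(assigned_version(f"user-{i}", 10) == "chatbot-v2"
            for i in range(1000)) / 1000
assert 0.05 < share < 0.15
```

Because the assignment is stable, monitoring can compare the two cohorts over time, and the rollout percentage can be raised gradually as the new version proves itself.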

Monitoring
In this step, the performance of the chatbot is measured by multiple metrics, such as the net promoter score, user feedback, effectiveness, and conversation coverage. Most chatbot platforms provide analytics capabilities, and various metrics can be chosen and applied depending on the business cases and needs. This step also ensures that the chatbot answers the right questions, answers them in the right way, and answers enough of them. For a proactive chatbot, it means that the chatbot needs to take the right actions.
The RAI risk assessment pattern can be used to check whether there are any missing or wrong metrics, whether there are any RAI issues in the questions from users or the responses generated by the chatbot, and whether there is any bias in the user feedback results.
The RAI black-box pattern can be used to continuously record the monitored data for improvement or auditing. Various approaches and methods can be applied to analyze the monitored data. A transaction review is one of the most popular methods for data analysis. When end users have a conversation with the chatbot, the chat history can be captured and stored, e.g., in an immutable data ledger. Then, all or a subset of the chat history can be reviewed by subject-matter experts or auditors.
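The immutable-ledger idea can be sketched with a hash chain: each recorded turn includes the hash of the previous record, so any later edit to the stored history is detectable during review or audit. This is an illustration of the concept, not a production implementation.

```python
import hashlib
import json

# Sketch of the RAI black-box pattern: conversation turns appended to a
# hash-chained log so tampering with recorded history is detectable.
ledger = []

def append_turn(user_text: str, bot_text: str):
    prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
    record = {"user": user_text, "bot": bot_text, "prev": prev_hash}
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    ledger.append(record)

def verify(chain) -> bool:
    """Recompute every hash; return False if any entry was tampered with."""
    prev = "0" * 64
    for record in chain:
        body = {k: record[k] for k in ("user", "bot", "prev")}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if record["prev"] != prev or record["hash"] != expected:
            return False
        prev = record["hash"]
    return True

append_turn("What are your fees?", "Our account fees are listed here ...")
append_turn("Can I speak to a human?", "Connecting you to an agent.")
assert verify(ledger)
ledger[0]["bot"] = "edited response"  # tampering breaks the chain
assert not verify(ledger)
```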
The independent oversight pattern can be applied for auditing purposes. Consistency issues need to be taken into consideration if multiple people perform the review, as they may have different understandings.
The K-fold cross validation pattern and blind testing pattern can also be applied to evaluate the RAI performance of the chatbot.These two patterns are new patterns we identified through the case study.
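Applying k-fold cross validation to chatbot evaluation can be sketched as follows: the labeled utterances are split into k folds, and the model is scored on each held-out fold in turn. To keep the sketch self-contained, a deliberately trivial keyword rule stands in for a trained intent classifier; the data and k value are illustrative assumptions.

```python
import random

# Sketch: k-fold evaluation of an intent classifier. The "classifier"
# here is a trivial keyword rule standing in for a real trained model.
def k_fold_indices(n: int, k: int, seed: int = 0):
    """Shuffle indices once, then yield (train, test) index splits."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    for fold in range(k):
        test = idx[fold::k]
        train = [i for i in idx if i not in set(test)]
        yield train, test

data = [("lost my card", "card_lost"), ("card is missing", "card_lost"),
        ("opening hours", "branch_hours"), ("when do you open", "branch_hours"),
        ("stolen card", "card_lost"), ("close time today", "branch_hours")]

def predict(text: str) -> str:
    """Trivial rule stand-in for a trained intent classifier."""
    return "card_lost" if "card" in text else "branch_hours"

scores = []
for train, test in k_fold_indices(len(data), k=3):
    correct = sum(predict(data[i][0]) == data[i][1] for i in test)
    scores.append(correct / len(test))

assert len(scores) == 3
assert all(0.0 <= s <= 1.0 for s in scores)
```

Averaging the per-fold scores gives a more stable estimate of chatbot quality than a single train/test split, which is the point of the pattern.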
Once the analysis is completed, the developers configure and build the improvements required into the chatbot.The improvements can be as simple as retraining the chatbot or modifying the responses to provide more accurate and helpful responses in a responsible manner.It can also mean building new dialogue branches or responses that are required from the end users, or even integrating with a new capability or other systems to provide more functionality for a new use case.

CONCLUSION
In this article, we discussed how we use a pattern-oriented RAI engineering approach to address RAI challenges. We demonstrated the usefulness of the RAI pattern catalog in identifying and mitigating RAI risks through the chatbot development use case.

We are currently examining the RAI risks of our internal AI projects across different domains and recommending pattern-driven mitigations using the pattern catalog. To automate the risk assessment process, we are developing a knowledge-graph-supported tool, which can be used by different levels and types of stakeholders. The RAI knowledge graph is constructed based on the AI incidents database, our RAI pattern catalog, and a question bank.

FIGURE 1. Overview of an RAI pattern catalog.

FIGURE 2. List of RAI patterns.API: application programming interface.

FIGURE 3. The stakeholders and stages in chatbot development.

FIGURE 4. Patterns for addressing risks in chatbot development.APIs: application programming interfaces.