Ethically Responsible Machine Learning in Fintech

Rapid technological developments in the last decade have contributed to using machine learning (ML) in various economic sectors. Financial institutions have embraced technology and have applied ML algorithms in trading, portfolio management, and investment advising. Large-scale automation capabilities and cost savings make ML algorithms attractive for personal and corporate finance applications. Using ML applications in finance raises ethical issues that need to be carefully examined. We engage a group of experts in finance and ethics to evaluate the relationship between ethical principles of finance and ML. The paper compares the experts' findings with the results obtained using natural language processing (NLP) transformer models, given their ability to capture semantic text similarity. The results reveal that the finance principles of integrity and fairness have the most significant relationships with ML ethics. The study includes a use case with SHapley Additive exPlanations (SHAP) and Microsoft Responsible AI Widgets explainability tools for error analysis and visualization of ML models. It analyzes credit card approval data and demonstrates that the explainability tools can address ethical issues in fintech and improve transparency, thereby increasing the overall trustworthiness of ML models. The results show that both humans and machines could err in approving credit card requests despite using their best judgment based on the available information. Hence, human-machine collaboration could contribute to improved decision-making in finance. We propose a conceptual framework for addressing ethical challenges in fintech such as bias, discrimination, differential pricing, conflict of interest, and data protection.

respondents are currently implementing AI-enabled products and processes, with 77% expecting that AI will become essential to their business within two years.

The rest of the paper is organized as follows. Section II systematizes a set of ethical principles relevant to finance, namely integrity, objectivity, competence, fairness, confidentiality, professionalism, and diligence, and discusses their fundamental importance to the financial services industry. Section III presents the principles and goals of explainable machine learning. Following the discussion in Sections II and III, we map the relationship between finance and ML ethics in Section IV. We conduct an experiment with a group of experts in finance and ethics who manually annotate the mapping between the principles of finance and ML ethics. The results are compared with mappings performed using NLP transformers, which show an overlap with the expert annotations. The NLP methods are explained comprehensively so as to be accessible to a wider audience. The results show that integrity and fairness exhibit the strongest relationships with ML ethics. Section V focuses on the ethical problems of machine learning in fintech. We treat topics such as biased data, accuracy, transparency, discrimination, differential pricing, manipulated recommendations, conflict of interest, violations of codes of conduct, insider trading, data protection, and lack of skilled staff, and discuss their potential consequences. Section VI explains the state-of-the-art (SOTA) tools that are used for explainable ML. Section VII focuses on a use case scenario where an ML model is used for credit card approval. We show not only how the proposed tools can help understand ML decisions in a finance context but also that both humans and machines can make mistakes in approving credit card requests, thereby emphasizing the need for human-machine collaboration to improve the decision-making process. In Section VIII, we propose a conceptual framework for addressing ethical challenges in fintech such as bias, discrimination, differential pricing, conflict of interest, and data protection. Section IX concludes the paper.

In this section, we review the traditional core principles of ethics in finance. Based on an analysis of 11 financial services professional associations, the study in [26] distilled seven basic principles found in their codes of conduct: integrity, objectivity, competence, fairness, confidentiality, professionalism, and diligence, as described in Table 1.

Confidentiality is the obligation to hold client information in confidence. When seeking financial advice, clients may share sensitive information about their finances and financial goals, such as family dynamics. Financial services professionals should not divulge personal information, owing to the relationship of trust. There are four reasons that show the need for confidentiality: personal autonomy, respect for relationship obligations, client vulnerability, and serving the common good [27]. Personal autonomy acknowledges that clients have jurisdiction over their own personal information, and it is important that professionals respect the obligations arising from trust relationships.
Trust and intimacy are built through the sharing of personal information. Confidentiality is needed as clients become vulnerable by sharing personal information. Professionals are obliged to act in the best interests of their clients. Finally, as noted in [27], a system that respects confidentiality works for the public interest as well.

The principle of professionalism has three requirements: treatment based on respect and consideration, the duty of professionals to maintain their reputation, and improving the quality of service provided to the public [26]. Regarding the first requirement, professionals should not treat clients as mere means of achieving their own goals, as such treatment hampers clients' autonomy. Treating clients with courtesy and respect is the basis for protecting the interests of clients and for establishing trust. The second requirement is needed because the success of the financial services industry is grounded in public trust. Without trust, it is much more difficult to establish confidence between professionals and clients. Finally, assisting clients with making better financial decisions contributes not only to their financial security but also to the quality of service provided to the public.

The OECD is an intergovernmental organization with 38 member countries, founded in 1961 to stimulate economic progress and world trade. The majority of OECD members are high-income economies with a very high Human Development Index (HDI), comprising 62% of the global nominal GDP ($49.6 trillion) [28]. The OECD is an official United Nations observer. Together with governments, policy makers and citizens, the OECD works on establishing evidence-based international standards and finding solutions to a range of social, economic and environmental challenges. A significant part of the OECD activities focuses on defining public policies and international standards [29].

The first section of the OECD Recommendation on AI outlines five value-based principles: i) inclusive growth, sustainable development and well-being; ii) human-centred values and fairness; iii) transparency and explainability; iv) robustness, security and safety; and v) accountability [31]. The second section proposes specific steps for governments to implement national policies and international cooperation aligned with the five principles. These include i) investing in AI research and development; ii) fostering a digital ecosystem for AI; iii) shaping an enabling policy environment for AI; iv) building human capacity and preparing for labour market transformation; and v) international co-operation for trustworthy AI [31].

In this section, we investigate how the OECD principles map to the previously discussed ethical principles in finance. The purpose is to use this mapping to qualitatively evaluate the relationship between the goals of explainable machine learning and the ethical challenges in fintech from an ML perspective.

Inclusive growth, sustainable development and well-being. This principle states that AI should be developed and used to increase prosperity for all: individuals, society, and the planet. It recognizes the potential of AI to advance inclusive growth and sustainable development in areas such as education, health, transport, agriculture, and the environment. Stewardship of trustworthy AI should be accompanied by addressing inequality, the risk of divides due to disparities in technology access, and biases that may negatively impact vulnerable or underrepresented populations.

Human-centred values and fairness. Based on this principle, AI should be developed in a manner consistent with human-centred values, such as fundamental freedoms, equality, fairness, the rule of law, social justice, data protection and privacy, as well as consumer rights and commercial fairness. The principle recognizes that certain AI applications may have negative implications, such as deliberate or accidental infringement of human rights and human-centred values. Therefore, the development of AI systems should be aligned with these values, including the possibility for humans to intervene in and oversee such systems.

Transparency and explainability. Transparency as defined in this principle has two aspects. The first is to disclose whether AI is being used in an application so that users are aware of it. The second is to enable people to understand how an AI system works so that they can make informed choices. Explainability means enabling people affected by the outcome of an AI system to understand the system's decision. To achieve explainability, the system should provide easy-to-understand information to the people affected by an AI system's outcome so that they can challenge the outcome if needed. An explanation may involve providing details on the determinant factors behind a specific outcome or decision, or explaining why similar circumstances generated a different outcome.

Accessibility. Accessibility facilitates the involvement of end users in the process of developing, improving and monitoring ML models. Accessibility eases the burden on non-technical or non-expert users when using AI systems and algorithms that may seem incomprehensible at first sight [58].

Interactivity. Interactivity allows end users to assess and test explainable ML models. Interactivity can also serve as a tool for improving explainable AI models. This is relevant to fields in which end users need the ability to interact with the models and to modify them [48], [58], [59], [60], [61].

Privacy awareness. The ability to assess privacy is one of the byproducts enabled by model explainability. ML models may have complex inner workings, and not knowing how the model's results are represented internally may lead to a privacy breach. In addition, explaining the inner relations of a trained model to non-authorized third parties may also compromise privacy [62].

To harness the potential of the novel approaches to ML ethics, we explore the correspondence between ML ethics and the traditional principles of finance ethics.

The previous two sections provide a broad overview of the principles of finance and ML ethics. While finance ethics is well established, ML ethics has witnessed increased interest only recently, due to the proliferation of ML-based solutions in finance. The contribution of this paper is in studying the relationship between finance and ML ethics with the goal of minimizing the adverse impact of ethical issues in fintech. The purpose of this study is to identify the most important criteria to consider when addressing ethical challenges in ML-based fintech applications. The results can help fintech companies build products and services by considering the most relevant ethics principles.

To evaluate the relationship between finance and ML ethics, we conducted an experiment with a group of experts in finance and ethics to manually annotate the links between the ethics principles based on their definitions.

The group is composed of 8 experts from the academic community with expertise in finance and ethics who are also knowledgeable in machine learning. They were chosen carefully to ensure they could effectively tackle the task of manually annotating the links between the ethics principles. Each of the experts received both the long and short definitions of ML ethics and finance ethics to assess the mapping between the principles, and each of them worked individually on the mapping. The human annotations are grounded in the codes of conduct of financial organizations and institutions, as explained in Section II [26].

To enhance objectivity and improve decision making, we assessed the experts' results using recent advancements in natural language processing (NLP) that have led to substantial improvements in certain tasks, almost comparable with human performance. One such task is semantic text similarity, where SOTA results are obtained using NLP transformers. Transformers achieve strong results on a wide range of tasks such as machine translation [66], [67], [68], question answering [69], [70], [71], [72], and sentiment analysis [73], [74], [75]. The main essence of transformers is that they can encode any text into a vector representation that can then be fed into a machine learning model for further analysis. One such application is assessing the semantic similarity between two texts, such as two sentences or two paragraphs. We use the cosine similarity between the vector representations of the principle definitions. The dataset (P_f, P_ML) for our experiment consists of two parts, the long definitions (LD) and the short definitions (SD) of the principles, and the calculations are repeated for both parts of the dataset. For determining the strength of the links, i.e. whether they reveal a weak, moderate or strong relationship, we use the following approach. For each transformer, we calculate the 33.33% and 66.66% percentiles obtained from the set of cosine similarities over all pairs of principles for that transformer. Then, for each pair of principles, we check whether the cosine similarity for that pair is less than the 33.33% percentile, less than the 66.66% percentile, or higher than the 66.66% percentile. Depending on the comparison with these thresholds, the link for that pair is labeled as weak, moderate or strong, respectively. The reason for analyzing both the long and short definitions is to gain insights into the links between the principles from two related perspectives, with the goal of assessing the level of overlap between the two sets of results.

For the experiment, we used the NLI-DistilRoBERTa-Base-v2 model from Hugging Face [64], [86]. RoBERTa was chosen as it showed superior performance on sentiment tasks in finance among the transformers analyzed in [76]. Fig. A.2 presents the results obtained from the transformer experiment for both long and short definitions and demonstrates overall alignment with the manually annotated mappings. In the following subsection, we discuss the overall insights obtained from the mappings.
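To make the procedure concrete, the following minimal sketch reproduces the essence of this computation, assuming the sentence-transformers library; the principle definitions shown are illustrative stand-ins, not the exact texts used in the study.

```python
# Sketch of the semantic-similarity mapping between principle definitions.
import numpy as np
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/nli-distilroberta-base-v2")

# Hypothetical short definitions (SD); the study uses its own texts.
finance = {
    "integrity": "Acting honestly and adhering to moral and ethical standards.",
    "fairness": "Treating all clients impartially and allocating fair returns.",
}
ml = {
    "transparency and explainability":
        "Disclosing AI use and enabling people to understand and challenge AI outcomes.",
    "human-centred values and fairness":
        "Developing AI consistent with fundamental freedoms, equality and fairness.",
}

f_emb = model.encode(list(finance.values()), convert_to_tensor=True)
m_emb = model.encode(list(ml.values()), convert_to_tensor=True)
sims = util.cos_sim(f_emb, m_emb).cpu().numpy()  # cosine similarity per pair

# Label each link via the 33.33% and 66.66% percentiles of all similarities.
lo, hi = np.percentile(sims, [33.33, 66.66])
for i, f in enumerate(finance):
    for j, m in enumerate(ml):
        label = "weak" if sims[i, j] < lo else "moderate" if sims[i, j] < hi else "strong"
        print(f"{f} <-> {m}: {sims[i, j]:.3f} ({label})")
```

In the paper's setting, the percentiles are computed over all pairs of finance and ML principles for a given transformer, and the procedure is repeated for the long (LD) and short (SD) definitions.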

The choice of data sources can have an impact on financial advice. Therefore, it is important to define impartial criteria to identify relevant data sources. The difficulties do not end here, though. In fact, they are further intensified by the fact that the collected data itself may be biased. Since ML models are typically designed to operate autonomously, it is difficult to check for bias unless the data is verified manually by human intervention. However, this is not only a labor-intensive task but also practically infeasible given the potentially huge volume of data that needs to be checked. Collaboration with subject matter experts is essential when developing data and methods to avoid such consequences.

Investors carefully select whom to trust with their investment decisions. There is a distinction between a brokerage and an investment advisory firm. Brokers engage in the business of effecting transactions in securities for the account of others, for which they receive compensation. When brokers recommend securities to their clients, they must ensure that the investment is "suitable" for the client. On the other hand, investment advisers advise others about investing in securities and receive compensation for the advice. When investment advisers recommend an investment to their clients, the investment needs to be in "the best interest" of the client. These differences are essential and create two different standards of conduct: i) suitability for brokers and ii) fiduciary ("best interest of the customer") for investment advisers. Investors should know the difference, especially when the investment advice is selected based on an ML algorithm. It is challenging to understand whether an AI-based investment decision is made because it is "suitable" or in "the best interest" of the client. These questions are at the center of the Securities and Exchange Commission (SEC) regulatory discussion about the distinction between best interest and fiduciary duty and should be considered when developing ML-based investment algorithms [91].

If we assume that an ML model is fed with comprehensive and unbiased data collected from relevant and reliable sources, it may still be challenging to select an existing model, or develop a new model, with a level of accuracy that is sufficient for solving the particular problem at hand given specific client circumstances. One such example is predicting stock returns for optimizing investment decisions. Existing models may not entirely correspond to the requirements of the problem domain and may have to be adjusted. Prior to improving these models, there may be no clear indication that the adjustments would work well. Similarly, new models may need to be created to achieve an adequate fit with the problem domain. Modeling phenomena involving human beings often requires simplification that can mask risks for the sake of precision; over-reliance on probability and statistics can be a limiting factor, as economics is a social and not a natural science [92]. Looking at this problem pragmatically, it is not only unclear whether new models would accomplish better performance; the development of such models may necessitate significant efforts that could delay rendering financial services with an acceptable level of diligence, i.e. in a prompt and thorough manner. Consequently, the financial professional's dilemma is deciding which model to choose and how to present its accuracy to the clients.

Differential pricing means charging clients different prices for the rendered financial service [26]. Differential pricing in itself is not necessarily an ethical violation; for example, clients could be charged more if they are more demanding and thus require more effort from the provider of financial services. Similarly, low-maintenance clients would be charged less, since providing services for them is not as complex as in the case of high-maintenance clients.
If financial services professionals have more experience and education, this would constitute a fair component in defining the pricing structure of their services. However, factors that do not define the value proposition may violate ethical norms.

A fundamental premise that conditions clients' confidence in the advice they receive from ML models is the promise that the models are designed ethically to maintain objectivity. ML models should be based on harnessing the evidence available in collected data and applying state-of-the-art algorithms to ensure consistent and equitable treatment of clients. An objective model does not deliberately promote a set of input data at the expense of other input data. However, a model can be intentionally tailored to recommend a set of actions to specific clients without evidence that these recommendations are justified. A problem can arise if the model is injected with the "right" amount of bias to demonstrate that it has "superior" results, which would ultimately mislead clients [90].

Compliance with rules alone is not sufficient to guarantee good ethical behavior [97]. If one possesses information about illegal or unethical actions by peers or other actors in the finance industry, it is their responsibility to report these activities up the hierarchy in their organizations and to the respective regulatory bodies. Otherwise, financial professionals are subject to legal repercussions. However, such reporting does not always occur, as it can irreversibly damage the relationship between the involved parties, so people tend not to report their peers or colleagues. The development of ML models without transparency and quality control could further contribute to unethical behavior because it would be impossible for companies to evaluate the objectivity of the ML models.

Organizations such as corporations, associations and institutions often develop codes of conduct to guide the behavior of their members. Violations of codes of conduct in the financial industry lead to improper dealings that harm investor interests and market stability as a whole. There are organizations that oversee financial experts' behavior to verify that it is in accordance with established laws. For instance, in the U.S., the official regulatory agency that implements securities laws and establishes regulations for proper conduct is the SEC [87]. With the technological advancements and investment model implementations in the financial industry, it is becoming increasingly difficult to monitor the compliance of ML algorithms with established regulations. This challenge could be overcome with appropriate training for regulators to become more technologically savvy and competent in detecting violations of codes of conduct induced by ML systems.

Insider trading is an unfair market practice in which market participants trade based on material non-public information to generate extraordinary gains [87]. Material information is defined as information that will change the investment behavior of a rational investor when they obtain it, and hence will affect the stock price. Insider trading is harmful because it undermines the trust of investors in the financial markets and creates an unfair trading environment. Given the damage that insider trading can create, the SEC prosecutes insider trading violations as one of its enforcement priorities. Deterrents against insider trading include disqualification from acting in certain fiduciary positions for life or for a limited period of time, monetary fines, and prison sentences [98].

One of the main challenges reported in the implementation of AI was the readiness and ability of staff to understand and work with these new solutions [99]. Deploying ML models in practical applications has to be accompanied by a rigorous performance evaluation [103].

The model fairness problem is often related to selecting the right metric for benchmarking. In many cases, the benchmark is merely based on a single aggregate metric, such as accuracy, over the entire dataset [103]. However, this makes it difficult to understand how an ML model performs on various dataset partitions. The issue with such a single-valued metric is that even though the model may meet the required benchmark on most partitions of the input data, there could still exist non-negligible regions of data for which the model's predictions render considerably different results. While the ML model may perform satisfactorily when averaged over the entire dataset, the discrepancies for certain regions can lead to ethical issues such as bias, inaccuracy, unfairness, and discrimination. Furthermore, using an aggregate metric makes it difficult to continuously monitor the model's behavior as new data is collected.

To address this problem, the dataset is sliced into a one-dimensional or two-dimensional grid of input features, and each cell of the grid is separately evaluated against the selected metric. For example, if the metric is the error rate, analyzing the grid makes it easy to visualize how the errors are distributed across various parts of the dataset. The visualization can be aided by heatmaps that color cells, e.g. using a darker color, if they exhibit higher inaccuracies [103]. This visualization technique emphasizes the problematic regions of data that suffer from model inconsistency and are difficult to evaluate using an aggregate, single-number metric. Thus, by offering a deeper view of the model behavior, two goals are achieved: (i) the ability to visually identify performance problems and (ii) better insights, useful for model debugging. Both goals improve the interpretability of ML models and their responsible use. One such tool is the What-If Tool (WIT) [114], [115].
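As a concrete illustration, the sketch below computes such a per-cell error-rate grid with pandas; the DataFrame, label arrays, and feature names are assumptions made for the example rather than part of the cited tools.

```python
# Slice a dataset into a 2-D grid over two features and compute the
# error rate of model predictions within each cell.
import pandas as pd

def error_grid(df, y_true, y_pred, feat_x, feat_y, bins=4):
    data = df.copy()
    data["error"] = (y_true != y_pred).astype(int)   # 1 where the model erred
    # Equal-frequency bins for the two (numeric) slicing features.
    data["x_bin"] = pd.qcut(data[feat_x], q=bins, duplicates="drop")
    data["y_bin"] = pd.qcut(data[feat_y], q=bins, duplicates="drop")
    # Mean of the 0/1 error indicator per cell = per-cell error rate.
    return data.pivot_table(index="y_bin", columns="x_bin",
                            values="error", aggfunc="mean", observed=True)

# e.g. grid = error_grid(X_test, y_test, model.predict(X_test), "Age", "Income")
# Rendering `grid` as a heatmap darkens the problematic regions.
```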

The above tools focus on explaining ML models; however, explainable ML has important limitations. First, despite the potential success of such models, an ordinary user, one not trained in the complex theory related to the area being modeled, may not understand or interpret such a theory correctly; hence, from their perspective, the results from explainable ML would appear opaque and completely incomprehensible [119]. Second, the explainable models depend on the AI system and the available data, which in some cases can be imperfect or limited. In this case, it is evident that the knowledge derived from such a system is further restricted in terms of the answers that can be offered and the number of explanations that can be provided [119].

Reference [120] developed a method to compute small adversarial perturbations (equivalent to creating adversarial instances for neural networks) that resulted in significant modifications to the feature importance of several explainable methods. Similar findings for the SHAP method were presented in [121]. However, for real-world applications such as healthcare or finance, such adversarial perturbations and their influence on explainable ML approaches need to be investigated further [118]. The results from [122] suggest that the mathematical correctness that underpins SHAP is not sufficient on its own; it should also be aligned with the specific use case and a human-centric understanding of the quality of SHAP's explanations.

One possible solution to some of the mentioned challenges is that the most powerful but opaque ML systems (e.g. deep learning) should not be preferred and applied by default; instead, a comparable alternative that is less powerful but inherently explainable can be employed. In general, inherently explainable ML models should be adopted because of their transparency and explainability, while black-box models with model-agnostic explainability can be more difficult to defend under regulatory scrutiny [119]. Based on this, ML practitioners and financial professionals should be aware that responsible ML is applicable in settings where it admits clear interpretation. This argument is also in line with the overarching understanding of the proposed ethical framework that human-machine collaboration is essential for addressing model explainability and transparency in general.

In the previous section, we presented tools for responsible ML that can address ethical challenges. Here, we demonstrate a use case showing how to respect ethical principles while applying tools for responsible ML to address the challenges arising in fintech.

In this section, we consider a fintech application for approving credit card requests based on machine learning predictions. We use the Credit Approval Dataset from the UCI Machine Learning Repository [123]. The dataset contains 690 credit card applications, each described by a set of numerical and categorical features and a binary approval outcome.

As preparation for our experiments, we perform standard data processing steps. First, we audit the dataset for missing values and find that 37 cases (about 5%) have one or more missing values. We impute the missing data with the mean value of each numerical feature and replace missing categorical data with the most frequent value of each categorical feature. We then use a label encoder to convert all categorical values into numerical types, after which we remove the Drivers License and ZipCode features from the dataset, as they are unlikely to have a tangible impact on predictive performance. Finally, we split the dataset into training and test sets. The code and dataset for the explainability analysis can be found at: https://github.com/rizinski/Ethics-in-Finance-and-Machine-Learning/tree/main/explainability_notebook

We train an XGBoost model on the prepared data. XGBoost's decisions are hard to interpret, which makes it suitable for our explainability analysis from a fintech perspective. Before selecting XGBoost, we compared it with another popular algorithm, Light Gradient Boosting Machine (LGBM), which resulted in a similar accuracy score on the same dataset. As part of this study, we also considered other approaches to creating predictive models, including deep neural networks. However, our intention is not to present a comprehensive survey of all possible models but rather to select an illustrative modeling example that can give helpful direction on applying ML explainability in the financial industry while considering ethics issues.
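The sketch below approximates these preprocessing and training steps, assuming the raw UCI file crx.data (where '?' marks missing values) and standard scikit-learn and XGBoost APIs; the column positions assumed for Drivers License and ZipCode follow a common renaming of this dataset and may differ from the accompanying notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

df = pd.read_csv("crx.data", header=None, na_values="?")

for col in df.columns[:-1]:
    if df[col].dtype.kind in "if":                   # numerical feature
        df[col] = df[col].fillna(df[col].mean())     # impute with the mean
    else:                                            # categorical feature
        df[col] = df[col].fillna(df[col].mode()[0])  # impute with the mode
        df[col] = LabelEncoder().fit_transform(df[col])

X = df.drop(columns=[11, 13, 15])   # drop Drivers License, ZipCode, target
y = LabelEncoder().fit_transform(df[15])  # encode '+'/'-' approval labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = XGBClassifier(eval_metric="logloss").fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```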

After training the XGBoost model on the dataset, we proceed with an explainability analysis of the model using SHAP. The results are summarized in Figures 3-4. Both plots display the feature importance for the dataset, i.e. how each feature of the dataset impacts the model's output. As explained in the SHAP documentation, a single dot on each feature row in the beeswarm plot represents an explanation for a given instance of the dataset. The horizontal position of the dot is determined by the SHAP value of that feature, while dots are accumulated along each feature row to show density. Color is used to display the original value of a feature.
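A minimal sketch of this analysis is given below, reusing the hypothetical model and test features from the training sketch; the plotting calls follow the shap library's documented API.

```python
import shap

explainer = shap.TreeExplainer(model)   # tree-based explainer for XGBoost
shap_values = explainer(X_test)         # SHAP values for every instance

shap.plots.beeswarm(shap_values)        # per-instance dots (as in Fig. 3)
shap.plots.bar(shap_values)             # mean(|SHAP|) importance (as in Fig. 4)
```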

In Fig. 3 we observe that, on average, the feature Prior Default has the dominant impact on decisions for approving or rejecting credit card requests. Customers without a prior default are generally favored by the model, while customers who have defaulted on their credit card payments are generally disfavored. Another insight is that low-income applicants may still be issued credit cards provided that they have had no prior default. On the other hand, applicants with a prior default are less likely to get an approval at the same income level, meaning that having a prior default is a very strong indicator for rejecting credit card requests. Prior Default is followed by Income and Debt as important features, while employment-related factors, as well as age and education, are less relevant when considering credit card applications.

The beeswarm plot in Fig. 3 shows the density distribution across all instances of the dataset. Fig. 4 shows a bar plot obtained from SHAP on the same dataset for the XGBoost model, showing the global feature importance for the overall model, defined as the mean absolute SHAP value of each feature across all samples.

The SHAP explainer can also create bar plots describing the local feature importance of individual instances of the dataset. Fig. 5 presents bar plots for the local explainability of four instances, where each feature is represented by its SHAP value. Fig. 5a shows a typical plot for an instance of the dataset where both the human decision and the XGBoost prediction approved the credit card request. We notice that most of the features exhibit SHAP values that contribute strongly in favor of the approval. This is not a surprise: the selected applicant did not have a prior default, has income, is currently employed, and has been employed for 20 years. The education level and age contribute negatively, but their impact on the final decision is negligible. An interesting observation is that ethnicity has a slight negative impact, even though its contribution is insignificant. Fig. 5b is similar to Fig. 5a, but in the other direction, with both the human decision and the model prediction rejecting the request.

Fig. 5c shows an applicant (index 101) whose request was approved by the bank but declined by the model, mainly because of a prior default, despite the applicant having a high income. Conversely, Fig. 5d shows an applicant favored by features such as education level, small debt, current employment, and not having a prior default, which ultimately led the model to an approval. At first sight, for this particular applicant, it seems like the model decided correctly, while the bank was biased. However, there is no strict indication whether this is actually true. One may also say the bank made the right decision, while the model was biased.
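Such local explanations can be produced by indexing the SHAP values of a single instance, as sketched below; apart from 101, which corresponds to the applicant revisited in the what-if analysis, the indices are illustrative.

```python
# Per-instance bar plots, one per applicant, as in Fig. 5.
for i in [0, 1, 101, 3]:
    shap.plots.bar(shap_values[i])
```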

The examples in Fig. 5c and Fig. 5d show that both humans and machines can potentially make mistakes despite using their best judgment based on the available information. Therefore, it should be emphasized that human-machine collaboration is important for making better decisions. When a machine is involved in the decision-making process, the decisions become more transparent. By involving ML models, human experts are able to return to an applicant's case for an additional review. This process could result in better decisions and reduce bias and mistakes.

The Responsible AI Widgets developed by Microsoft provide a convenient visual dashboard for identifying cohorts of data that exhibit a higher error rate compared to the overall (benchmark) error rate for the entire dataset. The error analysis in the dashboard can be performed using two types of diagrams: i) error heatmaps obtained by selecting one or two features, or ii) a binary decision tree that partitions the dataset into subgroups for discovering dominant error patterns.

Fig. 6 shows an error heatmap for two input features: Prior Default (1 if the applicant had a prior default, 0 otherwise) and Age. While various combinations of features can be selected, we wanted to see how errors are distributed across different age groups based on having a prior default or not, given that prior default is the most important feature for this dataset. As a two-dimensional grid, the heatmap partitions the dataset into different regions and visualizes how errors are distributed across the regions for these two features. The cells with higher errors are visualized with a darker red color, denoting a higher error disparity with respect to the benchmark error rate. The analysis of the heatmap view depends on understanding how feature importance may affect failures. The benchmark error rate for the dataset is 12.56%, but the heatmap reveals that some regions exhibit higher error rates than others. Credit card applicants aged 38.6 years or less, and those above 53.9, are likely to suffer from model failures. Among applicants in the age range of 38.6-53.9, those with a prior default are more vulnerable than applicants who had no prior default.
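A hedged sketch of launching this dashboard is shown below; the ErrorAnalysisDashboard arguments follow the raiwidgets example notebooks and may vary across library versions.

```python
from raiwidgets import ErrorAnalysisDashboard

# Launches an interactive dashboard with heatmap and tree views.
ErrorAnalysisDashboard(model=model,
                       dataset=X_test,
                       true_y=y_test,
                       features=[str(c) for c in X_test.columns])
```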

The dashboard provides a data explorer, which can be used to further analyze a cohort and uncover parts that are underrepresented. Fig. 7 shows how the data is distributed across the feature Age. We notice an imbalance between the applicants rejected (marked blue) and approved (marked orange) by the model. Most of the data is concentrated on applicants aged 46.3 or less, which explains why this cohort is more susceptible to model errors.

Fig. 8 illustrates the decision tree approach for discovering patterns of data instances that are most prone to errors. We observe that the model makes more errors in credit card approvals for the group of clients with income ≤ 125 who have had no prior defaults and who are unemployed. This cohort combines low income and unemployment, which makes the model more prone to erroneous decisions. For such applicants, the model may not render adequate predictions, meaning that a human expert may need to intervene to make the ultimate decision. A human may need to assess the situation by looking closely into other available factors and circumstances concerning the applicant in order to minimize the likelihood of failures.
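The tree view can be approximated outside the dashboard by fitting a shallow surrogate tree that predicts where the credit model errs, as in the following sketch reusing the earlier hypothetical variables.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

errors = (model.predict(X_test) != y_test).astype(int)  # 1 = model was wrong

surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_test, errors)

# The printed rules expose cohorts with concentrated errors, e.g.
# low-income, unemployed applicants without a prior default.
print(export_text(surrogate, feature_names=[str(c) for c in X_test.columns]))
```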

It is also possible to perform a what-if analysis. The dashboard in Fig. 9 provides a way to select an instance, change the values of some of the features, and see what would happen to the results once the values are changed. Going back to Fig. 5c, we can use the what-if tool to verify our conclusion that the model made a mistake in not approving the request of the applicant with index 101. We can see that if PriorDefault is changed to 0 (i.e. no prior default) while income is set to 0, then the model approves the applicant's request, as shown in Fig. 9, moving the blue square for instance 101 (declined) to the red star position (approved). Hence, the model would have approved the request if there had been no prior default, even if the applicant had no income. This result confirms that the model prediction is strongly biased by the Prior Default feature for this applicant. Conversely, the bank made the right decision to approve the request despite the prior default, given that the applicant's income is high (this is the applicant with the highest income in the dataset).
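This counterfactual check can be reproduced programmatically, as sketched below; the feature names and the row position of applicant 101 are assumptions based on the discussion above.

```python
# Assumes the feature columns have been renamed to the paper's names.
applicant = X_test.iloc[[101]].copy()   # the applicant from Fig. 5c

applicant["PriorDefault"] = 0           # counterfactual: no prior default
applicant["Income"] = 0                 # counterfactual: no income

# A flip from declined to approved confirms how strongly the
# Prior Default feature drives this prediction.
print("counterfactual prediction:", model.predict(applicant)[0])
```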

Moreover, such tools automate the process, meaning that they can handle large volumes of data in a consolidated way. Another benefit is that they provide visualization dashboards that are easy to use and can help even non-specialists gauge the quality of data, its sufficiency and unbiasedness, thereby minimizing the need to consult subject matter experts.

Concerning accuracy, the requirements consist of identifying the need to adjust or improve existing models and verifying whether better models are required for the particular problem. In both cases, the goal is to achieve accuracy that is commensurate with the problem domain at hand. Using the proposed tools, we can examine the entire dataset and obtain separate accuracies for different data regions rather than a single overall accuracy value, which is an insufficient measure in many use cases. The benefit of this approach is twofold. Firstly, it provides information on whether a model performs well and demonstrates no discrepancies across various partitions of the data. In case the model does not perform satisfactorily, this approach gives insights into the problematic partitions and the reasons they fail, and offers a path for adjusting or improving the existing model to meet the benchmark performance in the problematic regions. By observing the multi-dimensional aspects of accuracy, developers can form initial hypotheses about the most important features that contribute to the failures. Secondly, it helps developers compare performance among different ML models to identify the most suitable one. As a result, the analysis will ensure that the achieved accuracy is consistent across the dataset and appropriate to the problem domain tackled by a given fintech service. This enables faster due diligence of ML models, reduced effort for model testing and evaluation, and increased customer confidence in the model's optimality.

Finally, the problem of the lack of transparency could be effectively solved as a side effect of applying responsible ML tools, since model transparency and interpretability are their main priorities. Even when models with complex inner workings, such as deep learning models, are used, an ML system can be easily and transparently gauged for precision with such a consolidated error analysis, thereby assessing fintech services and financial risks in a way that is methodologically sound. Furthermore, such an approach makes the process transparent not only for finance professionals when rendering services to customers but also for financial regulators.

When it is necessary to verify the existence of differential pricing, the visualization capabilities of the responsible ML tools can serve as a safeguard against unfair pricing practices that may cause ethical consequences. Such a methodology will enable legislators and regulators to audit pricing policies by examining ML models with visualization toolkits. In addition, financial services providers can keep logs of historical pricing to prove the consistency of their pricing strategies with ethical requirements.

Along with bias, manipulated recommendations represent a related ethical problem that negatively impacts ML-based financial services. Manipulated recommendations refer to the practice of intentionally tailoring ML models by purposefully injecting bias, hampering the objectivity of the model. For example, if the model is not objective, preference may be given to certain cohorts of input data at the expense of others. This can lead to giving biased advice to customers without sufficient evidence that the actions are justified, possibly leading to financial loss. To prevent such consequences, it must be clear what dataset is used when making financial decisions, to ensure that no specific subset of input data is preferred. In addition, as the dataset expands over time, dataset versioning is another important prerequisite for auditing purposes. Error analysis across various features of the dataset can identify problematic data regions that can contribute to inaccurate predictions. Transparently presenting the data sources and the dataset itself, together with the use of these tools, can help recognize whether certain regions of the input data exhibit deviations from the rest of the data. This ensures the interpretability of the justifications for the recommendations made when rendering a specific financial service.

Disclosing conflicts of interest means that financial services professionals not only give honest and accurate recommendations to their clients but also disclose any interest that may contradict the interests of the clients. While there is a personal element involved in such dealings on the part of the financial services professionals, fintech services based on ML have the potential to make the advising process transparent. By performing error analysis on various regions of data and checking when and how certain regions fail against the chosen metrics, clients have an opportunity to verify whether the analysis evidence matches the recommendations given by the fintech professional. The benefits are mutual for both sides, as this increases trust and reduces risks related to conflicts of interest.

Reporting unethical activities is another concern that is streamlined by responsible ML approaches. Responsible ML helps discover and report deficiencies in ML models that can lead to unethical behavior. With error analysis, the responsibility for reporting unethical activities is no longer constrained to an individual but rather expands to the ML teams that deploy models for use in fintech services. Involving multiple professionals in working on the ML models largely reduces the risks associated with unethical behavior in the emerging field of fintech [20]. Personal data can be further protected using modern techniques such as differential privacy.

The explainability and interpretability of ML models will significantly aid specialists in the financial industry.

ML experts are needed to develop the models and to work collaboratively with finance specialists, helping them analyze and use the results obtained by the models. In essence, financial services professionals will not need to know the sophisticated inner workings of the ML models; they will have the appropriate error analysis and visualization toolset available to interpret the results of ML models and take appropriate actions. Once the needed infrastructure for ML model analysis is in place, it will be very helpful for financial services companies to utilize the models for sophisticated financial analysis. As a result, firms in the financial services industry will significantly increase the overall readiness and ability of their workforce to embrace machine learning.

Machine learning is revolutionizing many economic sectors, including finance. Several global surveys of financial institutions reveal ample evidence that ML is poised to become the backbone of the financial industry in the near future. ML algorithms could enhance financial services and client value by harnessing the potential of large-scale automation that leads to significant cost savings. Despite the predicted positive impact on businesses, there is a range of ethical challenges in fintech that affect not only the customers of fintech services but also financial institutions. To address these challenges, we performed a mapping between the ethical principles of finance and the ethical principles of ML to reveal which traditional finance ethical principles have the most substantial correspondence to ML principles. The mapping outcome shows that the traditional finance principles of integrity and fairness have the most significant overlap with the ML ethics principles. Additionally, the ML ethics principles of human-centred values and fairness, as well as transparency and explainability, show the most considerable overlap with the traditional finance ethics principles. We study the correspondence of the conventional finance and ML ethics principles to merge the advantages of ML-based decision-making, such as cost and time savings, with traditional finance decision-making based only on human criteria. This result confirms the importance of integrity and fairness as essential principles in finance ethics.

The paper presents a conceptual framework to identify and address challenges in financial decision-making such as bias, discrimination, differential pricing, conflict of interest, and data protection. The main objective of the mapping between finance and ML ethics is to identify the most critical criteria for handling ethical challenges in ML-based fintech applications. We rely on experts' opinions to evaluate the mapping between finance and ML ethics and assess these relationships. The application of the proposed framework is presented through a practical use case of creating an ML model for approving credit card requests. We showed how to develop a predictive model using state-of-the-art ML algorithms and explainable ML tools such as SHAP and Microsoft Responsible AI Widgets. The application of explainability methods enhances model transparency and helps diagnose whether models used in fintech settings suffer from inconsistencies that can cause ethical issues. Finally, we presented a conceptual framework for using this approach to solve ethical challenges in ML applications for fintech.