Toward Accountable and Explainable Artificial Intelligence Part One: Theory and Examples

Like other Artificial Intelligence (AI) systems, Machine Learning (ML) applications cannot explain their decisions, are marred by training-induced biases, and suffer from algorithmic limitations. Their eXplainable Artificial Intelligence (XAI) capabilities are typically measured in a two-dimensional space of explainability and accuracy, ignoring the accountability aspects. As a result, during system evaluations, measures of comprehensibility, predictive accuracy and accountability remain inseparable. We propose an Accountable eXplainable Artificial Intelligence (AXAI) capability framework that facilitates the separation and measurement of predictive accuracy, comprehensibility and accountability. The proposed framework, in its current form, allows the embedded levels of AXAI to be assessed for delineating ML systems in a three-dimensional space. The AXAI framework quantifies comprehensibility in terms of the readiness of users to apply the acquired knowledge, and assesses predictive accuracy in terms of the ratio of test data to training data, the training data size and the number of false-positive inferences. For establishing a chain of responsibility, accountability is measured in terms of the inspectability of input cues, the data being processed and the output information. We demonstrate the application of the framework in assessing the AXAI capabilities of three ML systems. The reported work provides a basis for building AXAI capability frameworks for other genres of AI systems.


Prevailing eXplainable Artificial Intelligence (XAI) frameworks are algorithm-centric and neglect the perspectives of end users [7]. From practitioners' perspectives, these gaps result in (a) little or no utility of the system explainability features and (b) users' inability to interpret the given reasoning. Such gaps inhibit the automation of tedious practices and impede the adoption of AI systems [8], [9], [11], [12]. Statistical and probabilistic explanations are considered limited and less effective [13], [14]. The relevant literature suggests that the prevailing XAI frameworks do not fully comply with the norms of regulatory bodies and industry [5], [6]. A proven method of measuring the non-explainability of an AI or ML system is not yet available [15]. As the availability of better XAI frameworks would boost user confidence in ML and AI systems, attempts are underway to develop holistic XAI frameworks [8], [9], [11], [12]. Since AI systems are still regarded as difficult to understand, adopt and trust [16], several groups are engaged in holistic XAI framework development efforts [17], [18], [19], [20].

This work posits that perceiving XAI in a two-dimensional space of predictive accuracy and comprehensibility results in mixing factors of accuracy, explainability and accountability [1]. Such a convoluted representation does not help practitioners, cannot fulfil regulators' expectations and offers limited transparency for establishing a chain of responsibility [8], [9], [10], [11], [12]. In order to address these limitations, we propose the AXAI capability framework; the framework's application would mean the design and/or assessment of the ML aspects of AI systems. In the following sections, we demonstrate the framework's application by assessing and comparing three affective state classification systems. As shown in Fig. 2, the AXAI capability framework would allow for incorporating theoretical guarantees, empirical evidence and statistical assurances in AI systems.

In order to present the theoretical foundations of the AXAI framework and demonstrate its utility, this paper is organized in seven sections. After introducing this work in Section I, Section II provides a brief overview of XAI-related issues, citing relevant works. We establish the theoretical foundations of the proposed AXAI framework in Section III. Section IV demonstrates the application of the proposed framework in designing and assessing the AXAI capabilities of three ML systems. The three systems' assessment results are presented in Section V. The proposed framework and its applications are analysed and discussed in Section VI. Finally, Section VII identifies possible directions of future work and concludes this work.

II. ISSUES IN EXPLAINABLE ARTIFICIAL INTELLIGENCE

Issues pertaining to algorithmic biases embedded in ML systems were first realized in the late 1970s [23]. Initial ML systems had nothing but predictive accuracy to offer as explanations. Later, it was realized that predictive accuracy alone would not suffice for dealing with biases. It was understood that several embedded factors, such as the historical background, political constraints and institutional context of ML systems, also induce biases in ML systems [17].
Such realizations are still valid for all genres of AI systems, including supervised learning-supported classifiers and regression systems, unsupervised learning-supported clustering and labelling systems, reinforcement learning systems and deep neural networks. With time, the importance of explaining inferences, proving system accuracies and addressing accountability in the context of AI systems has increased [25], [26]. Recently, governments and business entities have also started to emphasize the need to account for the ethical implications of using AI systems [5], [6]. A recent report jointly published by the Ada Lovelace Institute, the AI Now Institute and the Open Government Partnership lists some forty algorithmic accountability mechanisms and their respective jurisdictions [28]. Hence, XAI has emerged as a topic of interest for computer scientists, AI theorists and practitioners across various domains [8], [18].

Though rule-based expert systems and ML systems were traditionally assessed on the basis of their predictive accuracy alone [29], recent developments have made it possible to delineate them in a two-dimensional space of orthogonal axes, viz. predictive accuracy and comprehensibility [30]. Consequently, ML systems are becoming relevant in solving both routine and complex problems [31]; in some domains they outperform humans and are becoming indispensable assets [24]. Thus, ML systems are now being used in critical tasks like disease diagnosis and psychological and psychiatric assessments.

FIGURE 2. A three-dimensional representation of the AXAI capability framework, showing all quantifiable elements of system accountability, comprehensibility and predictive accuracy along the three vectors. Each vector comprises three unique elements. The proposed framework allows for the quantitative assessment and delineation of ML and AI systems in the three-dimensional AXAI space.

Building upon these works, the description and scope of AI system comprehensibility was further refined in [3]. As evident in IEEE standard P2840, researchers are trying to go beyond the current XAI capabilities to build responsible AI systems [33].

Accountability, in the context of AI systems, connotes compliance with ethical, procedural and legal norms while processing information, invoking rules and making decisions [34]. A widely adopted definition describes accountability as a relationship between an actor and a forum, in which the actor has an obligation to explain and justify the conduct; the actor may also face consequences for the impact of actions [35]. Therefore, accountability is perceived as a multi-factor issue that deals with transparency, interpretability, post hoc inspection of outputs, pre- and post-market empirical performance and system design processes [1]. The 2019 Algorithmic Accountability Act discussed in the US Senate required businesses to assess AI and decision support systems for risks associated with the privacy and security of personal information. The act also emphasized assessing such systems for risks of inaccurate, unfair, biased or discriminatory decisions.

Thus, an intelligent tutor would be deemed responsible

Although several recent works discuss incorporating measurable parameters of accountability and explainability [17], [27], little work has been done on developing a holistic framework that provides a set of quantifiable features for assessing the AXAI capabilities of a system. A framework for assessing AXAI capabilities must be built upon considerations pertaining to the personal, social, moral and legal factors used to hold an individual accountable and liable for explaining personal actions and decisions [41]. Significant moral and legal factors that make a decision system liable to explain its decisions are [39]:

While suggesting enhancements to the prevailing XAI capabilities, recent works have highlighted the need for algorithmic accountability [19], [39], [40]. The accountability of an AI system would depend on the context of the confronted issue [19]: for example, how a medical AI system chooses which of two patients should be treated first, or how a search and rescue robot would pick one of several injured victims [41]. Hence, an ML system should be aware of the context of ethical values and should have the capacity to understand the moral consequences of its actions and decisions [42]. Accountability should therefore be derived from both the information/data and the algorithmic approach [36]. The employed algorithmic approach and data must be sensitive to the context while making inferences and decisions [22], [39], [41]. It is also argued that a system should be operated in such a manner that the chain of responsibility is clear and identifiable [25], [38].

In order to address such needs, our proposed AXAI framework includes a system accountability vector comprising three components, viz. the inspectability of input data models or cues, the inspectability of data being processed, and the inspectability of output models or cues. In order to hold either a system developer or a user accountable for the impact of system decisions, relevant information must be presented to them in a meaningful manner [42]. We posit that inspectability, in the context of XAI, must allow users to examine the relevant system details and let them determine whether the system is able to fulfil the decision-making requirements. Inspectability is also referred to as verifiability and traceability in the literature and is considered one of the core features that ensure system transparency [43], [44].
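To make the vector concrete, the sketch below models the three accountability components as a simple container with a Euclidean norm, mirroring how the normalised scores are combined later in the paper. This is a minimal illustration in Python; the class and field names are ours, not part of the framework.

```python
from dataclasses import dataclass
from math import sqrt

@dataclass
class AccountabilityVector:
    """Illustrative container for the three S_A components (0-1 normalised scores)."""
    i_in: float   # inspectability of input data models or cues
    i_pro: float  # inspectability of data being processed
    i_out: float  # inspectability of output models or cues

    def norm(self) -> float:
        # Euclidean norm ||S_A||, used to place a system along the accountability axis.
        return sqrt(self.i_in ** 2 + self.i_pro ** 2 + self.i_out ** 2)

# Example: a system whose processing stages are hard to inspect scores low on i_pro.
print(AccountabilityVector(i_in=0.76, i_pro=0.38, i_out=0.68).norm())  # ~1.09
```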

The proposed AXAI framework posits that system developers and system users should be able to inspect the input data, important details of the data being processed and the output information. Both the developer and the user would be expected to understand, analyse and interpret the inspected data.

In the AXAI framework, an explanation is viewed as a deductive argument containing universal laws. Following this view, the framework assesses the readiness of a user to recognize or connect with a situation, reflecting on the system's ability to provide readily understandable explanations.
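Read this way, an explanation follows the classical deductive-nomological schema, restated below; the symbols are standard and not specific to the AXAI framework.

```latex
\[
\frac{L_1, \ldots, L_k \quad C_1, \ldots, C_m}{E}
\qquad
\begin{array}{l}
L_i:\ \text{universal laws}\\
C_j:\ \text{antecedent conditions}\\
E:\ \text{explanandum (the event or decision to be explained)}
\end{array}
\]
```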

The predictive accuracy of a system in our AXAI framework is assessed in terms of the ratio of test data to training data, the training data size and the number of false-positive inferences.

We assume that an ML system is a definite program P. Our definition of a definite program considers P as having a set of stages, or a series of steps, that help transform a set of inputs into some desired outputs [28]. This definition of P also considers a system as a holistic system comprising one or multiple systems, sub-systems or algorithms, capable of producing the desired outputs that enable making inferences and decisions [28]. Such systems include supervised learning-supported classifiers and regression systems, unsupervised learning-supported clustering and labelling systems, and reinforcement learning systems, including deep neural networks. Therefore, such a system would include definite symbols, definite functions, definite propositions, definite predicates, logical symbols, object variables and propositional variables [48], [50].
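One way to read this definition in code is as a staged pipeline. The sketch below is our hypothetical rendering, not an interface from the paper, assuming each stage is a callable that transforms the previous stage's output.

```python
from typing import Any, Callable, Sequence

class DefiniteProgram:
    """Illustrative reading of a definite program P: an ordered series of
    stages (sub-systems or algorithms) transforming inputs into outputs."""

    def __init__(self, stages: Sequence[Callable[[Any], Any]]):
        self.stages = list(stages)

    def run(self, inputs: Any) -> Any:
        # P behaves as one holistic system even when it wraps several
        # sub-systems: each stage consumes the previous stage's output.
        data = inputs
        for stage in self.stages:
            data = stage(data)
        return data

# Example: a classifier pipeline viewed as a definite program.
P = DefiniteProgram([lambda x: [v / 255 for v in x],                  # pre-process
                     lambda x: sum(x) / len(x),                       # extract a feature
                     lambda f: "happy" if f > 0.5 else "neutral"])    # classify
print(P.run([200, 180, 90]))
```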

In the following sections, C denotes a constant, p represents a predicate symbol and Π denotes a human population with an individual human represented as s. In this paper, χ denotes a first-order variable and B is the background knowledge. A human possessing the background knowledge B is considered tantamount to a definite program P. D_n denotes a definition D having a number n, and Δ denotes a domain. Having borrowed these notations from previous works [7], [38], [41], [48], the following subsections describe all measurable parameters belonging to each of the three vectors forming the 3D AXAI measurement space shown in Fig. 2.

D1: A predicate symbol p declared in an ML system P is public with respect to a human population Π if p forms part of the background knowledge B of each human s (s ∈ Π). Otherwise, p is a private predicate symbol contained in P.

D2: Let Q be a system. If the background knowledge B of P is extended such that B ∪ Q is formed, then the predicate symbol p ∈ P becomes a predicate invention, since p was originally defined in Q but not in B.

D3: The AXAI capability, denoted by C_AXAI, is a representation in a three-dimensional space. We posit that C_AXAI comprises three independent vectors: Ψ (comprehensibility), P_A (predictive accuracy) and S_A (system accountability). Also, each of the three vectors Ψ, P_A and S_A comprises three independent components, whose details are given in the following definitions D4-D6C.

D4: The comprehensibility Ψ of P in the context of a human population Π is represented as Ψ(Π, P), where Ψ is a vector comprising three components: the inspection time (T_it), the predicate recognition time (T_pr) and the time required to name a predicate (T_pn), such that:

Ψ(Π, P) = [T_it, T_pr, T_pn]    (1)

D5: The predictive accuracy P_A of P with respect to a domain Δ is represented as P_A(Δ, P), where P_A is a vector comprising three components: the ratio of test data to training data (r_tst-trn), the training data size (s_trn) and the number of false-positive inferences (n_fp), such that:

P_A(Δ, P) = [r_tst-trn, s_trn, n_fp]    (2)

It is worth noting that, to the best of the authors' understanding, the cited literature highly recommends the integration of a human component in assessing system accuracy [27], [29], [30], [48].

D6: The system accountability S_A of a system P with respect to a human population Π is represented as S_A(Π, P), where S_A is a vector comprising three components: I_in (inspectability of input models or cues), I_pro (inspectability of data being processed) and I_out (inspectability of output models or cues), such that:

S_A(Π, P) = [I_in, I_pro, I_out]    (3)

The system accountability in the context of AXAI refers to the mean accuracy with which a human s (s ∈ Π) can realize any occurrences of constants C, predicate symbols and variables so as to correctly recognize a new definition with respect to the domain Δ.

D6A: The mean score of inspectability I_in of input models/cues, supplied as named definitions q_n (n = 1, 2, 3, ...) to a program P, is an indicator of the mean clarity observed by a human s (s ∈ Π) with which s would inspect a definition q before q is named as a predicate symbol p with respect to the domain Δ. Therefore, I_in reflects on the form and format of the input models/cues with definitions q_i (i = 1, 2, 3, ...) and predicate symbols p_j (j = 1, 2, 3, ...).

D6B: The mean score of inspectability of data after being processed, I_pro, in a program/system P is an indicator of the mean clarity of the processed (or conditioned) definition q as observed by a human from a population (s ∈ Π). Hence, the mean I_pro is the mean clarity with which a human s inspects the processed form of the definition q before q is named as a predicate symbol p with respect to a domain Δ. Therefore, I_pro reflects on the form and format of the intermediary models of definitions q_n (n = 1, 2, 3, ...) while any q_n is being transformed into a predicate symbol p.

Hypothesis 7: The inverse of the mean time, 1/T_it, that a human s (s ∈ Π) requires for inspecting the information presented by the program P before using the knowledge provided by that P for solving a new problem in domain Δ, is directly proportional to the presentation quality (Q_p) of P, given as 1/T_it ∝ Q_p.

Hypothesis 8: The inverse of the mean predicate recognition time, 1/T_pr, that a human s (s ∈ Π) requires to assign a correct public name to a predicate symbol p in a system is proportional to the ability (A_p) of recognizing and accurately assigning a public name to a predicate symbol p. Hence, 1/T_pr ∝ A_p. Note that an incorrect assignment of a public name to a predicate symbol should not be counted or considered in assessing a system.

Hypothesis 9: The ratio of the size of the test data to the size of the training data (r_tst-trn) of a program P is directly proportional to the level of rigour (R_rig) applied in training and testing P with respect to a domain Δ; hence, r_tst-trn ∝ R_rig.

Hypothesis 10: The mean score of inspectability of data after being processed (I_pro) shows how understandable the intermediary data representations/models (τ_mod) in a definite program P are. Thus, I_pro ∝ τ_mod.

Table 1 presents a complete list of measurable parameters used to determine the overall AXAI capability of a definite program P.
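As a worked illustration of how the nine measurable parameters of Table 1 might be combined into the three vector norms, the sketch below normalises raw 0-to-5 scores to the 0-1 range and computes ||Ψ||, ||P_A|| and ||S_A||. The function and key names are our assumptions; only the grouping of parameters follows definitions D4-D6.

```python
import numpy as np

def axai_capability(scores: dict) -> dict:
    """Compute the three AXAI vector norms from raw 0-5 scores.

    Expects nine keys: T_it, T_pr, T_pn (comprehensibility),
    r_tst_trn, s_trn, n_fp (predictive accuracy) and
    I_in, I_pro, I_out (system accountability)."""
    s = {k: v / 5.0 for k, v in scores.items()}  # normalise to the 0-1 unit range
    psi = np.array([s["T_it"], s["T_pr"], s["T_pn"]])
    p_a = np.array([s["r_tst_trn"], s["s_trn"], s["n_fp"]])
    s_a = np.array([s["I_in"], s["I_pro"], s["I_out"]])
    return {"comprehensibility": float(np.linalg.norm(psi)),
            "predictive_accuracy": float(np.linalg.norm(p_a)),
            "accountability": float(np.linalg.norm(s_a))}
```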

In order to test the relevance of the AXAI capability framework, the AXAI scores of three ML systems were calculated. The following subsections present details of the three ML systems whose C_AXAI scores were estimated using the proposed AXAI framework.

TABLE 2. Guidelines for assessing, scoring and determining the AXAI capabilities of the three definite programs. The ASAM and ASAM-2 use multimodal input (facial expressions and speech cues) and the DAS uses facial thermal variations to analyse and recognize affective states.

The first assessed ML system, the ASAM, was designed to have the AXAI capabilities built in, following the parameters listed in Table 1. Figure 3 shows the high-level architecture of the ASAM. Figure 4 shows the input, the data under processing and the output information that the ASAM presents to users through its GUI.

Ten qualified industry professionals and postgraduate students who were well-versed in ML and other AI-supported systems volunteered to assess the AXAI capabilities of the ASAM. The parameters outlined in Table 1 were used for the assessment of the AXAI capabilities. During an introduction session, these assessors, who were educated in the fields of engineering, social science and psychology, were briefed on the objectives and outcomes of the assessment. After the briefing, participants were given the ASAM's system user manual. Assessors had the opportunity to use the ASAM before starting to assess its functionality. The ASAM assessors tested the ASAM for an average time of twenty minutes. While testing, assessors awarded scores for parameters 4-6 and 10-12 on the 0-to-5 scale detailed in Table 2. Assessing the ASAM on parameters 7-9 was not required, as these scores were supposed to be provided by the team of system designers. The scores were normalised and converted to unit vector form (in the range of 0 to 1), allowing the AXAI capabilities of the ASAM to be delineated in a 3D space, as discussed in previous sections and visualised in Fig. 2.
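Continuing the sketch introduced in Section III, the assessors' mean scores would be fed in as follows. Only T_it, T_pn and I_pro below are values reported later in this paper; the remaining numbers are placeholders chosen for illustration.

```python
asam_scores = {
    # accountability, parameters 4-6 (assessor means; I_pro as reported)
    "I_in": 3.8, "I_pro": 1.9, "I_out": 3.8,
    # predictive accuracy, parameters 7-9 (supplied by the design team; placeholders)
    "r_tst_trn": 4.5, "s_trn": 4.4, "n_fp": 4.5,
    # comprehensibility, parameters 10-12 (T_it and T_pn as reported)
    "T_it": 3.95, "T_pr": 3.40, "T_pn": 3.05,
}
print(axai_capability(asam_scores))
# -> roughly {'comprehensibility': 1.21, 'predictive_accuracy': 1.55, 'accountability': 1.14}
```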

FIGURE 4. (A) The ASAM GUI. The home screen shows the mechanism of displaying the input information. The GUI shows data pertaining to all three input signals. This figure also shows how the lower-level functions of the system can be accessed from the home window (shown in Figure 4B). This GUI window is shown to users upon execution of the software. The image shown in the frame was taken from the RAVDESS dataset.

The second system tested for its AXAI capabilities was a modified and enhanced version of the ASAM called ASAM-2. We designed ASAM-2 as a continuous assessment tool capable of classifying 114 unique states across affective speech and facial expression signals using a hierarchical classification approach. In ASAM-2, a combination of 42 ternary/binary models was used. Similar to its predecessor, ASAM-2 is a real-time embedded system capable of being added to an existing robotic system for affective state assessment. Table 5 highlights the 5-point scores given to ASAM-2 by the users while assessing its AXAI capabilities.

DYNAMIC ASSESSMENT OF AFFECTIVE STATES AND AROUSAL LEVELS
A third ML system tested for its AXAI capabilities was also a definite program, designed to work as a two-step system for the dynamic assessment of affective states and arousal levels, called DAS [54]. It uses thermal infrared images (TIRIs) of facial expressions and was not designed to have AXAI capabilities built into it. Hence, a post-production assessment of AXAI capabilities was performed in this case. The DAS first analyses TIRIs to examine the haemodynamic variations caused by changes in affective states. The algorithmic execution of DAS starts by analysing the haemodynamic variations along the facial muscles. The observed variations are used to estimate the affect-induced facial thermal variations. In the first step, 'between-affect' and 'between-arousal-level' variations are subjected to Principal Component Analysis (PCA). The most influential principal components are then used to cluster the features belonging to different affective states. Subsequently, each set of thermal features is assigned to an affective state cluster. In the second step, the affective state clusters are partitioned into high, medium and mild arousal levels. The distance between a test TIRI and the centroids of the sub-clusters at the three arousal levels belonging to a single affective state, identified in the first step, is used to determine the arousal level of the identified affective state. Figure 5 illustrates this two-step procedure.
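The two-step procedure can be sketched as follows, assuming scikit-learn and synthetic stand-ins for the TIRI feature vectors; the component counts, cluster counts and library choice are our assumptions, not the DAS implementation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((300, 64))   # placeholder facial thermal feature vectors, one row per TIRI

# Step 1: project onto the most influential principal components and
# cluster the projected features into affective-state clusters.
pca = PCA(n_components=10).fit(X)
Z = pca.transform(X)
affect = KMeans(n_clusters=6, n_init=10, random_state=0).fit(Z)

# Step 2: within one affective-state cluster, partition members into
# high, medium and mild arousal sub-clusters.
members = Z[affect.labels_ == 0]
arousal = KMeans(n_clusters=3, n_init=10, random_state=0).fit(members)

def arousal_level(tiri_features: np.ndarray) -> int:
    """Assign a test TIRI to the nearest arousal sub-cluster centroid."""
    z = pca.transform(tiri_features.reshape(1, -1))
    distances = np.linalg.norm(arousal.cluster_centers_ - z, axis=1)
    return int(np.argmin(distances))  # 0, 1 or 2 -> one of the three arousal levels

print(arousal_level(rng.random(64)))
```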

The score parameters 4-6 and 10-12 in Table 1 were respectively used to determine the system accountability and comprehensibility vector norms. The mean values were determined through user experiences and surveys of the system. The system comprehensibility was found to be greater than the system accountability, as reported in Table 3. The ||Ψ|| and ||S_A|| values were calculated using equations (1) and (3).

The data in Table 3 highlight very good comprehensibility results for the ASAM, with the inspection time 'T_it' being the highest (average score T_it = 3.95). The lowest component in terms of comprehensibility was the predicate naming time (average score T_pn = 3.05). The general feedback from user responses suggested that predicate naming was more difficult and time-consuming for assessors compared to the other comprehensibility factors and should be addressed in future work.

We found the system accountability scores to be comparatively lower than the scores for comprehensibility, specifically with regard to the inspectability of the data processing stages, 'I_pro'. User feedback suggested that while the ASAM's rule-based expert system output showed how a combination of signals could be used to report a multimodal output, the ASAM could be improved by providing a better display of the processed data for the facial expression, paralinguistic and linguistic channels. In comparison, the inspectability of inputs and outputs was received positively, highlighting the ASAM's ability to report the system's initial and final states.

The ASAM's GUI, shown in Fig. 4, was designed to display some processed-data stage information in the form of the associated weights of the rule-based expert system output. Applying weight numbers to the facial expression, paralinguistic and linguistic speech classification results allows for the display of tabular and graphical rule-based system outputs, i.e., the transformation of data from input, to processed, to output.

Using the reported scores and the consequential location of the ASAM within the 3D space of Ψ, P_A and S_A, we concluded that improving the I_pro-related features would greatly enhance the user experience and AXAI capabilities of the system. In summary, the ASAM's norms for comprehensibility, predictive accuracy and system accountability were, respectively: ||Ψ|| = 1.203, ||P_A|| = 1.546 and ||S_A|| = 1.139. Thus, the three vector norms provide an estimate of the ASAM's AXAI capabilities, allowing us to visualise the ASAM's position within the 3D axes as shown in Fig. 7.

Using these results, we could compare the AXAI capabilities of the three systems in terms of their levels of explainability, predictive accuracy and comprehensibility. However, the accountability score suggests that more attention should be paid to the ASAM's accountability components. The estimated S_A score suggested that the inspectability of the information being processed, I_pro, will not suffice for user requirements. Overall, the proposed framework provided a practical and easy-to-follow method of assessing the AXAI capabilities of the ASAM.

TABLE 3. The ASAM users' AXAI capability scores on a 0-5 scale and the normalised scores. These user scores were used to determine the ASAM's accountability '||S_A||' and comprehensibility '||Ψ||' capabilities.

TABLE 4. Users' experience scores and their normalised scores for ASAM-2 on a 0-5 scale. These scores were used to determine ASAM-2's system accountability '||S_A||' and comprehensibility '||Ψ||' capabilities.

The scores derived in Table 5 determine ASAM-2's comprehensibility and system accountability norms, i.e., ||Ψ|| = 1.275 and ||S_A|| = 1.453. The changes made throughout the design process, using feedback from the ASAM, show significant improvements in all three vectors when the two systems' scores are compared. Most significant are the improvements in the predicate naming time 'T_pn' (3.05 → 3.75) and the inspectability of the data processing stages 'I_pro' (1.9 → 3.5), which markedly enhanced the user experience and ultimately showed how the AXAI framework can be used to improve the usability, transparency and explainability of AI and ML systems. The data in Table 4 suggest that DAS had a low level of comprehensibility and system accountability.

FIGURE 7. The ASAM, ASAM-2 and DAS (GREEN) systems are compared to show how the AXAI framework helps in assessing various ML systems. The three systems are plotted in the three-dimensional AXAI space. The placement of each circle shows system scores along the axes of comprehensibility, accountability and predictive accuracy, making it easy for system developers and users to compare various AXAI aspects of the same system or multiple systems.
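The Fig. 7-style comparison can be reproduced with a few lines of matplotlib. In the sketch below, the ASAM norms and ASAM-2's ||Ψ|| and ||S_A|| are the values reported above; ASAM-2's ||P_A|| and all DAS values are placeholders, since they are not given in this excerpt.

```python
import matplotlib.pyplot as plt

# (||Psi||, ||P_A||, ||S_A||) per system; see lead-in for which values are placeholders.
systems = {"ASAM": (1.203, 1.546, 1.139),
           "ASAM-2": (1.275, 1.60, 1.453),
           "DAS": (0.90, 1.20, 0.70)}

ax = plt.figure().add_subplot(projection="3d")
for name, (psi, p_a, s_a) in systems.items():
    ax.scatter(psi, p_a, s_a, s=60, label=name)
ax.set_xlabel("Comprehensibility ||Psi||")
ax.set_ylabel("Predictive accuracy ||P_A||")
ax.set_zlabel("Accountability ||S_A||")
ax.legend()
plt.show()
```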
Delineating the three ML systems in a 3D AXAI capability space demonstrated that the proposed framework was helpful in the design and assessment of ML systems. Furthermore, the AXAI capability framework also provided an opportunity to systematically address ethical and professional issues, such as those highlighted in [40] and [55], while building ML systems. As evident in the above comparison, the nine measurable components of the AXAI capability framework ensured that attention was paid to system details, ethical responsibilities and moral duties during the conceptual design and functional analysis stages. Such manifestations have been desired in AI and ML systems for quite some time [41], [50]. However, the AXAI framework does not work as a purpose-built forensic framework would in tracing and combating deviations from the expected system norms.

Building upon the XAI-capability-centred philosophical discussions in the literature [23], [42], [61], our proposed AXAI capability framework provides three sets of quantifiable parameters, each having three variables, for assessing levels of comprehensibility, accuracy and accountability. Through these parameters, the AXAI capability framework ensures the incorporation of important ethical, moral and legal safeguards in AI systems. This makes the proposed AXAI capability framework relevant and contemporary. The accuracy, comprehensibility and accountability measures also provide the required breadth and depth for designing, comparing and assessing AI systems in a domain-agnostic manner. Hence, incorporating the AXAI capability framework would not limit the system developer to following a particular domain-specific method. Comprehensibility can be seen in terms of the mean readiness of a human to apply the knowledge acquired from an AI program and to interpret unknown problems within the domain.

We have modelled the predictive accuracy of an AI program in terms of the ratio of test data to training data, the training data size and the number of false-positive inferences. The accountability, manifested through its three components (inspectability of input cues, processed data and output cues), facilitates establishing a chain of responsibility. If any one or more of the three accountability components were not inspectable by users, then the system design team could be held responsible for the shortcomings. However, if these components were inspectable, then the user could be considered responsible for any negative consequences. Hence, accountability in our AXAI capability framework is assessed in an appropriate context [1], [34].

Because of the time limitations and the scope of this work, we could not test Hypothesis 3, given in sub-section C of Section III. However, our inability to test the predictive accuracy (P_A) of a human s in correctly naming a predicate symbol p given as a privately named definition q does not reflect on the applicability of the AXAI framework. Testing this hypothesis would require identifying and approaching domain experts to confirm whether the hypothesis is verifiable and useful in the context of the affective computing systems assessed for this work.
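The chain-of-responsibility rule described above reduces to a simple decision, restated here as a sketch; the function name and boolean interface are ours.

```python
def responsible_party(i_in_ok: bool, i_pro_ok: bool, i_out_ok: bool) -> str:
    """Assign responsibility from the inspectability of the three S_A components."""
    # If any component is not inspectable by users, the design team is
    # answerable for the shortcoming; otherwise responsibility shifts to the user.
    return "user" if all((i_in_ok, i_pro_ok, i_out_ok)) else "design team"

print(responsible_party(True, False, True))  # -> "design team"
```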

This work proposes a novel and easy-to-implement AXAI capability framework for designing, analysing and assessing machine learning systems. The proposed framework, as demonstrated through examples, was easy to incorporate, application-agnostic and useful in comparing and delineating various ML systems. While measuring AXAI capabilities, the proposed framework also provides a measure of non-explainability, addressing an issue raised in [15]. The measure of non-explainability is given as: non-explainability = 1 - explainability. Through the proposed AXAI framework, automated matching of 'levels of abstraction' [11] was also made possible, as interpretations were connected with interpretations and explanans were aligned with explanans.

The proposed AXAI capability framework is based on the realization that 'fundamentally complex' prediction tasks would be influenced by developments in domain-specific tools and techniques. Hence, the AXAI framework provides an application-agnostic XAI capability incorporation mechanism. It operates at a higher level and is not affected by developments in tools and techniques or by domain-specific changes in professional practices.

As made explicit in this paper and in part two of this paper [55], the AXAI framework also provides design guidelines and encourages the provision of separable and quantifiable parameters of accuracy, comprehensibility and accountability. This makes the proposed AXAI capability framework different from existing XAI incorporation methods. Part two of this paper shows how developers and practitioners would engage in the process of incorporating and evaluating the efficacy of the proposed framework. Translating the AXAI capabilities into a set of system design requirements is also demonstrated in part two of this paper [55]. Together, the two papers will be useful in developing system requirements and producing a design process model, as shown in Fig. 8. The AXAI capability framework-related stages of ML and AI system design are explicitly shown in Fig. 8.

For building upon the initial success, the ML-centred AXAI capability framework can be extended to other AI systems. The framework needs to be tested on a larger set of existing systems. We anticipate that parts one and two of this work will initiate work on building more acceptable and accountable intelligent systems. Although the set of nine measurable elements is not exhaustive, it would suffice for common comprehensibility, accuracy and accountability measurement requirements; these nine elements provide a parsimonious, swift and effective set of AXAI capability measurements.