
Human Trust in Robots: A Survey on Trust Models and Their Controls/Robotics Applications



Abstract:

Trust model is a topic that first gained interest in organizational studies and then human factors in automation. Thanks to recent advances in human-robot interaction (HRI) and human-autonomy teaming, human trust in robots has gained growing interest among researchers and practitioners. This article focuses on a survey of computational models of human-robot trust and their applications in robotics and robot controls. The motivation is to provide an overview of the state-of-the-art computational methods to quantify trust so as to provide feedback and situational awareness in HRI. Different from other existing survey papers on human-robot trust models, we seek to provide in-depth coverage of the trust model categorization, formulation, and analysis, with a focus on their utilization in robotics and robot controls. The paper starts with a discussion of the difference between human-robot trust with general agent-agent trust, interpersonal trust, and human trust in automation and machines. A list of impacting factors for human-robot trust and different trust measurement approaches, and their corresponding scales are summarized. We then review existing computational human-robot trust models and discuss the pros and cons of each category of models. These include performance-centric algebraic, time-series, Markov decision process (MDP)/Partially Observable MDP (POMDP)-based, Gaussian-based, and dynamic Bayesian network (DBN)-based trust models. Following the summary of each computational human-robot trust model, we examine its utilization in robot control applications, if any. We also enumerate the main limitations and open questions in this field and discuss potential future research directions.
Topic: Modeling, Control, and Learning Approaches for Human-Robot Interaction Systems
Published in: IEEE Open Journal of Control Systems ( Volume: 3)
Page(s): 58 - 86
Date of Publication: 20 December 2023
Electronic ISSN: 2694-085X


SECTION I.

Introduction

Human-robot trust determines human acceptance and hence the willingness to collaborate with robots [1]. Research has shown that trust is not only malleable but a critical determinant of human acceptance of technology and eventual performance [2]. Based on the modified IBM technology acceptance model (TAM), perceived usefulness and perceived ease of use have positive impacts on human trust, which influences the behavioral intention and serves as an indicator of the actual system use (see Fig. 1 for an illustration). There is an inverse correlation between trust and monitoring of automation [3]. Work can be delegated to robotic systems only if they can be trusted without constant monitoring. Improper/non-calibrated trust leads to misuse (e.g., acceptance of all of the robot's operations without questioning), under-utilization (e.g., not assigning a robot tasks that it is capable of completing), or disuse (e.g., not utilizing the robot at all) [4], [5], [6]. More specifically, overtrust leads to misuse and hence system performance degradation, while under trust leads to under-utilization/disuse and hence human stress and cognitive overload. Fig. 2 illustrates the relationship between trust, trustworthiness, and robot usage.

Figure 1. Modified IBM TAM [1] showing trust as the indicator for the use of a system.

Figure 2. Illustration of overtrust (misuse) and undertrust (disuse).

Despite the increasing attention to human-robot trust research [7], [8], social studies on interpersonal trust [9], [10] and social-network-based trust [11], [12], [13], and human-machine trust research over the past four decades [3], [4], [5], there is still a lack of good understanding of computational models of human trust in robots. Many extant works remain at a descriptive and qualitative level since the trust between humans and robots is latent, fuzzy, and complex to measure and quantify. Human-robot trust has yet to be elucidated from a computational perspective that captures its temporal, dynamic, and uncertain nature. The quantification of human trust when interacting with robots may be utilized to design robot motion planning and control algorithms, predict and optimize the dynamic allocation of autonomous functions, and provide objective metrics for the selection of levels of autonomy (LoAs), and it is hence of critical importance to the development of human-robot interaction (HRI) and human-robot teaming strategies [14]. The purpose of this paper is therefore threefold:

  1. Provide an overview of human trust in robots, especially a survey of most representative quantitative methods and computational models of human trust in robots;

  2. Investigate the utilization of computational human trust models in control and/or robotics applications;

  3. Discuss limitations, open questions, and gaps in the extant literature, and explore future research directions in the field.

Different from other recent surveys [15], [16], [17], we focus on human-robot trust models and will provide a clear classification of different trust models, a detailed introduction of the trust formulations, and compare their pros and cons. Furthermore, we will emphasize the control and robotics applications of these trust models, seeking to explore their utilization and potential contributions to control system technologies.

Our working definition of robot in this paper is: “A robot is an autonomous machine capable of sensing its environment, carrying out computations to make decisions, and performing actions in the real world” [18]. Compared to a machine, a robot can sense, think (in a deliberative, non-mechanical sense), and act [19]. We refer to robots in general, which may include ground/aerial/underwater mobile robots, fixed/mobile factory robots, self-driving cars, medical robots, humanoids, autonomous decision aids, etc.

Two types of performance/competency-related trust have been identified in the literature to capture the belief in an automation/robot's ability to achieve an individual's goal [2], [20]: dispositional/initial trust and history-based trust. Dispositional trust describes an individual's tendency to trust or distrust even without any interaction with the automation/robot. It is an inherent psychological trait of the individual and hence unchanging. Dispositional trust is generally predictive of an individual's learned trust in that it is the starting point for the individual's interactions with others. In contrast, history-based trust is a construct that adjusts dynamically with cumulative interactions based on the external environment and the context-dependent characteristics of the operator; it is the core type of trust in the computational trust models considered in this paper. Humans can either gain or lose trust based on the progress of the task [21]. Besides the well-acknowledged performance/competency-based trust, several studies [22], [23] suggested that intention-based trust should also be included in the human-robot trust framework. The former indicates that the human trusts that the robot is capable of performing a task, while the latter indicates that the human trusts that the robot has good intentions to act for his/her benefit. Although intention-based trust typically occurs among human teams, the human anthropomorphism of robots makes it a non-negligible component in human-to-robot teams.

Trust is a continuum spanning from distrust to trust [24], rather than a binary discrete variable of being either trustful or untrustful [24]. While dispositional trust may be static, history-based trust has a dynamic nature [2], [9], [25] (see Fig. 3). Although factors that influence dispositional trust (e.g., propensity to trust) may be static, they still impact human decision-making during interactions. Trust has inertia and initial bias [26], and it is easy to break but hard to recover [27]. Trust is individual specific [10]. While generalized trust models for a population provide guidelines that may be suitable for an average person, individualized trust models are also important for more accurate evaluation and prediction in human-robot teaming and collaboration [28]. Trust is domain (task-, situation-, context-) specific [10]. For example, trusting an autonomous car on the road is not the same as trusting a factory manufacturing robot in assembly tasks [29]. Furthermore, trust is not completely transitive, i.e., "x trusts y more than y trusts w does not necessarily mean that x trusts w less than x trusts y" [30], i.e.,
\begin{equation*} T(x,y)>T(y,w) \not\Rightarrow T(x,y)>T(x,w). \end{equation*}
Because of all these properties, trust is a multi-dimensional construct and hence complicated to estimate and model [31].

In Desai's thesis [32], trust models for machines and automation were classified into five categories [33]: regression models [34], [35], time-series models [3], [4], [5], [26], [36], [37], [38], neural net models [39], qualitative models [33], [40], and argument-based probabilistic models [33], [41]. Among these, the time-series, regression, and neural net models are quantitative. Regression models establish quantitative relations between independent variables (trust impacting factors) and the dependent variable (trust) but do not capture the dynamic variance in trust development. Neural net models discover the relationship between inputs (trust impacting factors) and output (trust); however, they rely on large datasets collected with human subjects and cannot reveal the inner mechanism of trust due to the nature of neural networks. Time-series models characterize the dynamic relationship between human-to-robot trust and the independent variables, which requires prior knowledge of the factors that impact human trust. Nevertheless, these are statistical models that fit the data without consideration of human psychology [26].

Figure 3. Trust evolution process.

Many computational models have also been developed to evaluate trust in multi-agent social networks [11], [12], which can refer to the trust relationships within a society of people, robots, the Semantic Web, a recommendation system, an electronic commerce site (e.g., eBay, Amazon), etc. Kuter and Golbeck introduced a probabilistic network model for social networks and developed a trust inference algorithm called SUNNY that uses the trust inference procedure from TidalTrust [11] based on information sources with the highest confidence estimates [12]. Avesani et al. developed another algorithm to infer trust values in networks for a real-world application, namely moleskiing.it [42]. Pinyol and Sabater-Mir reviewed the most representative computational models for trust and reputation in multi-agent systems [29], where reputation is regarded as a collective measure of trustworthiness [43]. Both cognitive models built on beliefs, desires, and intentions and numerical models based on game theory were introduced. Similar review papers can be found in [24], [43], [44]. In Mikulski's report [45], six categories of computational trust models for networks were summarized: scaled value models [46], multi-faceted representation models [47], [48], logical metric models [49], [50], [51], direct trust models [52], [53], recommendation trust models [54], [55], [56], [57], [58], and hybrid trust models [59], [60], [61], [62]. These works computed objective trustworthiness (in terms of reliability, utility, and risk) as a property of each agent in a network rather than an individual's actual emotion and cognition when interacting with networked multi-agent systems. These models are normative, modeling idealized situations that may not often be found in practice. Their main utilization has been in economics and software development, without consideration of sociology and psychology or verification against human subject test data.

Furthermore, none of the above models were personalized to each individual or customized to a specific scenario, and there was no discussion of leveraging trust models in control and/or robotics applications. In this paper, we seek to provide a review of works that attempt to address such problems. Instead of reviewing all works on trust in machines/automation or general agent-agent trust, we focus on computational models of human-robot trust and their control/robotics applications. As this field is still in its early stage, we hope our work may help guide future developments in human-robot trust models and steer research towards trust-aware robotic control for improved HRI.

The organization of the rest of the paper is as follows. Section II discusses the definitions and differences in interpersonal trust, human trust in machines/automation, and human trust in robots. We also summarize impacting factors of human-robot trust, existing trust measurement approaches, and corresponding scales. Section III surveys the state-of-the-art computational trust models, together with their utilization in control and robotics applications if any. We also discuss the merits and shortcomings of each type of trust model. Section IV discusses the limitations of the current trust modeling work and future directions in human-robot trust. Section V concludes the paper.

SECTION II.

Human Trust in Robots: Definition, Impacting Factors, & Measurements

A. Definitions of Human-Robot Trust

Cho et al.’s survey on trust modeling [24] summarized the concept of trust based on the common themes across disciplines (Sociology, Philosophy, Economics, Psychology, Organizational Management, International Relations, Automation, Neurology, and Computing & Networking) as “the willingness of the trustor (evaluator) to take risk based on a subjective belief that a trustee (evaluatee) will exhibit reliable behavior to maximize the trustor's interest under uncertainty (e.g. ambiguity due to conflicting evidence and/or ignorance caused by complete lack of evidence) of a given situation based on the cognitive assessment of past experience with the trustee.”

Mayer et al. defined interpersonal/organizational trust as the "willingness of a party to be vulnerable to the actions of another party, based on the expectation that the other will perform a particular action important to the truster, irrespective of the ability to monitor or control the other party" [10]. The authors indicated that this definition of trust applies to a relationship between a trustor and another identifiable party perceived to act and react with volition toward the trustor. There are many other definitions of interpersonal trust, and they are not unified. For years, the definition of trust proposed by Mayer et al. was the most accepted and used one [5]. Similarly, Lee and See defined trust in automation or another person as "the attitude that an agent will help achieve an individual's goals in a situation characterized by uncertainty and vulnerability" [5]. This definition has been adopted by many human-automation trust studies and is considered the most thorough [15]. However, Rempel et al. argued that the interpersonal trust development process starts from observation-based predictability, followed by dependability, and then faith in the relationship [63], whereas human trust in automation/machines tends to develop in the opposite direction: it is initiated from the machine function description (faith) and then refined by user experience (dependability) and observation (predictability) [3]. Furthermore, as Ekman et al. pointed out in [64], people tend to consider human-human interaction more in terms of trust than distrust, whereas the ordering is different in human-machine interaction.

As Scott et al. pointed out, human-to-robot trust shares similarities with human-to-human trust because humans tend to anthropomorphize robots [23]. Johnson-George and Swap pointed out that "willingness to take risks may be one of the few characteristics common to all trust situations," and the possibility of failure is necessary for trust to be present [64]. Nevertheless, there are still challenges in extrapolating the concept of trust in people to trust in robots. We believe that human trust in robots differs from interpersonal trust, given that robots as trustees differ from humans in ability, integrity, and/or benevolence. When considering human-to-human trust, the purpose, reliance, and transparency of the human are of great importance [70]. However, Lee and Moray's study [36] reveals that the performance and process of automation are the most important for humans to form trust. In contrast, the purpose of automation is presupposed, as it is designed to help humans and produce positive outcomes [71]. This is one of the differences between human-to-human trust and human-to-automation trust. We believe the same applies to human-to-robot trust.

Furthermore, human trust in robots is more complicated than trust in agents, machines, or automation [72]. For instance, humans may not interact physically with software agents, whereas physical interaction is explicit with automation/machines and robots. Robots are mobile, while automation/machines are traditionally fixed-base and static. Robots are required to perform autonomous tasks and make human-like decisions in uncertain and complex environments, while automation operating conditions are relatively simple, predictable, and constrained. Robots can have self-governance and learning capabilities to adapt to new tasks, while automation is pre-programmed to perform a set of pre-defined actions [73]. Robots collaborate with humans more like a teammate than a tool. Furthermore, the types of interactions, the number of systems interacting with the human, the designated social and operative functions, and the environment all differentiate HRI from human-computer or human-machine interaction [74], [75].

Remark 1:

In the literature, overall trust is considered a combination of subjective and objective trust [24]. However, trust is more about the perceived trustworthiness of the trustee [14], which means its value can be affected by trustor (human) factors. In contrast, trustworthiness is considered an intrinsic property of the trustee: its value does not change with respect to different trustors. It is objective and needs to be verified through observed evidence. There is no uniform agreement on the composition of trustworthiness, and it is not rare to find reliability and trustworthiness used interchangeably in the literature [76]. Besides received referrals and personal experience, the emotional aspects of the human also contribute to the distinction between human trust in a robot and the robot's trustworthiness [43]. The mismatch between trust and trustworthiness is further categorized into the "over-trust" situation, when the subjective trust is greater than the trustworthiness, and the "under-trust" situation, when the trust value is below the trustworthiness [77] (see Fig. 2). Intuitively, trustworthiness can be used as an objective reference to judge human trust decisions.

Not surprisingly, there are many definitions of human trust in robots in the literature, and there is a lack of consensus. Schaefer et al. provided a list of 37 definitions of human trust in automation to draw implications for HRI [78]. They suggest that this diversity in definitions of trust, as seen in other domains, reflects the lack of a universal definition of trust. Table 1 lists the definitions of human-robot trust we found in the literature. These papers were published from 2007 to 2021. Among these works, the definition by Hancock et al. [22] remains one of the most referenced definitions of trust in robots. From the above studies, we may conclude that human-robot trust differs from interpersonal trust, human-agent trust, and human-machine/automation trust, mainly due to their different contexts, properties, and tasks. Next, we provide a short survey of the impacting factors that primarily affect human-robot trust.

TABLE 1. Definitions of human-robot trust.

B. Impacting Factors of Human-Robot Trust

Interpersonal trust usually involves beliefs about ability, integrity, and benevolence [10]. For HRI, given that a robot has good intentions to collaborate with the human, human trust in robots can be influenced by factors related to the robotic system, the context (e.g., task and environment), and the nature and characteristics of the human working with the robotic system [6], [22], [79]. Hancock et al.'s 2011 and 2021 meta-analyses [8], [22] further suggested that robot performance is the most important factor influencing the level of trust and that there are significant individual precursors to trust within each category. This analysis reinforces the notion that robot trustworthiness/competence predicts trust in robots, and trust, in turn, predicts the use of robots. Though the meta-analysis found only a moderate influence of environmental factors on trust and little effect of human-related factors, the authors mention that the pool of studies examining the effect of human-related attributes on trust was severely limited [7]. The survey by Hoff and Bashir [2] further emphasized that this finding of the insignificance of human-related trust factors can be attributed to the shortage of experiments focused on individual differences and should be the focus of further investigation in the coming years. Accordingly, the works studying different antecedents of human trust in robots in the literature can be categorized into robot-related (including performance-based and attribute-related factors), human-related (including ability-based and characteristic-based factors), and contextual (including team collaboration-related and task-based factors) factors.

The following factors can be categorized as robot-related factors:

  • Robot performance-based factors: robot reliability-related factors (i.e., reliability [80], [81], [82], dependability [7], [83], failure rate [84], false alarm rate [7], predictability [83], [84], robot consistency [83], robot timeliness [83]), robot competence and capability [28], [85], robot behavior [75], [86], robotic system transparency [87], [88], [89], robotic system safety- and security-related factors (i.e., robot operational safety, privacy protection, data security) [90], and level of automation (LOA) [22].

  • Robot attribute-related factors: robot appearance [91], robot proximity [92], [93], anthropomorphism [79], and robot physical attributes [94], [95].

The following factors are human-related:

  • Human ability-based factors: workload [2], training [5], human mental model [64], and situational awareness [96].

  • Human characteristic-based factors: self-confidence [66], faith [3], [4], stereotypical impressions [97], [98], personality [73], [75] (such as propensity to trust [99]), gender [25], attitudes [100], culture [101], experience [102], age [103], situation management [85], locus of control [85], perception of robot attributes [104], [105], and social influence [106].

The following factors can be categorized as contextual factors:

  • Team collaboration-related factors: reputation [107], and the length of time that humans and robots have worked together [108].

  • Task-based factors: task difficulty [109], nature of tasks [75], and task familiarity to humans [110].

C. Trust Measurements and Scales

In contrast to the computational human-robot trust models we focus on in this paper, there are numerous trust measurement approaches for human trust in automation/robots based on the scales of the factors mentioned in the above section. Existing trust measurement approaches can be classified into two categories, i.e., posterior trust measurement and real-time trust measurement [111].

The posterior trust measurement approaches quantify human trust after interactions with robots/automation. A well-designed questionnaire is the most typical approach. Widely acknowledged questionnaire designs include [40], [80], [83], [112]. For example, Muir's 7-point Likert scale trust questionnaire [40], [113] has been adopted as a human-robot trust measure, as shown in Table 2. Yagoda and Gillan developed the HRI Trust Scale based on five dimensions: team configuration, team process, context, task, and system [83]. Corresponding to different scenarios, questionnaires can be modified to accommodate specific features of the robots or automation [114], [115]. A set of scales corresponding to different factors, e.g., self-confidence and reliance, can be extracted from the questionnaire. In addition to the questionnaire method, which derives the value of trust from all relevant factors in well-designed questionnaires, some works directly ask the participants to report their levels of trust on a simple continuous scale, e.g., from 0 (fully distrust) to 100 (fully trust) [116], or measure trust via objective behavioral actions such as use-choice [117]. Besides treating these scales as a multi-dimensional representation of trust [118], it is sometimes preferable to combine the set of scales into binary [119], discrete [24], ordinal [120], or continuous values [121]. For example, Schaefer developed a 40-item trust scale [86] using a 0-100 percentage score. This way, a single-valued trust measure can be used as a metric for post-hoc decision-making or design purposes. The trust values obtained by these posterior measurement approaches provide valuable guidelines for robot/automation design. However, they do not offer insight into trust evolution during task execution. It is also questionable whether human operators can accurately and promptly quantify the level of trust in their mental states [122]. They are hence less attractive for developing real-time robot motion planning and control algorithms. In addition, the drawback of high variance in the results is also mentioned in the literature [123].

TABLE 2. Muir's trust questionnaire.

Real-time trust measurement approaches have emerged in recent years. Psychophysiological signals, such as electroencephalography (EEG) and blood oxygenation signals, are typically used to infer trust values [111], [124], [125]. Although EEG sensors are good at emotion recognition, questions such as how accurate this classification is, and whether EEG sensors can measure emotional signals quantitatively, remain unanswered.

Although the above trust measurement approaches have been validated and widely adopted in industry and academia, the lack of quantification and prediction of trust and its transition process limits their application in robot control and motion planning. This motivates the current paper on computational trust models and their robot control and robotics applications.

SECTION III.

Computational Human-Robot Trust Models

In this section, we provide a survey of computational human-robot trust models and a discussion of the respective merits and drawbacks of these models. Based on their respective modeling techniques, we categorize the models into five major classes: performance-centric algebraic, time-series, Markov decision process (MDP)/partially observable MDP (POMDP), Gaussian-based, and dynamic Bayesian network (DBN)-based trust models. Note that other types of human-robot trust models also exist, but they are often singular examples of their respective categories, and hence we do not elaborate on them in this paper. We introduce each category of models in a subsection. Each subsection starts with a brief introduction of the modeling method, continues with detailed formulations of representative models in the literature that use the modeling technique to describe human-robot trust and their corresponding robot control and robotics applications, and concludes with an analysis of the pros and cons of the trust model category. We also briefly discuss the various approaches used to learn the model parameters. Although we summarized many factors in the previous sections, only a small portion of these factors have been adopted in the following computational trust models. Also note that, to keep the notation consistent, we use k to represent discrete time steps and t to represent continuous time throughout the entire section.

A. Performance-Centric Algebraic Trust Models

In this subsection, we introduce performance-centric algebraic trust models. These models are deterministic and algebraic. Consistent with Hancock et al.'s meta-analyses [8], [22], these models consider robot performance or human-robot team performance as the main impacting factor of trust and use observations to evaluate performance. Algebraic formulations are used to compute performance and human-robot trust. Next, we review some representative performance-centric algebraic trust models and their robotics and robot control applications in the literature.

Floyd et al. proposed a performance-centric computational trust model for a simulated wheeled unmanned ground robot working cooperatively with a simulated human agent in movement and patrol tasks [126]. It would be ideal if a robot were guaranteed to operate in a trustworthy manner; however, it may be impossible to elicit a complete set of rules for trustworthy behaviors if the robot needs to handle changes in teammates, environments, or mission contexts. Therefore, this work developed a trust model and a case-based reasoning framework to enable a robot to determine when to adapt its behavior to be more trustworthy for the team. Behaviors that the robot itself can directly control were defined as modifiable components, e.g., speed, obstacle padding, scan time, scan distance, etc. Defining the modifiable component sets as C_{i}, i=1,\ldots, m, a robot behavior B is composed of one element from each modifiable component set,
\begin{equation*} B := \langle c_{1}, c_{2},\ldots, c_{m} \rangle, \end{equation*}
where c_{i} \in C_{i}. The robot can switch to a different behavior B_{new} to gain more trust from the operator. To use a traditional trust metric, the robot needs access to either (1) the operator's personal experiences and beliefs or (2) explicit feedback from the operator. In situations where the operator is unwilling or unable to provide personal information or regular feedback, the robot needs to infer how trustworthy it is from observable evidence of trust. Therefore, an inverse trust estimate was utilized to let the robot infer how much trust an operator has in it, rather than directly measuring how trusting it is of the operator. The inverse trust estimate was evaluated as
\begin{equation*} Trust_{B} := \sum _{j=1}^{n} w_{j} \times cmd_{j}, \end{equation*}
where cmd_{j} \in \lbrace -1,1\rbrace is the evaluation of the successful completion of the jth (1 \leq j \leq n) command issued to the robot while demonstrating behavior B. If the jth command was completed successfully (cmd_{j} = 1), the trust estimate increases, and vice versa. The weight w_{j} denotes the level of success when the robot executes the command. Note that the trust model only looks for general trends of trust change instead of the precise trust value of a robot. Two thresholds were set, i.e., the trustworthy threshold (\tau _{T}) and the untrustworthy threshold (\tau _{U}). The robot deems a behavior B trustworthy if Trust_{B}\geq \tau _{T}. It will use this behavior but may continue to measure its trustworthiness in case any changes occur that would cause the behavior to no longer be trustworthy. On the other hand, an untrustworthy behavior B was stored as an evaluated pair E := \langle B, t\rangle, where t is the time the behavior took to reach \tau _{U}. A case base CB was then constructed to collect all previously evaluated behaviors \mathcal {E}_\text {past} = \lbrace E_{1}, E_{2},\ldots, E_{n}\rbrace together with the final trustworthy behavior B_\text {final} as problem-solution pairs,
\begin{align*} CB = \lbrace &\langle \mathcal {E}_{\text {past},1}, B_{\text {final},1}\rangle, \langle \mathcal {E}_{\text {past},2}, B_{\text {final},2}\rangle, \ldots,\\ &\langle \mathcal {E}_{\text {past},l}, B_{\text {final},l}\rangle,\ldots \rbrace. \end{align*}
Case-based behavior adaptation was then used for the robot to switch to a new behavior based on the similarity between the current and previously evaluated behaviors; the idea is that two similar problems may also have similar solutions. Hence, the robot can adapt its behavior by switching to the final behavior of the most similar case. Using this case-based behavior adaptation methodology, the robot can find a trustworthy behavior using information from previous behavior searches. The approach can also accommodate undefined behaviors by exploring them and adding them to the case base CB.
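As a rough illustration of the inverse trust estimate and its threshold test described above, consider the following Python sketch. The function names, uniform weights, and threshold values are illustrative assumptions, not the implementation from [126].

```python
# Minimal sketch of the inverse trust estimate and its threshold test.
# Weights and thresholds are illustrative assumptions.

def inverse_trust(cmd_outcomes, weights):
    """Trust_B = sum_j w_j * cmd_j, with cmd_j in {-1, +1}."""
    return sum(w * c for w, c in zip(weights, cmd_outcomes))

def classify_behavior(trust_b, tau_trust=3.0, tau_untrust=-3.0):
    """Label a behavior using the trustworthy/untrustworthy thresholds."""
    if trust_b >= tau_trust:
        return "trustworthy"
    if trust_b <= tau_untrust:
        return "untrustworthy"
    return "undecided"

# Example: five commands with uniform weights; the last two failed.
outcomes = [1, 1, 1, -1, -1]
weights = [1.0] * len(outcomes)
print(classify_behavior(inverse_trust(outcomes, weights)))  # -> "undecided"
```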

A similar kind of binary observation mechanism was adopted in the RoboTrust algorithm [127], which captures human trust in an unmanned ground vehicle (UGV) platoon whose vehicles report their driving conditions under cyber attacks. The UGV's average performance over different time intervals was calculated, and the most conservative (worst) average performance was then used to update trust. The trust update equation is
\begin{equation*} RoboTrust(k)=\min \left(\frac{\sum _{j=k-\tau _{T}}^{k}OB(j)}{\tau _{T}+1},\; \frac{\sum _{j=k-\tau _{T}-1}^{k}OB(j)}{\tau _{T}+2},\; \ldots,\; \frac{\sum _{j=k-c_{T}}^{k}OB(j)}{c_{T}+1}\right), \end{equation*}
where k is the current time step and OB(j) is the human operator's observation of the performance of the UGV at time step j: OB(j)=1 if good performance is observed, and OB(j)=0 if the human operator detects bad performance. The parameters \tau _{T} and c_{T} together determine the lower and upper bounds of the number of observations to be accounted for. This is a conservative trust update algorithm in which the minimum of the recent windowed averages is taken as the final trust value. Hence, bad performance of the UGV is expected to degrade trust persistently, and the importance of the information from such untrusted UGVs is downweighted. Based on the collected reports from different UGVs via vehicle-to-cloud communication, a trust-based information management system was developed to identify the abnormal UGVs that need human intervention. This conservative strategy can be very helpful in adversarial situations where over-trusting an unreliable robot can put the whole team at risk. However, the slow trust recovery speed means a longer wait before the robot can fully contribute to the team again; in other words, efficiency was sacrificed for a higher level of security. In addition, only numerical simulations were conducted to demonstrate the overall framework, and the model does not fully consider human psychological aspects. The performance-based trust update might be too rational to factor in the randomness and emotional effects of the human decision-making process [128].
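A minimal Python sketch of this conservative windowed-average update is given below; the observation history and window parameters are illustrative assumptions.

```python
# Minimal sketch of the conservative RoboTrust update, assuming binary
# observations OB(j) in {0, 1} and window-length parameters tau_T <= c_T.

def robotrust(observations, tau_T, c_T):
    """Return the worst windowed average of the most recent observations."""
    k = len(observations) - 1                 # current time step
    averages = []
    for n in range(tau_T, c_T + 1):           # windows covering n + 1 steps
        window = observations[max(0, k - n): k + 1]
        averages.append(sum(window) / (n + 1))
    return min(averages)                      # most conservative average

history = [1, 1, 1, 0, 1, 1, 0, 1]            # 1 = good performance observed
print(robotrust(history, tau_T=2, c_T=5))     # -> 0.6
```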

Xu and Dudek developed an event-centric trust model to model and predict trust change after an HRI session [104]. Within each interaction session, multiple events can occur, e.g., a cold start or glitches caused by low reliability. The trust model was represented as a weighted linear sum with the trust output quantized to a discrete level,
\begin{equation*} \Delta T = \text {Quantization}\left(\sum _{i, j, k}\left(\omega _{0} + \omega _{i} \epsilon _{i}^{pe}(W) + \omega _{j} \epsilon ^{s}_{j} + \omega _{k} a_{k}^{s}\right)\right), \end{equation*}
where \epsilon _{i}^{pe} is an experience-based metric evaluated over a post-event time window (W seconds) after the event occurs, \epsilon ^{s}_{j} is an experience-based metric evaluated over the entire HRI session, a_{k}^{s} is the user's subjective assessment of the HRI session, obtainable through questionnaires, and \omega _{0},\omega _{i},\omega _{j}, and \omega _{k} are the corresponding weights. Here, the experience-based trust factors include the robot's task performance, failure rates, and the frequency of interventions from the human operator. Maximum likelihood (ML) supervised learning was then used to learn the parameters. The validation demonstrated that the proposed model could predict human trust changes in a robot navigation guidance task within an acceptable error range.
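The following sketch illustrates this weighted-sum-plus-quantization structure; the weights, metric values, and quantization levels are illustrative assumptions, not the parameters learned in [104].

```python
# Minimal sketch of the event-centric trust-change model: a weighted linear
# sum of experience-based metrics and subjective assessments, quantized to a
# discrete trust-change level.
import numpy as np

def trust_change(eps_event, eps_session, assess_session,
                 w0, w_event, w_session, w_assess,
                 levels=(-2, -1, 0, 1, 2)):
    raw = (w0
           + np.dot(w_event, eps_event)         # post-event metrics over window W
           + np.dot(w_session, eps_session)     # session-level experience metrics
           + np.dot(w_assess, assess_session))  # subjective session assessments
    # Quantize the continuous score to the nearest discrete trust-change level.
    return min(levels, key=lambda level: abs(level - raw))

print(trust_change(eps_event=[0.8, -0.2], eps_session=[0.5],
                   assess_session=[0.7], w0=0.1,
                   w_event=[0.6, 0.4], w_session=[0.5], w_assess=[0.8]))  # -> 1
```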

Overall, performance-centric algebraic trust models offer an efficient trust evaluation mechanism based on observational evidence. They are built on selected human-robot trust impacting factors, with robot performance as the major component [45], [109], [126]. Due to this performance-centric perspective, these works often conflate trust with trustworthiness, which is inaccurate per Remark 1 in Section II-A. Besides performance, many other robot-related, contextual, and human-related factors could be considered, as discussed in Section II-B. These deterministic and algebraic models also neglect the uncertain, stochastic, and dynamic nature of human feeling and emotion. In most cases, no human subject tests were conducted to collect real human data for learning these models. There is so far no unified framework or set of quantitative metrics for modeling either robot performance or trust, and the choice of a specific model for a particular task or application is often not well justified. Furthermore, the above models are algebraic and do not capture the dynamic evolution of trust or its causal relationship with the impacting factors. As a consequence, performance-centric trust models are better suited for scenarios where long-term interaction effects can be ignored and the robot's performance is the dominant consideration in HRI.

B. Time-Series Trust Models

We first introduce the time-series models in general. A time series is a series of data points ordered in time. A time-series model is a dynamic system used to fit the time-series data.

Definition 1:

[129] An autoregressive model of order p, abbreviated AR(p), is
\begin{equation*} y(k) = \sum _{i=1}^{p} \phi _{i} y(k-i) + w(k), \end{equation*}
where the data y(k),y(k-i) are stationary with zero mean, w(k) \sim N(0, \sigma ^{2}_{w}) is white noise, and \phi _{1},\ldots, \phi _{p} are non-zero constants.

The autoregression polynomial involves regression of its own lagged (past) values. Note that other probability distributions, e.g., gamma, Laplace, may also be assumed [130].

Definition 2:

[129] A moving average model of order q, abbreviated MA(q), is
\begin{equation*} y(k) = w(k) + \sum _{j=1}^{q} \theta _{j} w(k-j), \end{equation*}
where w(k), w(k-j) \sim N(0, \sigma ^{2}_{w}) are white noises and \theta _{1},\ldots, \theta _{q} are non-zero constants.

The moving-average polynomial involves a linear combination of the error terms in time.

Definition 3:

[129] A time-series data y(k), k=0,1,2,\ldots is ARMA(p,q) if it is stationary and
\begin{equation*} y(k) = \sum _{i=1}^{p} \phi _{i} y(k-i) + w(k) + \sum _{j=1}^{q} \theta _{j} w(k-j), \tag{1} \end{equation*}
where \phi _{1},\ldots, \phi _{p}, \theta _{1},\ldots, \theta _{q} are non-zero constants and the white noise variance \sigma ^{2}_{w} > 0.

An autoregressive moving average vector form (ARMAV) is the multivariate form of ARMA and allows the use of multiple time series to model a system's input/output relationship [131], [132]. In (1), vectors \bm {y}(k)=(y_{1}(k),y_{2}(k),\ldots,y_{n}(k))^{T} and \bm {w}(k)=(w_{1}(k), w_{2}(k),\ldots,w_{n}(k))^{T}, and matrices \Phi _{i}, \Theta _{j}\in \mathbb {R}^{n\times n} can be used to replace the original scalars and obtain an ARMAV model.

Definition 4:

[133] An ARMA model with exogenous input, abbreviated ARMAX, is
\begin{align*} y(k) =& \sum _{l=1}^{r}\gamma _{l} u(k-l) + \sum _{i=1}^{p} \phi _{i} y(k-i)\\ &+ \sum _{j=1}^{q} \theta _{j} w(k-j) + w(k), \end{align*}
where u(k) is the exogenous input and \gamma _{1},\ldots,\gamma _{r} are non-zero constants.

For example, according to Definition 4, a simple dynamic regression model can be given as
\begin{equation*} y(k)=\phi y(k-1)+\gamma u(k)+w(k). \end{equation*}

Similarly, a multivariate ARMAX model can be constructed using vector sequences \bm {y}(k),\bm {u}(k),\bm {w}(k) and matrices \Phi _{i}, \Theta _{j}, \Gamma _{l}\in \mathbb {R}^{n\times n}. A time-series model can also be time-varying [132] and/or nonlinear [134]. A hierarchical time-series model, in which y(k) is a hidden or latent process with an additional observation that depends on y(k), can be described in the state-space form used in control theory for system analysis and control design. Corresponding model identification tools (e.g., the MATLAB System Identification Toolbox [135], the statsmodels Python module [136]) can be used to estimate the parameters from time-series data. Next, we review some representative time-series human-to-robot trust models.
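As a minimal illustration of fitting such a model with the statsmodels Python module mentioned above, consider the sketch below; the synthetic data, model orders, and coefficient values are illustrative assumptions, not taken from any cited study.

```python
# Minimal sketch of identifying an ARMAX-style trust model from time-series
# data with statsmodels (SARIMAX covers ARMA with an exogenous regressor).
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
n = 200
u = rng.uniform(0.0, 1.0, size=n)             # exogenous input, e.g., robot performance
y = np.zeros(n)                               # trust-like response
for k in range(1, n):
    y[k] = 0.8 * y[k - 1] + 0.5 * u[k] + 0.05 * rng.standard_normal()

# ARMA(1, 1) with an exogenous regressor, i.e., order=(p, 0, q).
model = SARIMAX(y, exog=u, order=(1, 0, 1))
result = model.fit(disp=False)
print(result.params)                          # estimated input gain, AR, MA, noise variance
```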

Lee and Moray developed an ARMAV time-series trust model to capture the dynamic evolution of human trust in a machine performing a juice pasteurization task in a simulated supervisory control setting [36]. The authors manipulated the reliability of the automation and measured overall operator performance and trust in the automation. They found that a time-series model incorporating automation performance and the fault or failure rate of the automation was most predictive of eventual trust. However, the noise part was neglected in the final time-series trust model, probably because the model fitting returned a zero coefficient for the noise error term. This means that the model does not use past forecast errors to explain the current value, which could indicate a lack of autocorrelation in the data. However, since the fitting of time-series models involves estimation and statistical inference, the parameter might not be exactly zero due to estimation errors. Therefore, this model may not be accurate and may have reduced estimation and prediction accuracy. Inspired by this model, Sadrfaridpour et al. proposed a time-series model of human trust in robots for human-robot collaborative assembly tasks that further considers human performance and fault occurrence [137]:
\begin{align*} T(k)=&\,AT(k-1)+B_{1}P_{R}(k)+B_{2}P_{R}(k-1)+C_{1}P_{H}(k)\\ &+C_{2}P_{H}(k-1)+D_{1}fault(k)+D_{2}fault(k-1),\tag{2} \end{align*}
where k is the time index, T(k) and T(k-1) represent current and prior trust, and P_{R}, P_{H}, and fault are robot performance, human performance, and fault occurrence, respectively. The coefficients A, B_{1}, B_{2}, C_{1}, C_{2}, D_{1}, and D_{2} are constants obtained from human subject tests. Notice that robot and human performance are specific to the task and to the individual performing the task. In the collaborative assembly manufacturing task considered in [137], human trust dropped when the operator felt the robot was working at a different pace and the robot speed did not change accordingly. In other words, human trust could be recovered if the robot's speed was flexible and dynamically adjusted to keep pace with the human. Therefore, the robot performance P_{R} \in [0,1] was evaluated by its flexibility in accommodating the human's working behavior and modeled via the difference between human and robot speed,
\begin{equation*} P_{R}=P_{R,\max }-|V_{H}(k)-V_{R}(k)|, \end{equation*}
where V_{R}\in [0,1] and V_{H}\in [0,1] are the normalized robot and human working speeds, respectively. V_{R} ranges from the robot standing still ("0") to working at its maximum speed ("1"), V_{H} is the corresponding normalized human working speed, and P_{R,\max }=1 is the maximum robot performance. The human performance model P_{H} \in [0,1] was inspired by muscle fatigue and recovery dynamics that capture the fatigue level of the human body when performing repetitive kinesthetic tasks, which are typical human motions in manufacturing:
\begin{equation*} P_{H}(k)= \frac{F_{max,iso}(k)-F_{th}}{MVC-F_{th}}, \end{equation*}
where F_{max,iso}(k) is the maximum isometric force that one can produce when a muscle applies a certain force for an amount of time,
\begin{align*} F_{max,iso}(k)=&\,F_{max,iso}(k-1) - C_{f} F_{max,iso}(k-1)\frac{F(k-1)}{MVC}\\ &+ C_{r} (MVC - F_{max,iso}(k-1)), \end{align*}
and the threshold force is
\begin{equation*} F_{th}=MVC \frac{C_{r}}{2C_{f}}\left(-1+\sqrt{1+\frac{4C_{f}}{C_{r}}}\right), \end{equation*}
where C_{f} is the fatigue constant, C_{r} is the recovery constant, F(k-1) is the applied force, and MVC is the maximum voluntary contraction force. The ARMA routine in the MATLAB System Identification Toolbox was used to identify the model parameters. As in [36], no noise term was found in the final learned trust model, which may sacrifice prediction accuracy and eliminate the stochastic nature of human trust. Three control allocation strategies, i.e., manual control, neural network (NN)-based autonomous control, and mixed autonomous and manual control, were designed to adjust the robot speed to match the human speed and thereby increase human-robot trust. The NN-based autonomous controller used a 3-layer perceptron with a tangent sigmoid activation function for the hidden layer and a linear activation function for the output layer to learn the pattern of and predict human speed. Experiments with a Baxter humanoid collaborative manufacturing robot and a human participant were conducted to collect human subject data for model fitting and to compare the performance of the three controllers. Sadrfaridpour and Wang implemented a similar time-series-like trust model (2) for human-robot collaborative assembly tasks [140]. The trust dynamics were then used as one of the constraints in a nonlinear model predictive control (NMPC) formulation to find the optimal robot velocity that minimizes the weighted sum of the differences between human and robot path progress, between current and maximum robot velocities, and between current and maximum human-to-robot trust. Twenty participants were recruited to collaborate with the Baxter robot in a car center-console assembly task for data collection. A one-way repeated-measures analysis of variance (ANOVA) was conducted and showed the statistical significance of the trust-based control strategy compared to benchmark strategies.
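The following sketch simulates the trust dynamics (2) coupled with the robot-performance and muscle fatigue/recovery models above; all coefficient values are illustrative assumptions rather than the parameters identified in [137].

```python
# Minimal simulation sketch of the time-series trust model in Eq. (2) with
# the performance and fatigue models described above.
import numpy as np

A, B1, B2, C1, C2, D1, D2 = 0.85, 0.06, 0.03, 0.04, 0.02, -0.15, -0.05
Cf, Cr, MVC = 0.01, 0.002, 100.0                     # fatigue/recovery constants, max force
F_th = MVC * Cr / (2 * Cf) * (-1 + np.sqrt(1 + 4 * Cf / Cr))

T = 0.5                                              # initial trust
PR_prev, PH_prev, fault_prev = 1.0, 1.0, 0.0
F_iso = MVC                                          # maximum isometric force
for k in range(1, 51):
    v_h, v_r = 0.6, 0.55                             # normalized human/robot speeds
    PR = 1.0 - abs(v_h - v_r)                        # P_R = P_R,max - |V_H - V_R|
    F_applied = 40.0                                 # applied force F(k-1), held constant
    F_iso = (F_iso - Cf * F_iso * F_applied / MVC
             + Cr * (MVC - F_iso))                   # isometric force dynamics
    PH = (F_iso - F_th) / (MVC - F_th)               # human performance
    fault = 1.0 if k == 25 else 0.0                  # one injected robot fault
    T = (A * T + B1 * PR + B2 * PR_prev + C1 * PH + C2 * PH_prev
         + D1 * fault + D2 * fault_prev)             # trust update, Eq. (2)
    PR_prev, PH_prev, fault_prev = PR, PH, fault
print(round(T, 3))
```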

Azevedo-Sa et al. proposed a linear time-invariant (LTI) system model and estimated the dynamics of drivers' trust in level-3 autonomous driving [145]. The authors considered the influence of Boolean event signals from the alarm system of the autonomous driving system. Gaussian variables were used to model trust T(k) and the observation variables, namely the driver's focus \varphi (k) on the non-driving-related task (NDRT), the driver's usage v(k) of the autonomous driving system, and the NDRT performance \pi (k). The trust dynamics with respect to the alarm system of the autonomous driving system were
\begin{align*} T(k+1)&=\mathbf {A} T(k)+\mathbf {B}\left[\begin{array}{c}L(k) \\ F(k) \\ M(k) \end{array}\right]+u(k), \\ \left[\begin{array}{l}\varphi (k)\\ v(k)\\ \pi (k) \end{array}\right]&=\mathbf {C} T(k)+w(k), \end{align*}
where L(k) indicates whether a true alarm occurred, F(k) whether a false alarm occurred, and M(k) whether an alarm was missed, and the parameters are \mathbf {A}=[a_{11}] \in \mathbb {R}^{1 \times 1}, \mathbf {B}=[b_{11} \; b_{12} \; b_{13}] \in \mathbb {R}^{1 \times 3}, \mathbf {C}=[c_{11} \; c_{21} \; c_{31}]^{\top } \in \mathbb {R}^{3 \times 1}, u(k) \sim \mathcal {N}(0, \sigma _{u}^{2}), and w(k) \sim \mathcal {N}(\mathbf {0}, \boldsymbol{\Sigma }_{w}). The parameters of the state-space model were identified with maximum likelihood estimation through linear mixed-effects models. Data were derived from a user experiment with a self-driving vehicle simulator involving 80 participants, whose data were used to estimate the dynamic trust value and the LTI model parameters. Though the proposed model and maximum likelihood estimation can approximately track self-reported human trust in autonomous driving, the alarm signals alone may not cover the more influential impacting factors in continuous long-duration driving.
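Since this is a linear-Gaussian state-space model, the scalar trust state can be estimated recursively from the observations with a standard Kalman filter, as in the minimal sketch below; the parameter values are illustrative assumptions, not those identified in [145].

```python
# Minimal Kalman-filter sketch for estimating the scalar trust state of the
# LTI model above from driver focus, system usage, and NDRT performance.
import numpy as np

A = np.array([[0.95]])                       # trust persistence a11
B = np.array([[0.05, -0.20, -0.30]])         # gains for [true alarm, false alarm, miss]
C = np.array([[0.8], [0.7], [0.6]])          # observation map [c11, c21, c31]^T
Q = np.array([[0.01]])                       # process noise variance sigma_u^2
R = 0.05 * np.eye(3)                         # observation noise covariance Sigma_w

T_hat, P = np.array([[0.5]]), np.array([[1.0]])
events = np.array([[1.0], [0.0], [0.0]])     # e.g., one true alarm this step
z = np.array([[0.45], [0.40], [0.35]])       # measured focus, usage, NDRT performance

# Predict step
T_pred = A @ T_hat + B @ events
P_pred = A @ P @ A.T + Q
# Update step
S = C @ P_pred @ C.T + R
K = P_pred @ C.T @ np.linalg.inv(S)
T_hat = T_pred + K @ (z - C @ T_pred)
P = (np.eye(1) - K @ C) @ P_pred
print(T_hat.item())                          # posterior trust estimate
```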

Zheng et al. further extended time-series models to a human multi-robot trust model and used the model to encode human intention into multi-robot motion tasks in offroad environments [147]. Each robot, indexed by i=1,\ldots, I, is subject to offroad environmental attributes \mathbf {z}_{1:I,m}(k), m=1,\ldots, M, such as traversability and visibility. The linear state-space equations were given as
\begin{align*} \mathbf {x}_{1:I}(k) &= \bm {B}_{0} \mathbf {x}_{1:I}(k-1) + \sum _{m=1}^{M}\bm {B}_{m} \mathbf {z}_{1:I,m}(k) + \mathbf {b} + \boldsymbol {\epsilon }_{w}(k),\tag{3a}\\ \mathbf {y}_{1:I}(k) &= \mathbf {x}_{1:I}(k) - \mathbf {x}_{1:I}(k-1) + \boldsymbol {\epsilon }_{v}(k), \tag{3b} \end{align*}
to capture the influence of the M offroad environmental attributes on human trust in the multi-robot system (MRS). Here, \mathbf {x}_{1:I}(k)=[x_{1}(k),\ldots,x_{I}(k)]^{T} collects the human's trust in each individual robot i. The I\times I coefficient matrix
\begin{equation*} \bm {B}_{0} = \begin{bmatrix}\beta _{0} & & & \\ \beta ^{\prime }\beta _{0} & \beta _{0} & & \\ & \ddots & \ddots & \\ & & \beta ^{\prime }\beta _{0} & \beta _{0} \end{bmatrix}_{I \times I} \end{equation*}
is the autoregression term that discounts the previous trust \mathbf {x}_{1:I}(k-1); it uses the temporal nature of a time-series model to capture history-based human trust. Each of the I\times I coefficient matrices
\begin{equation*} \bm {B}_{m} = \begin{bmatrix}\beta _{m} & & & \\ \beta ^{\prime }\beta _{m} & \beta _{m} & & \\ & \ddots & \ddots & \\ & & \beta ^{\prime }\beta _{m} & \beta _{m} \end{bmatrix}_{I \times I},\quad m=1,\ldots, M, \end{equation*}
is the dynamic feature term that quantifies the weight of the robots' m-th attribute \mathbf {z}_{1:I,m}(k). The constant vector \mathbf {b}= [b, (\beta ^{\prime }+1)b,\ldots, (\beta ^{\prime }+1)b ]^{\top }_{I\times 1} describes the human's dispositional trust in the MRS, which may come from the human's unchanging attributes, such as culture, age, gender, and personality traits [2]. Note that the parameter \beta ^{\prime } explicitly captures the influence of a preceding robot on the human's trust in a succeeding robot in a formation control scenario of multi-robot systems. The residue \boldsymbol {\epsilon }_{w}(k) is a zero-mean process noise following a multivariate normal distribution, i.e., \boldsymbol {\epsilon }_{w}(k) \sim N(0,\;\Delta _{w}). In addition, the trust change y_{i}(k)=x_{i}(k)-x_{i}(k-1) at time step k is set to be bounded so that a human-computer interface (HCI) can be used for the human operator to provide self-reported trust data. The residue \boldsymbol {\epsilon }_{v}(k) is a zero-mean observation noise, \boldsymbol {\epsilon }_{v}(k) \sim N(0,\;\Delta _{v}), in the measurement of the trust change \mathbf {y}_{1:I}(k). The parameter estimation of the trust model (3a)-(3b) requires a large amount of human-robot interaction data, and the exact computation is intractable. Hence, Bayesian inference and a Markov Chain Monte Carlo (MCMC) sampling algorithm were utilized to derive the posterior distribution of the trust model parameters from a known prior distribution.
Bayesian optimization [149] was further applied to design a sequential experiment to collect data and learn the human-multi-robot trust model parameters in a data-efficient way. The human-multi-robot collaborative offroad motion task was deployed in ROS Gazebo to conduct human subject tests and validate the benefits of the proposed trust model and the Bayesian optimization-based sequential experiment design. A Wilcoxon signed-rank test with 16 participants demonstrated the capability of the trust model to capture the human's trust in multi-robot systems in terms of goodness-of-fit metrics. Another one-way ANOVA test with 32 participants showed statistically significant benefits of the Bayesian optimization-based sequential experiment design, including fewer collisions with obstacles, a lower frequency of contact loss between robots, lower operator workload, and higher system usability compared with a benchmark experimental design approach. Although the trust model demonstrates better explainability of the influence of the environmental attributes on human trust in multi-robot systems, the trust measurement relied on human self-reported trust, which may be biased [125]. In addition, the Gazebo-simulated 3D scenario works similarly to a computer game; it may lose fidelity compared to real human-robot interaction and cannot reveal the quantitative influence of a real robot's malfunctions, sensing noise, etc., on human trust in multi-robot systems.
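A minimal simulation sketch of the trust dynamics (3a)-(3b) for a small robot team is shown below; the team size, attribute values, and coefficients are illustrative assumptions, not the posterior estimates obtained in [147].

```python
# Minimal simulation sketch of the multi-robot trust dynamics (3a)-(3b) for
# I = 3 robots and M = 2 environmental attributes (e.g., traversability and
# visibility).
import numpy as np

I, M = 3, 2
beta0, beta_prime, b = 0.9, 0.3, 0.05
betas = [0.04, 0.02]                                  # attribute weights beta_1, beta_2
rng = np.random.default_rng(1)

def band(beta):
    """Coefficient matrix with beta on the diagonal and beta'*beta below it."""
    return beta * np.eye(I) + beta_prime * beta * np.eye(I, k=-1)

B0 = band(beta0)                                      # autoregression term
Bm = [band(bm) for bm in betas]                       # dynamic feature terms
b_vec = np.array([b] + [(beta_prime + 1) * b] * (I - 1))   # dispositional trust

x = np.full(I, 0.5)                                   # initial trust in each robot
for k in range(30):
    z = [rng.uniform(0.0, 1.0, size=I) for _ in range(M)]  # attributes z_{1:I,m}(k)
    x_new = (B0 @ x + sum(Bm[m] @ z[m] for m in range(M)) + b_vec
             + rng.normal(0.0, 0.01, size=I))         # Eq. (3a) with process noise
    y = x_new - x + rng.normal(0.0, 0.01, size=I)     # Eq. (3b): observed trust change
    x = x_new
print(np.round(x, 3))
```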

Time-series trust models can capture the dynamic nature of trust and the different causal factors that affect human-robot trust, including the previous trust value, which carries information about the memory of trust, current and previous robot and human performance, and other robot-related, human-related, and contextual factors. The parameter for each factor can be identified using model identification methods and corresponding toolboxes based on human subject test data. Compared to the performance-centric algebraic trust models in Section III-A, time-series trust models represent the trust change due to performance increases or faults as a gradual process that captures inertia in an individual's level of trust rather than an instantaneous change. Time-series trust models can also accommodate uncertainty in the data and find parameters that yield the best estimation and forecast accuracy. However, it is also important to note their limitations. Many time-series trust models assume a linear form, which may oversimplify the relation between trust and its impacting factors. Similar to performance-based trust models, there is no unified framework for modeling robot performance and trust across different application scenarios. As a result, time-series trust models are a suitable choice when it is necessary to consider the dynamics and multi-dimensionality of trust while maintaining a reasonable level of complexity to facilitate further trust dynamics analysis, especially for linear systems.

C. Markov Decision Process (MDP)-Based Trust Models

In this subsection, we first give an introduction to the general modeling approach.

Definition 5:

[150] A Markov Decision Process (MDP) models an agent's stochastic decision-making process in discrete time and can be represented by a tuple M=(S,A,T,R,\gamma) where

  • S is a set of states, which is called the state space;

  • A is a set of actions, which is called the action space;

  • T(s,a,s^{\prime }): S\times A\times S\to [0,1] is the state transition function representing the probability \Pr (s_{k+1}=s^{\prime }|s_{k}=s,a_{k}=a) of resulting in state s_{k+1}=s^{\prime } after taking action a_{k}=a at state s_{k}=s, where s, s^{\prime }\in S, a\in A;

  • R(s,a,s^{\prime }): S\times A\times S\to \mathbb {R} is the reward received after transitioning from state s_{k}=s to state s_{k+1}=s^{\prime } due to action a_{k}=a;

  • \gamma \in [0,1] is the discounting factor that discounts the weight of future rewards.

The reward R can also be replaced by a cost. The MDP satisfies the Markov property, which states that the next state and reward depend only on the current state (which contains all past information) and the action taken at the current time step, i.e., \begin{align*} &\Pr (s_{k+1},r_{k+1}|s_{k},a_{k},r_{k},s_{k-1},a_{k-1},\ldots,r_{1},s_{0},a_{0}) \\ &=\Pr (s_{k+1},r_{k+1}|s_{k},a_{k}). \end{align*} A policy \pi : S\to A gives the agent's action a=\pi (s) at the current state s; an optimal policy typically maximizes the expected discounted sum of rewards over an infinite horizon: \bm {E}[\sum _{k=0}^{\infty }\gamma ^{k} R(s_{k},a_{k},s_{k+1})]. A variety of approaches can be used to find the solution to an MDP. For finite state and action spaces, dynamic programming is usually used [151], [152]. When the transition probabilities of the MDP are unknown, reinforcement learning can be used [153].
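As a concrete illustration of Definition 5, the sketch below runs standard value iteration on a small randomly generated MDP; the state/action sizes, transition tensor, rewards, and discount factor are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 4, 2, 0.95

# T[s, a, s'] = Pr(s' | s, a); rows normalized to sum to one
T = rng.random((nS, nA, nS))
T /= T.sum(axis=2, keepdims=True)
R = rng.random((nS, nA, nS))              # R(s, a, s')

V = np.zeros(nS)
for _ in range(1000):                     # value iteration
    # Q[s, a] = sum_{s'} T(s,a,s') * (R(s,a,s') + gamma * V(s'))
    Q = np.einsum('ijk,ijk->ij', T, R + gamma * V[None, None, :])
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)                 # greedy policy pi(s)
print("V* =", np.round(V, 3), "pi* =", policy)
```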

Definition 6:

[154] Given two MDPs \mathcal {M}_{1}=(S_{1}, A_{1}, T_{1}, R_{1}, \gamma) and \mathcal {M}_{2}= (S_{2}, A_{2}, T_{2}, R_{2}, \gamma), the parallel composition of \mathcal {M}_{1} and \mathcal {M}_{2} is an MDP \mathcal {M}=\mathcal {M}_{1} \Vert \mathcal {M}_{2}=(S_{1} \times S_{2}, A_{1} \cup A_{2}, T_{1,2}, R_{1,2}, \gamma) where

  • S_{1} \times S_{2} is the Cartesian product of S_{1} and S_{2} and represents the state set of \mathcal {M};

  • A_{1} \cup A_{2} is union of A_{1} and A_{2} and represents the action set of \mathcal {M};

  • T_{1,2}((s_{1}, s_{2}), a,(s_{1}^{\prime }, s_{2}^{\prime }))=T_{1}(s_{1}, a, s_{1}^{\prime }) T_{2}(s_{2}, a, s_{2}^{\prime }), R_{1,2}((s_{1}, s_{2}), a)=R_{1}(s_{1}, a)+R_{2}(s_{2}, a), if a \in A_{1} \cap A_{2} and both T_{1}(s_{1}, a, s_{1}^{\prime }) and T_{2}(s_{2}, a, s_{2}^{\prime }) are defined;

  • T_{1,2}((s_{1}, s_{2}), a,(s_{1}^{\prime }, s_{2}))=T_{1}(s_{1}, a, s_{1}^{\prime }), R_{1,2}((s_{1}, s_{2}), a)= R_{1}(s_{1}, a) if a \in A_{1} \backslash A_{2} and T_{1}(s_{1}, a, s_{1}^{\prime }) is defined;

  • T((s_{1}, s_{2}), a,(s_{1}, s_{2}^{\prime }))=T_{2}(s_{2}, a, s_{2}^{\prime }), R_{1,2}((s_{1}, s_{2}), a)= R_{2}(s_{2}, a) if a \in A_{2} \backslash A_{1} and T_{2}(s_{2}, a, s_{2}^{\prime }) is defined.

Definition 7:

[155] A partially observable Markov decision process (POMDP) is an extension of MDP where the state information is incomplete and can only be partially observed. A POMDP can be represented by a tuple M=(S,A,T,O,R,P_{o},\gamma) where

  • S is a set of states;

  • A is a set of actions;

  • T(s,a,s^{\prime }): S\times A\times S\to [0,1] is the state transition probability;

  • O is a set of observations;

  • R(s,a,s^{\prime }): S\times A\times S\to \mathbb {R} is the reward;

  • P_{o}(s,o): S\times O\to [0,1] is the emission probability that gives the likelihood of observation o given the current state s, i.e., \Pr (o_{k}=o|s_{k}=s), where s\in S, o\in O;

  • \gamma \in [0,1] is the discounting factor.

In POMDP, because the actual state is hidden, decisions need to be made under the uncertainty of the actual state. The belief of the state can be updated based on the observation and action taken (see (4)). POMDP is often computationally intractable and can be approximated using point-based value iteration methods [156] or reinforcement learning [157]. Next, we review some representative MDP and POMDP trust models in the literature and their robotics and robot control applications.

Bo et al. introduced an optimal task allocation framework for repetitive industrial assembly tasks under human-robot collaboration [154]. A human trust MDP model M^{t} was constructed according to Definition 5 to describe the change of the human trust level driven by robot and human actions. Similarly, a human fatigue MDP model M^{f} was constructed to describe the human fatigue process during task collaboration; an action-deterministic transition system M^{w} was provided to describe the assembly plan with human and robot behaviors; and a robot performance MDP model M^{r} was constructed to describe the stochastic dynamics of machine performance. The HRI model was then obtained via parallel composition based on Definition 6 as follows: \begin{equation*} M = M^{w} \Vert M^{r} \Vert M^{t} \Vert M^{f}. \end{equation*} The transition probability of M depended not only on the actions taken by the robot or human operator but also on the robot's performance or the human trust level. This dependency explicitly modeled the impact of the human-robot interaction on the manufacturing process. An optimal task assignment policy was found with the composed MDP for a persistent linear temporal logic task specification. Optimality required both maximizing the probability of satisfying the task specification and minimizing the expected average cost of each task cycle. The composed MDP model can provide a systematic description of the transition process for human trust level, fatigue level, and robot performance. However, the work itself did not reveal their inter-relationship. Human subject tests were also lacking to verify the proposed computational MDP trust model and validate the influence of the proposed impacting factors.

Nam et al. studied the human robotic swarm trust model for the supervisory control of a target foraging/search mission [94], [95]. Trust is analyzed in three different LOAs: manual, mixed-initiative (MI), and fully autonomous LOA. In the manual LOA, the heading directions of a swarm are re-guided by an operator in changing the target search region. In the autonomous LOA, the swarm robots direct the heading angles by themselves. In the MI LOA, a switch mode allows the LOA to change between manual and autonomous operations based on the target search rate. Participants were asked to command the heading directions of the swarm in a simulator. Trust ratings were queried from participants based on their subjective feelings, which ranged from strongly trust to strongly distrust. Meanwhile, a set of data was collected, including the mean and variance of the robots' heading angles, the length of the velocity vector from the user command input, the convex hull area taken by the swarm, and the number of targets found. The results found that human trust was affected more by the swarm's physical characteristics, such as heading angles and convex hull area, than task performance. Fig. 4 illustrates the swarm's physical characteristics. Trust is modeled as an MDP M as in Definition 5 with

  • S=\lbrace s=(h_{k},c_{k},i_{k},t_{k})|h_{k} \in \mathbb {R}^{\geq 0}, c_{k} \in \mathbb {R}^{+}, i_{k} \in \lbrace 0,1\rbrace, t_{k} \in [0,1]\rbrace where h_{k} is the heading variance at time k, c_{k} is the convex hull area, i_{k} indicates whether an intervention command is given, and t_{k} \in [0,1] is the trust state, which takes values from a finite set;

  • A denotes the finite set of actions;

  • R(s_{k},a,s_{k+1})=\sum _{j=1}^{n} \alpha _{j} \phi _{j} is a reward function that describes the weighted sum of features \phi _{j};

  • T(s_{k},a,s_{k+1}) is the state transition function;

  • \gamma \in [0,1] is the discount factor.

Figure 4. A swarm foraging and search mission.

Variations of the trust model were developed for the different LOAs. An inverse reinforcement learning (IRL) algorithm was then utilized to learn the reward function, which was used in the MDP to predict trust in the three different LOAs. A swarm simulator with 32 homogeneous robots was deployed, and 30 participants were recruited to operate the robot simulator and provide trust feedback. Results showed that the model could predict trust in real time without incorporating user input that provided ground-truth values of trust. Compared to the DBN trust model [159] to be introduced in Section III-E, the MDP trust model reduced the prediction error in all three LOAs. The MDP model concatenated human trust in the swarm and the swarm parameters as the state and captured their dynamic change. The model evaluated the human's trust in the swarm with reference to the operator's real-time rating of swarm behavior in the training process. However, the explicit relation between trust and the swarm parameters remains unknown. The multi-dimensional metrics in the MDP state also make it challenging to accommodate a larger state space when the proposed trust model performs online adaptation; in the experiment, these metrics were therefore discretized at relatively coarse levels to reduce the computational burden.

Akash et al. established a POMDP model [160], [161], [162] to capture the dynamic change of human trust and workload for the collaboration between humans and an intelligent decision-aid of a teleoperated robot. The teleoperated robot that was equipped with a camera and a chemical sensor can detect dangers in buildings and provide a recommendation for the human companion on whether or not to take on protective gear. Humans can choose to either comply with or disregard the robot's recommendation. The system's reliability, transparency of the user interface, and presence/absence of danger were manipulated to observe humans' changes in trust and workload. Human compliance to the decision aid was utilized to implicitly infer their trust, while the human response time to the decision aid's recommendation was used to infer workload. Accordingly, a trust-workload POMDP was constructed as in Definition 7 with the finite state set S := [ {\mathit{Trust,} \quad \mathit{Workload}}]^{\top }, and the finite set of actions A := [S_{A}, E,\tau ]^{T}; Here, binary trust (low trust and high trust) and workload (low workload and high workload) values were assumed; S_{A} is the presence or absence of recommendation from the decision-aid, such as robot's report of danger or not, E is the experience and reliance of last recommendation, and \tau is the level of transparency; The set of observation is O := [Compliance,\; Response\;Time]^{\top }. Compliance can be either human disagree or agree with the recommendation and Response\;Time can be human fast or slow response time. All the action and observation states were also set to take binary values. However, the trust and workload models were separated to simplify the training process, and each variable was treated independently [160], [161], [162]. Accordingly, the reward function R was constructed as either a trust reward R_{T}(s^{\prime }|s,a) or a workload reward R_{W}(s^{\prime }|s,a). The trust reward R_{T}(s^{\prime }|s,a) is the sum of the corresponding state reward R^{S}_{T}(s^{\prime }|s,a), which is featured by penalizing a transition from any state to the state of low trust value given any action and performance reward R^{P}_{T}(s^{\prime }|s,a) which is determined by the reliability of the robot's recommendation, the errors brought by human's compliance (agree or disagree) to robot's recommendation. The workload reward R_{W}(s^{\prime }|s,a) only includes the state reward R^{S}_{W}(s^{\prime }|s,a) which is featured by penalizing a transition from any state to the state of high workload value given any action. Based on the aggregated data from human subjects in Amazon Mechanical Turk,7 the transition probability T(s,a,s^{\prime }) and emission probability P_{o}(s,o) of the POMDP were estimated. Their experimental results showed that high transparency of the decision aid (i.e., the amount of information provided to the human) increased trust if the existing trust was low, but decreased trust if the existing trust was already high. Furthermore, although increasing transparency can help humans make informed decisions and maintain trust, it also requires humans to process more information and increases workload. Hence, it verified that transparency of the HRI system should be optimized in designing the decision-aid instead of merely increasing trust. 
An optimal control policy was then developed to vary the transparency, i.e., the presence or absence of the sensor's chemical detection values and the camera's thermal images, to optimize the trust-workload trade-off in the building reconnaissance mission. A Q-MDP method was used to obtain the near-optimal solution, which involved the generation of the Q-function. \begin{equation*} a^* := \text{argmax}_{a} \sum _{s\in S} b(s) Q_\text {MDP}(s,a), \tag{4} \end{equation*} View SourceRight-click on figure for MathML and additional features.where b(s) is the belief of the state and can be iteratively calculated as \begin{align*} b^{\prime }(s^{\prime }) & := \Pr (s^{\prime }|o,a,b)\\ & := \frac{\Pr (o|s^{\prime },a)\sum _{s\in S}\Pr (s^{\prime }|s,a)b(s)}{\mathop {\sum }_{s^{\prime } \in S} (\Pr (o|s^{\prime },a)\sum _{s\in S}\Pr (s^{\prime }|s,a)b(s))}, \end{align*} View SourceRight-click on figure for MathML and additional features.and the Q-function can be iterated as \begin{align*} Q_\text {MDP}(s,a) &:= \sum _{s^{\prime } \in S}T(s,a,s^{\prime })(R(s^{\prime }|s,a) + \gamma V(s^{\prime })), \\ Q^{\tau }(s,\tau) &:= \sum _{S_{A},E}Pr(S_{A},E)Q_\text {MDP}(s, a := [S_{A}, E, \tau ]), \\ V(s) &:= \text {max}_{\tau }Q^{\tau }(s,\tau). \end{align*} View SourceRight-click on figure for MathML and additional features.Two cases were studied to synthesize the near-optimal solution with transparency \tau as the feedback. Although the POMDP model here captures a relation between transparency and human trust-workload, the workload itself is also an impacting factor of trust, as discussed in Section II-B. The quantified relationship between workload and human trust remains to be addressed. The work simplified the trust-workload model by assuming an independent propagation of trust and workload. Hence, the trust-workload MDP model was reduced to a trust MDP model and a workload MDP model. Therefore, whether it can adapt to an online trust-workload calibration process remains unknown. Furthermore, the setup of binary state, action, and observation values limited the ability of the POMDP model to capture the continuous trust dynamics. It is also unclear if the model and solver are scalable to larger state, action, and observation spaces.
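The sketch below illustrates the Q-MDP idea used above on a toy two-state problem: the Q-function of the underlying MDP is computed by value iteration, the belief is propagated with the standard Bayes filter, and the action is selected by belief-weighting the Q-values as in (4). The transition, observation, and reward numbers are made up for illustration, and a state-action reward R(s, a) is used for simplicity.

```python
import numpy as np

nS, nA, nO, gamma = 2, 2, 2, 0.9
T = np.array([[[0.9, 0.1], [0.5, 0.5]],      # T[s, a, s']
              [[0.2, 0.8], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],                    # R[s, a] (state-action reward)
              [0.0, 0.5]])
Po = np.array([[0.8, 0.2],                   # Po[s, o] = Pr(o | s)
               [0.3, 0.7]])

# 1) Q_MDP via value iteration on the fully observable MDP
V = np.zeros(nS)
for _ in range(500):
    Q = R + gamma * np.einsum('ijk,k->ij', T, V)
    V = Q.max(axis=1)

def belief_update(b, a, o):
    """b'(s') is proportional to Pr(o|s') * sum_s Pr(s'|s,a) b(s)."""
    b_pred = b @ T[:, a, :]
    b_new = Po[:, o] * b_pred
    return b_new / b_new.sum()

def qmdp_action(b):
    """a* = argmax_a sum_s b(s) Q_MDP(s, a), cf. (4)."""
    return int(np.argmax(b @ Q))

b = np.array([0.5, 0.5])
for o in [0, 1, 1, 0]:                       # a made-up observation sequence
    a = qmdp_action(b)
    b = belief_update(b, a, o)
    print(f"action={a}, belief={np.round(b, 3)}")
```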

Zahedi et al. proposed a Meta-MDP trust model for the human supervision task [163]. This model introduced the concept of trust transition based on the distance EX(\pi) between the robot's policy \pi and the human's expected policy \pi ^{E}. The Meta-MDP model was represented by the tuple (\mathbb {S}, \mathbb {A}, \mathbb {P}, \mathcal {C}, \mathcal {\gamma }), where \mathbb {S} denotes the state space comprising different trust levels, i.e., \mathbb {S} = \lbrace s_{0}, s_{1},{\ldots }, s_{N}\rbrace. The meta action space \mathbb {A} includes various policy candidates for the robot. The transition probability \mathbb {P} relied on the policy distance function EX(\pi). When the robot's policy perfectly matched the human's expected policy (EX(\pi) = 0), the human trust consistently increased until reaching the maximum level. However, the trust transition was subject to the following probability derivations for other policy candidates. At trust level s_{i}, the probability of the human operator choosing to monitor the robot was first denoted as \omega (s_{i}), where \omega (\cdot) \in [0, 1] and should be inversely proportional to the trust value. In the trust transition process, human trust tended to decrease to a lower level with a probability of \begin{equation*} \mathbb {P}\left(s_{k+1}=s_{i-1}|s_{k}=s_{i}, \pi \right)=\omega (s_{i})(1 - \mathcal {P}(EX(\pi))), \end{equation*} View SourceRight-click on figure for MathML and additional features.where \mathcal {P}(\cdot) denotes the Boltzmann distribution. Conversely, trust tended to stay at the same level if the human observed the robot's policy matched his/her expectation, i.e., \begin{equation*} \mathbb {P}\left(s(k+1)=s_{i}|s(k)=s_{i}, \pi \right) = \omega (s_{i})\mathcal {P}(EX(\pi)). \end{equation*} View SourceRight-click on figure for MathML and additional features.Furthermore, the trust level would increase if the robot can finish the task without human supervision at all, yielding a probability of \begin{equation*} \mathbb {P}\left(s(k+1)=s_{i+1}|s(k)=s_{i}, \pi \right) = 1 -\omega (s_{i}). \end{equation*} View SourceRight-click on figure for MathML and additional features.The cost function of this Meta-MDP problem was defined as, \begin{equation*} \mathcal {C}\!\left(s(k), \pi \right) = \left(1 - \omega \left(s(k)\right)\right) C_{r}(\pi) + \omega (s(k)) C_{s} (\pi), \end{equation*} View SourceRight-click on figure for MathML and additional features.where C_{r} represents the cost that the robot finished the task without human supervision. C_{s} is the cost associated with the human supervision process, such as the human intervention cost. The corresponding optimal policy choice strategy was solved based the MDP solver toolbox [164]. Human subject experiments demonstrated that this optimal policy choice strategy effectively maintained a reasonable level of human trust with minimal sacrifice to task performance.
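To make the Meta-MDP trust transition concrete, the sketch below evaluates the three transition probabilities (decrease, stay, increase) for a given trust level, monitoring probability \omega(s_i), and policy distance EX(\pi). The specific form assumed for the Boltzmann term, the monitoring probabilities, and the distance value are illustrative assumptions rather than the settings of [163].

```python
import numpy as np

def boltzmann(ex, temperature=1.0):
    """P(EX(pi)) in [0, 1]: probability the human accepts the observed policy
    deviation; smaller distance gives a value closer to 1 (assumed form)."""
    return float(np.exp(-ex / temperature))

def trust_transition_probs(omega_si, ex, temperature=1.0):
    """Return Pr(decrease), Pr(stay), Pr(increase) at trust level s_i."""
    p_match = boltzmann(ex, temperature)
    p_down = omega_si * (1.0 - p_match)      # monitored and deviation rejected
    p_stay = omega_si * p_match              # monitored and deviation accepted
    p_up = 1.0 - omega_si                    # robot not monitored, task completed alone
    return p_down, p_stay, p_up

# Example: lower trust levels get higher monitoring probability (assumed mapping)
for trust_level, omega in [(0, 0.9), (2, 0.5), (4, 0.1)]:
    probs = trust_transition_probs(omega, ex=0.3)
    print(f"s_{trust_level}: down/stay/up = {np.round(probs, 3)}, sum = {sum(probs):.3f}")
```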

Overall, the (PO)MDP models consider the dynamic and probabilistic nature of human trust and give a systematic approach to represent the transitions of robots' states and actions when seeking trust-associated optimal policies. They can describe the relation between human trust and robot behaviors/performance with corresponding probabilistic functions. On the other hand, these models are trained using a limited set of finely-tuned robot behaviors, typically unable to encompass the full spectrum of behaviors encountered during real-world operations. The reliability of human trust and decision-making when operating under such a trained model may be compromised by unforeseen robot behaviors. The generated optimal policy also lacks interpretability regarding the underlying human decision-making mechanism. In (PO)MDP models, trust is usually modeled together with other variables such as workload and intervention as the state, which makes it difficult to reveal the relation between trust and these variables, even though they are often trust-impacting factors. Furthermore, due to the need to accommodate system and state changes, an efficient online algorithm is necessary to find the optimal policy dynamically. In addition, these POMDP models are designed and trained based on a group of people. Thus, they represent a general prediction of human trust and the corresponding optimal policy; that is, they are not individualized models for a specific person that take into account personality, cognitive abilities, etc. Last but not least, although (PO)MDPs can handle continuous state and action spaces, most existing (PO)MDP-based trust models merely consider discrete action and state spaces with a limited number of values. In practice, human trust evolves continuously rather than jumping or dropping discretely. Therefore, such models can only approximate trust changes to some extent. In general, (PO)MDP trust models can be considered a superior choice for scenarios where the environment and tasks are relatively well-defined with transition and observation uncertainties, and the state-action data pairs are readily accessible. The resulting model can integrate the trust state with robot behaviors, which is advantageous for optimizing the overall system performance.

D. Gaussian-Based Trust Models

In this subsection, we first briefly introduce the Gaussian distribution and Gaussian process (GP) for continuous random variables and processes.

Definition 8:

A Gaussian distribution, also known as the normal distribution, is a continuous probability distribution for a real-valued random variable, which can be represented mathematically as \begin{equation*} \mathcal {N}(x; \mu, \sigma)=\frac{1}{\sqrt{2\pi }\sigma }\exp \left(-\frac{1}{2}\left(\frac{x-\mu }{\sigma }\right)^{2}\right) \end{equation*} where x is the value of the variable or data, \mu is the mean/expectation, and \sigma is the standard deviation.

Many natural phenomena are approximately Gaussian distributed. Graphically, a normal distribution looks like a “bell curve”. The cumulative distribution function (CDF) P(X\leq x) of a standard normal distribution \mathcal {N}(0,1) is denoted as \begin{equation*} \Phi (x)=\frac{1}{\sqrt{2 \pi }} \int _{-\infty }^{x} \exp \left(-\frac{t^{2}}{2}\right) \text {d} t. \tag{5} \end{equation*} The multivariate Gaussian distribution [165] extends the one-dimensional univariate Gaussian distribution to random vectors \bm {x}=(x_{1},x_{2},\ldots,x_{m})^{T}. The probability density function (pdf) of a multivariate Gaussian distribution is written as \begin{align*} &\mathcal {N}(\bm {x};\bm {\mu },\bm {\Sigma }) \\ &=\frac{1}{(2\pi)^{m/2}|\bm {\Sigma }|^{1/2}}\exp \left(-\frac{1}{2}(\bm {x}-\bm {\mu })^{T}\bm {\Sigma }^{-1}(\bm {x}-\bm {\mu })\right) \end{align*} where \bm {\mu }\in \mathbb {R}^{m} is the mean vector and \bm {\Sigma }\in \mathbb {R}^{m\times m} is the covariance matrix. Each of the m dimensions follows a univariate Gaussian distribution.

Definition 9:

A Gaussian process (GP) [166] is a stochastic process, i.e., a collection of random variables over time or space, which is fully represented by a mean function m(\bm {z}):\mathcal {Z}\to \mathbb {R} and a covariance function called the kernel, k(\bm {z},\bm {z}^{\prime }):\mathcal {Z}\times \mathcal {Z}\to \mathbb {R}, \begin{align*} f_{GP}(\bm {z})&\sim \mathcal {GP}(m(\bm {z}),k(\bm {z},\bm {z}^{\prime })) \\ m(\bm {z})&=\mathbf{E}[f_{GP}(\bm {z})] \\ k(\bm {z},\bm {z}^{\prime })&=\mathbf{E}[(f_{GP}(\bm {z})-m(\bm {z}))(f_{GP}(\bm {z}^{\prime })-m(\bm {z}^{\prime }))] \end{align*} where \bm {z} \in \mathcal {Z} is the index (either time or space).

Any finite collection of these random variables (f_{GP}(\bm {z}_{1}),f_{GP}(\bm {z}_{2}),\ldots,f_{GP}(\bm {z}_{k})) follows a k-dimensional multivariate Gaussian distribution. Equivalently, any linear combination of (f_{GP}(\bm {z}_{1}),f_{GP}(\bm {z}_{2}),\ldots,f_{GP}(\bm {z}_{k})) follows a univariate Gaussian distribution. A GP can be considered an infinite-dimensional extension of the multivariate Gaussian distribution. A GP can be used as a prior probability distribution over functions in Bayesian inference for model regression and learning. Next, we summarize some Gaussian-based trust models in the literature.
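As a brief illustration of Definition 9, the sketch below draws sample functions from a GP prior with a squared-exponential (RBF) kernel; the kernel choice and hyper-parameters are arbitrary for the example.

```python
import numpy as np

def rbf_kernel(z1, z2, length_scale=0.5, variance=1.0):
    """Squared-exponential covariance k(z, z')."""
    d = z1[:, None] - z2[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(2)
z = np.linspace(0.0, 1.0, 50)                    # index set (e.g., task feature or time)
m = np.zeros_like(z)                             # zero mean function
K = rbf_kernel(z, z) + 1e-8 * np.eye(z.size)     # jitter for numerical stability

# Any finite collection of GP values is jointly Gaussian:
samples = rng.multivariate_normal(m, K, size=3)  # three sample paths of f_GP(z)
print(samples.shape)                             # (3, 50)
```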

Chen et al. [113], [167] introduced a linear Gaussian equation to build human trust in a robot \begin{equation*} \theta _{k+1}\sim \mathcal {N}(\alpha _{e_{k+1}}\theta _{k}+\beta _{e_{k+1}},\sigma _{e_{k+1}}), \end{equation*} View SourceRight-click on figure for MathML and additional features.where e_{k+1} is the robot's performance, \begin{equation*} e_{k+1} = \text{performance}(x_{k+1},x_{k},a^{R}_{k},a^{H}_{k}), \end{equation*} View SourceRight-click on figure for MathML and additional features.which indicates the success or failure of the robot in accomplishing a task. Here, x_{k},\;x_{k+1} \in X are the world states, a^{R} \in A^{R} represents robot action, and a^{H}\in A^{H} represents human action. In addition, the human action is modeled to be dependent on the human's trust, world state, and robot action, see Fig. 5. The unknown parameters \alpha _{e_{k+1}}, \beta _{e_{k+1}} and \sigma _{e_{k+1}} can be estimated with Bayesian inference on the model through Hamiltonian Monte Carlo sampling using the Stan probabilistic programming. The human-robot team is formalized as an MDP M according to Definition 5, where the action set A=A^{R}\times A^{H}, the probability of transition function T is p(x^{\prime }|x,a^{R},a^{H}), and r(x,a^{R},a^{H},x^{\prime })\in R is the real-value reward. The history of interaction between robot and human until time k is h_{k} = \lbrace x_{0}, a^{R}_{0}, a^{H}_{0}, x_{1}, r_{1},\ldots, x_{k-1}, a^{R}_{k-1}, a^{H}_{k-1}, x_{k}, r_{k}\rbrace. Then, the human can use the entire interaction history to decide the next action. Denote the robot policy as \pi ^{R} and the human policy as \pi ^{H}, the expected total discounted reward of starting at a state x_{0} and following robot and human policy is \begin{equation*} v(x_{0}|\pi ^{R},\pi ^{H}) = \mathop {\mathbf {E}}_{a^{R}_{k} \sim \pi ^{R}, a^{H}_{k} \sim \pi ^{H}}\sum _{k=0}^{\infty }\gamma ^{k} r(x_{k},a^{R}_{k},a^{H}_{k}), \end{equation*} View SourceRight-click on figure for MathML and additional features.and the optimal robot policy can be estimated as \begin{equation*} \pi ^{R}_* = \text{argmax}_{\pi ^{R}} \mathop {\mathbf {E}}_{\pi ^{H}}v(x_{0}|\pi ^{R},\pi ^{H}). \end{equation*} View SourceRight-click on figure for MathML and additional features.Since the history h_{k} is long, it may be difficult to optimize the policy. Therefore, the interaction history h_{k} was approximated with the trust value \theta _{k}, i.e., \pi ^{H}(a^{H}_{k}|x_{k},a^{R}_{k},\theta _{k}) = \pi ^{H}(a^{H}_{k}|x_{k},a^{R}_{k},h_{k}). Furthermore, since trust can not be directly observed but may be inferred from human actions, the interaction was modeled as a POMDP (as shown in Fig. 5). The state of the POMDP is s=(x,\theta) where x is the fully-observable world state and the human trust \theta is hidden and only partially observable. The SARSOP algorithm [168] was used to solve for the optimal policy for the POMDP. Human subject tests were conducted both in simulation (201 participants) and experiments (20 participants) where a human collaborated with a robot to clear objects off a table. The human can intervene and pick up the object that the robot is moving toward, or stay put and let the robot pick the object by itself. The first simulation was conducted on Amazon Mechanical Turk, where the human participants interacted with recorded videos of the robot table-clearing task to decide whether they wanted to intervene in the robot's tasking. 
The second experiment replaced the video with a real manipulator for the human to interact with. One-way ANOVA tests validated that the trust-POMDP robot policy resulted in statistically significantly better team performance than a myopic strategy in which the robot picked objects without taking trust into account. Results also suggested that maximizing trust in the robot did not necessarily improve performance. The POMDP model developed therein can calibrate trust with task performance. However, the trust model only considered the influence of task performance and neglected other potential trust-impacting factors. Furthermore, although the Gaussian trust model is a continuous distribution, a discrete set of seven values was assumed for the trust \theta, which may reduce the granularity and fidelity of the model and of the subsequent model learning and policy decision-making algorithms.
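A minimal sketch of the linear Gaussian trust update used by Chen et al. is given below, with separate (\alpha_e, \beta_e, \sigma_e) parameters for robot success and failure; the parameter values, the Bernoulli performance model, and the clipping of trust to [0,1] are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed parameters (alpha_e, beta_e, sigma_e) for e in {failure: 0, success: 1}
params = {0: (0.85, -0.10, 0.05),    # failure: trust decays
          1: (0.90, +0.15, 0.05)}    # success: trust increases

theta = 0.5                           # initial trust
for k in range(10):
    e = int(rng.random() < 0.7)       # robot succeeds with assumed probability 0.7
    alpha, beta, sigma = params[e]
    theta = rng.normal(alpha * theta + beta, sigma)   # theta_{k+1} ~ N(alpha*theta_k + beta, sigma)
    theta = float(np.clip(theta, 0.0, 1.0))           # keep trust in [0, 1] for readability
    print(f"k={k}: e={e}, trust={theta:.3f}")
```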

Figure 5. The trust-POMDP model [113].

Soh et al. modeled trust as a latent dynamic function \tau ^{a}_{k}(\bm {x}): \mathbb {R}^{d} \rightarrow [0,1] that maps task features \bm {x} to a real value in [0,1], capturing human trust in a robot a's capability to perform tasks [28]. A rational Bayesian framework imitating the human cognitive process was adopted, and the human's trust in the robot was estimated by integrating over the posterior: \begin{equation*} \tau ^{a}_{k}(\bm {x}_{k}) =\int P(c_{k}^{a} | f^{a},\bm {x})p_{k}(f^{a})df^{a}, \end{equation*} where \bm {x}_{k} denotes the task at time step k, c_{k}^{a} is robot a's performance at k with a binary outcome (c_{k}^{a}=1 indicating successful completion of the task and 0 otherwise), a GP prior (see Definition 9) is assumed for f^{a}, i.e., f^{a} \sim \mathcal {GP}(m(\bm {x}),k(\bm {x},\bm {x}^{\prime })), and p_{k}(f^{a}) is the human's current belief over f^{a}. Intuitively, f^{a} can be considered a latent unnormalized trust value. The posterior distribution of human trust can then be updated via Bayes rule as: \begin{equation*} p_{k}\left(f^{a} \mid \bm {x}_{k-1}, c_{k-1}^{a}\right)=\frac{P\left(c_{k-1}^{a} \mid f^{a}, \bm {x}_{k-1}\right) p_{k-1}\left(f^{a}\right)}{\int P\left(c_{k-1}^{a} \mid f^{a}, \bm {x}_{k-1}\right) p_{k-1}\left(f^{a}\right) d f^{a}} \tag{6} \end{equation*} A probit function \Phi (as in (5)) was utilized to evaluate the likelihood of the observed binary performance c_{k}^{a}, \begin{equation*} P(c^{a}_{k}|f^{a},\bm {x}_{k})=\Phi \left(\frac{c^{a}_{k}(f^{a}(\bm {x}_{k}) - m(\bm {x}_{k}))}{\sigma ^{2}_{n}}\right), \end{equation*} where \sigma ^{2}_{n} is the noise variance. Due to the intractability of the Bayes rule (6) with a probit likelihood, the trust value was updated via approximate Bayesian inference, i.e., the posterior process was projected onto the closest GP as measured by the Kullback-Leibler (KL) divergence, i.e., \text{KL}(p_{k} \Vert q), where q is the GP approximation. This work also introduced a neural network-based trust model and a hybrid model combining the GP and neural network models, and compared the performance of the different trust models. Both (i) a household experiment using a Fetch robot for picking and placing objects and (ii) virtual reality simulations of autonomous vehicle driving and parking tasks were conducted. Human subject tests were performed to collect data to learn the trust models. Results showed that the GP model with prior pseudo-observations made better predictions in the household task, the neural network model performed best in the autonomous driving task, and the hybrid model performed well across tasks.
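Under the simplifying assumption of a single fixed task (so that the latent function value f^a reduces to a scalar rather than a full GP), the Bayesian trust update with a probit likelihood can be sketched by discretizing the latent value on a grid, as below; the \pm 1 coding of success/failure and all hyper-parameters are assumptions made for the example.

```python
import numpy as np
from scipy.stats import norm

# Grid approximation of the belief p_k(f) over the latent (unnormalized) trust value f
f_grid = np.linspace(-3.0, 3.0, 601)
prior_mean, prior_std, sigma_n = 0.0, 1.0, 0.8      # assumed prior at this task + noise
belief = norm.pdf(f_grid, loc=prior_mean, scale=prior_std)
belief /= belief.sum()

def likelihood(c, f):
    """P(c | f): probit likelihood Phi(c (f - m) / sigma_n^2),
    with success coded as c = +1 and failure as c = -1 (an assumed coding)."""
    return norm.cdf(c * (f - prior_mean) / sigma_n ** 2)

def trust(belief):
    """tau = integral of P(success | f) p_k(f) df, approximated on the grid."""
    return float(np.sum(likelihood(+1, f_grid) * belief))

for c in [+1, +1, -1, +1]:                          # observed robot successes/failures
    belief = likelihood(c, f_grid) * belief         # Bayes rule, cf. (6)
    belief /= belief.sum()
    print(f"trust after observing c={c:+d}: {trust(belief):.3f}")
```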

There are also other works using different probability distributions to model trust. For example, Azevedo-Sa et al. proposed a bidirectional trust model that takes into account the difference between the trustee's capabilities \lambda =(\lambda _{1}, \lambda _{2}, \ldots, \lambda _{n})\in \Lambda and the task-required capabilities \bar{\lambda } = (\bar{\lambda }_{1}, \bar{\lambda }_{2}, \ldots, \bar{\lambda }_{n}) \in \Lambda [170]. The multidimensional capabilities (\lambda, \bar{\lambda }) encompass both performance-based metrics, such as tracking accuracy and motion smoothness (in the target-tracking task) and non-performance-based metrics, such as honesty and integrity. Conceptually, trust is represented by the trustor's likelihood assessment of a successful event from the trustee, where a successful event corresponds to an indicator function \Omega (\lambda, \bar{\lambda }, t) = 1. Drawing inspiration from the trust definition provided by Kok and Soh [67], the multidimensionality of trust and its dynamics with respect to time t are incorporated to model \tau (t) as follows: \begin{equation*} \tau (t) = \int _\Lambda p \left(\Omega (\lambda, \bar{\lambda }, t) = 1 |\lambda \right) bel(\lambda, t-1)d\lambda, \tag{7} \end{equation*} View SourceRight-click on figure for MathML and additional features.where, assuming the n capability dimensions are independent to each other, the conditional probability of a successful event is modeled as: \begin{equation*} p \left(\Omega (\lambda, \bar{\lambda }, t)=1|\lambda \right) = \prod ^{n}_{i=1}\left(\frac{1}{1 + \exp (\beta _{i}(\bar{\lambda }_{i} - \lambda _{i}))}\right)^{\zeta _{i}}, \end{equation*} View SourceRight-click on figure for MathML and additional features.where \beta _{i}, \zeta _{i} > 0 are the hyper-parameters tuned for the specific trustor characteristics. Moreover, the trustor's belief in the trustee's capability at t-1 is denoted as bel(\lambda, t-1) in (7). bel(\lambda, t-1) is assumed to follow a uniform distribution \mathcal {U}(\mathit {l}_{i}, \mathit {u}_{i}), where \mathit {l}_{i} and \mathit {u}_{i} are the lower and upper boundaries of the sampling interval, respectively. Intuitively, upon observing a successful event, the sample interval (\mathit {l}_{i}, \mathit {u}_{i}) shifts beyond the task-required capability \bar{\lambda }. Otherwise, the sample interval slides below \bar{\lambda }. Specifically considering robot's trust to human, by assuming the robot is pragmatic, the value of \beta _{i} in (7) may be sufficiently large in the modeling of the robot's trust in humans. The corresponding equation can then be approximated by \begin{equation*} \tau (t) = \prod _{i=1}^{n} \psi (\bar{\lambda }_{i}), \end{equation*} View SourceRight-click on figure for MathML and additional features.where \begin{equation*} \psi (\bar{\lambda }_{i}) = {\begin{cases}1 & \text {if } 0 \leq \bar{\lambda }_{i} \leq l_{i} \\ \frac{u_{i} - \bar{\lambda }_{i}}{u_{i} - l_{i}} & \text {if } l_{i} \leq \bar{\lambda }_{i} \leq u_{i}\\ 0 & \text {if } u_{i} \leq \bar{\lambda }_{i} \leq 1\\ \end{cases}}. 
\end{equation*} The capability belief interval (l_{i}, u_{i}) can be approximated by minimizing the difference between the predicted trust \tau and the historical success event rate \hat{\tau }, i.e., \begin{equation*} \hat{l}_{i}, \hat{u}_{i} = \arg \min _{[0,1]^{2}}\int _\Lambda \Vert \tau -\hat{\tau }\Vert ^{2}d\lambda. \tag{8} \end{equation*} In an experiment conducted in the context of automated vehicle interactions, participants were tasked with watching videos showcasing autonomous vehicle driving performance. They were required to evaluate the levels of driving capabilities and provide trust scores for the vehicles. The results indicated that the proposed method outperformed both the GP model [28] and the OPTIMo model [159] in trust prediction accuracy.
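The trust evaluation in (7) can be sketched with Monte Carlo integration over a uniform capability belief, as below; the number of capability dimensions, the belief intervals, and the hyper-parameters \beta_i, \zeta_i are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Two capability dimensions (e.g., tracking accuracy, motion smoothness), values in [0, 1]
lower = np.array([0.4, 0.5])      # l_i: lower bounds of the trustor's belief intervals
upper = np.array([0.9, 0.8])      # u_i: upper bounds
beta = np.array([20.0, 20.0])     # assumed steepness hyper-parameters beta_i
zeta = np.array([1.0, 1.0])       # assumed hyper-parameters zeta_i
lam_bar = np.array([0.6, 0.6])    # task-required capabilities bar_lambda_i

def p_success(lam):
    """p(Omega = 1 | lambda) = prod_i sigmoid(beta_i (lambda_i - bar_lambda_i))^zeta_i."""
    return np.prod((1.0 / (1.0 + np.exp(beta * (lam_bar - lam)))) ** zeta, axis=-1)

# tau(t) = E_{lambda ~ bel}[ p(Omega = 1 | lambda) ], with bel uniform on [l_i, u_i]
samples = rng.uniform(lower, upper, size=(20000, 2))
tau = p_success(samples).mean()
print(f"predicted trust tau ~= {tau:.3f}")
```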

As another example, Guo and Yang proposed a Beta distribution-based human-robot trust model [171]. The mean of the Beta distribution was interpreted as the predicted trust, and both parameters of the Beta distribution were updated based on prior trust information. Hence, this model captures the nonlinearity of trust dynamics. The authors tested the proposed model with 39 human participants interacting with four drones in a simulated surveillance mission. The ANOVA results demonstrated that the proposed Beta distribution model had smaller root mean square errors in predicting human trust than the ARMAV and OPTIMo (to be introduced in Section III-E) trust models under their scenario.
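A minimal sketch of a Beta-distribution trust estimator in the spirit of [171] (not the exact update rule of that paper) treats each robot success or failure as evidence that increments the Beta parameters and reads out the posterior mean as the predicted trust:

```python
class BetaTrust:
    """Trust ~ Beta(a, b); predicted trust is the posterior mean a / (a + b)."""

    def __init__(self, a=1.0, b=1.0, gain_success=1.0, gain_failure=2.0):
        # gains encode the assumption that failures move trust more than successes
        self.a, self.b = a, b
        self.gain_success, self.gain_failure = gain_success, gain_failure

    def update(self, success: bool) -> float:
        if success:
            self.a += self.gain_success
        else:
            self.b += self.gain_failure
        return self.mean()

    def mean(self) -> float:
        return self.a / (self.a + self.b)

trust = BetaTrust()
for outcome in [True, True, False, True, False, True]:
    print(f"outcome={outcome}, predicted trust={trust.update(outcome):.3f}")
```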

Gaussian-based (or, more generally, probability distribution-based) trust models provide a means to model trust as a continuous random variable or stochastic process, which may better capture the probabilistic and continuous nature of human trust in robots in practice. The trust value or distribution is also updated dynamically as robot performance changes or a new observation becomes available. Furthermore, Gaussian-based trust models are flexible and can be integrated into other decision-making and learning frameworks, such as POMDPs and Bayesian inference. Human subject data can be used to learn the model parameters, resulting in either personalized or population-based models. However, it is unclear whether human trust actually follows a Gaussian, Beta, or other specific distribution, which might limit the reliability and prediction accuracy of these trust models. This further challenges the use of a rational Bayesian framework for updating trust. Similar to other trust models in the literature, only robot performance was used as the trust-impacting factor in the existing Gaussian-based trust models, neglecting effects such as task difficulty and risk. The robot performance models used in these trust models may also be overly simplified and may not fully leverage task features and information. Overall, assuming an appropriate underlying distribution, a Gaussian-based (or probability distribution-based) trust model offers an intriguing option for capturing the randomness and stochastic nature of the trust process without imposing significant data demands.

E. Dynamic Bayesian Network (DBN)-Based Trust Models

In this subsection, we first introduce the Dynamic Bayesian Network (DBN). Denote \bm {X}=(X_{1}, X_{2},\ldots, X_{i},\ldots, X_{n})^{T} as a random vector that contains a set of time-dependent random variables X_{i},\;1\leq i\leq n. Let X_{i}(k) represent the random variable X_{i} at time step k,\;0< k< K. Let \bm {x}(k) represent a vector value of \bm {X}(k).

Definition 10:

[172] A DBN is a probabilistic graphical model that contains the random vectors \bm {X}(k),\;0\leq k\leq K, and their conditional dependencies, and relates these variables to each other over adjacent time steps. A DBN is specified by the following:

  • An initial Bayesian network (BN) consisting of (a) an initial directed acyclic graph (DAG) G_{0} containing the variables in \bm {X}(0) and (b) an initial probability distribution P_{0} of these variables.

  • A transition BN which is a template consisting of (a) a transition DAG G_\to containing the variables in \bm {X}(k)\cup \bm {X}(k+1) and (b) a transition probability distribution Pr which assigns a conditional probability to every value \bm {x}(k+1) of \bm {X}(k+1) given every value \bm {x}(k) of \bm {X}(k), \begin{equation*} Pr(\bm {X}(k+1)=\bm {x}(k+1)\mid \bm {X}(k)=\bm {x}(k)). \end{equation*}

  • The DBN consisting of (a) the DAG composed of the DAG G_{0} and, for 0 \leq k \leq K - 1, the DAG G_{\to } evaluated at k and (b) the following joint probability distribution: \begin{equation*} Pr(\bm {x}(0), \ldots, \bm {x}(K))=P_{0}(\bm {x}(0))\prod _{k=0}^{K-1} Pr(\bm {x}(k+1)\mid \bm {x}(k)). \end{equation*}

Fig. 6 shows some example DBNs. DBN is a generalization of linear state-space models (e.g., Kalman filters), linear and normal forecasting models (e.g., ARMA), and simple dependency models (e.g., hidden Markov models (HMMs)) [173]. The expectation maximization (EM) approach can be used to learn the parameters of a DBN [174]. When the inference of the model becomes intractable, Bayesian inference [174] can provide approximate inference for learning. Next, we summarize some DBN-based human-robot trust models in the literature.
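For a DBN with discrete variables, the joint distribution in Definition 10 factorizes into the initial distribution and the per-step transition (and emission) distributions, and the belief over a hidden variable can be filtered recursively. The sketch below does this for a two-level hidden trust variable observed through a binary human-intervention signal; all distributions are made up for illustration.

```python
import numpy as np

# Hidden trust state in {low, high}; observation = human intervention in {no, yes}
P0 = np.array([0.5, 0.5])                 # initial distribution P_0(T_0)
Ptrans = np.array([[0.8, 0.2],            # Pr(T_{k+1} | T_k): rows index the current state
                   [0.1, 0.9]])
Pemit = np.array([[0.3, 0.7],             # Pr(I_k | T_k): low trust -> intervene often
                  [0.9, 0.1]])            # high trust -> rarely intervene

def forward_filter(observations):
    """Recursive Bayes filter bel(T_k) = Pr(T_k | I_{1:k})."""
    bel = P0.copy()
    beliefs = []
    for o in observations:
        bel = bel @ Ptrans                # prediction step
        bel = bel * Pemit[:, o]           # correction with the observed intervention
        bel /= bel.sum()
        beliefs.append(bel.copy())
    return np.array(beliefs)

obs = [1, 1, 0, 0, 0]                     # 1 = intervention, 0 = no intervention
print(np.round(forward_filter(obs), 3))   # belief over (low, high) trust at each step
```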

Figure 6. Examples of DBNs.

Xu and Dudek proposed a DBN-based computational trust model called OPTIMo [159], which relates human's current degree of trust t_{k}\in [0,1] at time step k to prior trust t_{k-1}, robot's current task performance p_{k}\in [0,1] and prior task performance p_{k-1}\in [0,1], human interventions i_{k}\in \lbrace 0,1\rbrace, change in trust c_{k}\in \lbrace -1,0,+1,\emptyset \rbrace, trust feedback f_{k}\in \lbrace [0,1],\emptyset \rbrace, and an extraneous cause state e_{k}\in \lbrace 0,1\rbrace, as shown in Fig. 7. Here, \emptyset denotes non-occurences. The conditional probability of trust t_{k} given the prior trust t_{k-1} and current and prior robot performance p_{k},p_{k-1} was given as \begin{align*} \mathcal {P}&(t_{k},t_{k-1},p_{k},p_{k-1}):= Pr(t_{k}|t_{k-1},p_{k},p_{k-1})\\ &\approx \mathcal {N}(t_{k};t_{k-1}+\omega _{tb}+\omega _{tp}p_{k}+\omega _{td}(p_{k}-p_{k-1}),\sigma _{t}), \end{align*} View SourceRight-click on figure for MathML and additional features.where \omega _{tb}, \omega _{tp}, \omega _{td} represent the effect of bias, robot's task performance, and difference in performance on trust updates, and \sigma _{t} reflects the variability in trust updates. The probability of the operator's intervention i_{k} was reflected as a logistic conditional probability density (CPD): \begin{align*}\mathcal {O}_{i}&(t_{k},t_{k-1},i_{k},e_{k}) := {\begin{cases}Pr(i_{k}=1|t_{k},t_{k-1},e_{k})\\ Pr(i_{k}=0|t_{k},t_{k-1},e_{k}) \end{cases}}\\ &={\begin{cases}\mathcal {S}(\omega _{ib}+\omega _{it}t_{k} +\omega _{id}\Delta t_{k}+\omega _{ie}e_{k})\\ 1-\mathcal {S}(\omega _{ib}+\omega _{it}t_{k} +\omega _{id}\Delta t_{k}+\omega _{ie}e_{k}) \end{cases}} \end{align*} View SourceRight-click on figure for MathML and additional features.where \omega _{ib}, \omega _{it}, \omega _{id}, \omega _{ie} represent the bias and weights of different causes (i.e., current trust, the difference in current and prior trust, and extraneous cause state) related to i_{k} and \mathcal {S}:=(1+exp(-x))^{-1} is the sigmoid distribution. e_{k} \in \lbrace 0,1\rbrace is the extraneous cause state reflecting the task change, which was added as a parent link to i_{k}. Reports of trust change c_{k} were modeled as sigmoid CPDs \mathcal {O}_{c}(t_{k},t_{k-1},c_{k}), \begin{align*} &{O}_{c}(t_{k},t_{k-1},c_{k}) := {\begin{cases}Pr(c_{k}=1|t_{k},t_{k-1}) \\ Pr(c_{k}=-1|t_{k},t_{k-1}) \\ Pr(c_{k}=0|t_{k},t_{k-1}) \end{cases}} \\ &Pr(c_{k}=1|t_{k},t_{k-1})=\beta _{c}+(1-3\beta _{c})\cdot \mathcal {S}(\kappa _{c}[\Delta t_{k}-o_{c}]) \\ &Pr(c_{k}=-1|t_{k},t_{k-1})=\beta _{c}+(1-3\beta _{c})\cdot \mathcal {S}(\kappa _{c}[-\Delta t_{k}-o_{c}]) \\ &Pr(c_{k}=0|t_{k},t_{k-1})= 1\! -\! 2\beta _{c}\! -\! (1-3\beta _{c})(\mathcal {S}(\kappa _{c}[\Delta t_{k}-o_{c}])\\ &\qquad\qquad+\mathcal {S}(\kappa _{c}[-\Delta t_{k}-o_{c}])), \end{align*} View SourceRight-click on figure for MathML and additional features.where c_{k} was recorded by asking the user whether his/her trust states change periodically, \kappa _{c} is the variability, \beta _{c} is the idling bias uniform error term, and o_{c} is a nominal offset in latent trust change \Delta t_{k}. 
The uncertainty \sigma _{f} in the user's absolute trust feedback f_{k} was reflected as a zero-mean Gaussian CPD: \begin{equation*} \mathcal {O}_{f}(t_{k},f_{k}) := Pr(f_{k}|t_{k})\approx \mathcal {N}(f_{k};t_{k},\sigma _{f}) \end{equation*} View SourceRight-click on figure for MathML and additional features.where f_{k} was queried using trust feedback questionnaires following each experiment session. The trust belief was updated recursively based on Bayesian inference. The filtered trust belief estimates the probabilistic belief of the user's trust state t_{k} at the time k, given prior data:

Figure 7. OPTIMo: A DBN-based model for dynamic, quantitative, and probabilistic trust estimates.

\begin{align*} bel_{f}(t_{k}) &=Pr(t_{k}|p_{1:k},i_{1:k},e_{1:k},c_{1:k},f_{1:k},t_{0})\\ &= \frac{\int \overline{bel}(t_{k},t_{k-1})dt_{k-1}}{\int \int \overline{bel}(t_{k},t_{k-1})dt_{k-1}dt_{k}}. \end{align*}

The smoothed trust belief at any time k-1 is given by \begin{align*} bel_{s}(t_{k-1}) &=Pr(t_{k-1}|p_{1:K},i_{1:K},e_{1:K},c_{1:K},f_{1:K},t_{0})\\ &= \int \frac{ \overline{bel}(t_{k},t_{k-1})}{\int \overline{bel}(t_{k},t_{k-1})dt_{k-1}} bel_{s}(t_{k})dt_{k} \end{align*} where \begin{align*} \overline{bel}(t_{k},t_{k-1}):=\mathcal {O}_{i}(t_{k},t_{k-1},i_{k},e_{k}) \mathcal {O}_{c}(t_{k},t_{k-1},c_{k}) \\ \cdot \mathcal {O}_{f}(t_{k},f_{k}) \mathcal {P}(t_{k},t_{k-1},p_{k},p_{k-1}) bel_{f}(t_{k-1}). \end{align*}

Data was collected from 21 users on a visual navigation task, where an aerial robot was autonomously controlled to track boundaries, and the human operator could intervene and provide trust feedback. The learned trust model was shown to predict the human's dynamic trust state more accurately than the benchmark models. In sum, OPTIMo introduced a probabilistic trust model capable of inferring a human operator's degree of trust in a robot as a probability distribution at each time step k. Only robot performance was considered as the impacting factor of trust. It is a personalized model, and the individual-specific parameters were tuned by the EM algorithm. Its unique ability lies in estimating the operator's degree of trust in near real time. OPTIMo combined causal reasoning about updates to the robot's trustworthiness given its task performance with evidence from direct experiences to describe a human's actual level of trust. However, the observational study results revealed that the robot's task performance (considered as AI failures in the paper) was not significantly related to trust feedback (F_{1207}, p = 0.08), which contradicts the literature summarized in Section II-B. Therefore, from the limited results, it cannot be inferred to what extent this model can reflect the impact of a robot's task performance on the human operator's level of trust.
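A simplified numerical sketch of the OPTIMo-style update is given below: the latent trust is discretized on a grid over [0, 1], propagated with the Gaussian performance-driven transition CPD, and corrected with a logistic intervention likelihood. The extraneous-cause, trust-change, and trust-feedback factors are omitted, and all weights are illustrative assumptions rather than parameters learned in [159].

```python
import numpy as np
from scipy.stats import norm

t_grid = np.linspace(0.0, 1.0, 101)          # discretized trust state t_k

# Assumed OPTIMo-style weights
w_tb, w_tp, w_td, sigma_t = -0.05, 0.10, 0.20, 0.05   # transition CPD
w_ib, w_it, w_ie = 2.0, -5.0, 1.0                     # intervention CPD (w_id omitted)

def transition(bel_prev, p_k, p_prev):
    """P(t_k | t_{k-1}, p_k, p_{k-1}) ~ N(t_{k-1} + w_tb + w_tp p_k + w_td (p_k - p_{k-1}), sigma_t)."""
    mean = t_grid + w_tb + w_tp * p_k + w_td * (p_k - p_prev)     # one mean per previous trust value
    # kernel[i, j] = density of moving from t_grid[i] to t_grid[j]
    kernel = norm.pdf(t_grid[None, :], loc=mean[:, None], scale=sigma_t)
    bel = bel_prev @ kernel
    return bel / bel.sum()

def intervention_likelihood(i_k, e_k=0):
    """Logistic CPD for the operator intervening (i_k = 1) or not (i_k = 0):
    higher trust makes intervention less likely under the assumed weights."""
    s = 1.0 / (1.0 + np.exp(-(w_ib + w_it * t_grid + w_ie * e_k)))
    return s if i_k == 1 else 1.0 - s

bel = np.ones_like(t_grid) / t_grid.size     # uninformative initial trust belief
history = [(0.9, 0.8, 0), (0.4, 0.9, 1), (0.8, 0.4, 0)]   # (p_k, p_{k-1}, i_k) triples
for p_k, p_prev, i_k in history:
    bel = transition(bel, p_k, p_prev)
    bel = bel * intervention_likelihood(i_k)
    bel /= bel.sum()
    print(f"E[trust] = {np.dot(t_grid, bel):.3f}")
```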

Similar DBN trust models with different robot performance models can be found for applications in the runtime verification of multiple quadrotors' motion planning [176] and in human-robot cooperative manipulation (co-manipulation) [177]. Other works made some variations but generally followed a similar structure [180], [181], [182]. Following a structure similar to that in Fig. 7, a DBN trust model for multi-robot symbolic motion planning with a human-in-the-loop was developed [180], [181]. Besides the robot performance, this DBN trust model also included human performance and faults as causal inputs. The resulting computational trust model was used as a metric for multi-robot task decomposition and allocation. Multi-robot simulations with direct human input and trust evaluation were conducted to implement the overall framework. Zheng et al. built a DBN human trust model for multi-robot bounding overwatch [184] tasks in offroad environments [182]. In such a scenario, trust was used to evaluate the reliability of autonomous overwatch robots in providing situational awareness to human-operated robots and making bounding decisions. The trust model used terrain traversability and visibility as the causal inputs to the latent trust state, which is associated with the task specification, the robot transition model, and the discrete mission environment. Only numerical simulations were conducted to demonstrate the overall framework.

Mangalindan et al. developed a POMDP framework for trust-seeking policies of a mobile manipulator to seek assistance from a human supervisor in object collection tasks [185]. The robot can perform the object collection task autonomously (corresponding to action a^-) or ask for human help (corresponding to action a^+). The human supervisor can either help the robot whenever asked (corresponding to observation o^+) or voluntarily intervene when he/she perceives that the robot may fail (corresponding to action o^-). The human experience was denoted as E^+ if the robot successfully completed the task and E^- otherwise. The reward R can then be defined as follows: \begin{equation*} R_{o,E}^{a} = \left\lbrace \begin{array}{ll}+2 & \text{if $(a,o,E)=(a^-,o^+,E^+)$};\\ +1 & \text{if $a=a^+$};\\ 0 & \text{if $(a,o)=(a^-,o^-)$};\\ -3 & \text{if $(a,o,E)=(a^-,o^+,E^-)$}. \end{array} \right. \end{equation*} View SourceRight-click on figure for MathML and additional features.The hidden trust state T_{k} takes binary values: low trust T^{L} and high trust T^{H}. The complexity of the trial C_{k} also takes binary values: low complexity C^{L} and high complexity C^{H}. An input/output hidden Markov model (IOHMM)13 was used to represent the causal relation between the hidden state trust T_{k}, the input (i.e., trust impacting factors) T_{k-1}, E_{k}, C_{k-1}, a_{k-1}, and the output observation o_{k}. Here, the observation o_{k} was also affected by a_{k} and C_{k}. The trust-aware policy was modeled as a POMDP, which sets the state as s_{k}=(T_{k},E_{k},C_{k})\in \lbrace T_{l},T_{H}\rbrace \times \lbrace E^-,E^+\rbrace \times \lbrace C_{L},C_{H}\rbrace, action as a_{k}\in \lbrace a^-,a^+\rbrace, and observation q_{k}=\lbrace o^-,o^+\rbrace. The transition probability Pr(s_{k+1}|s_{k}) of the POMDP model can be found using the trust model as shown in Fig. 8: \begin{align*} Pr(s_{k+1}|s_{k})=&Pr(T_{k+1}|E_{k+1},C_{k},a_{k}) \\ &\cdot Pr(C_{k+1})Pr(E_{k+1}|T_{k},C_{k},a_{k}). \tag{9} \end{align*} View SourceRight-click on figure for MathML and additional features.The reward function of the POMDP was then defined as \begin{align*} R(s,a^-)=&R_{o^+,E^+}^{a^-} Pr(E^+|o^+,C,a^-)Pr(o^+|T,C,a^-) \\ &+R_{o^+,E^-}^{a^-} Pr(E^-|o^+,C,a^-)Pr(o^+|T,C,a^-) \\ &+R_{o^-,E^+}^{a^-} Pr(o^-|T,C,a^-), \\ R(s,a^+)=&R_{\star,\star }^{a^+}. \end{align*} View SourceRight-click on figure for MathML and additional features.To solve the optimal policy, the POMDP was reformulated as a belief MDP. Two sets of robot experiments with 9 participants were conducted to collect data, i.e., human action o, robot action a, complexity C, and experience E. An extended version of the Baum-Welch14 algorithm was used to learn the trust model distributions, which can then be used to compute the transition probability (9) and the emission probability of the POMDP. Using a similar Q-MDP approach as in [160], [161], [162], a trust-seeking optimal policy was obtained. The experimental results showed that the proposed trust-aware policy outperformed an optimal trust-agnostic policy. This work integrated a DBN-based trust model into the POMDP framework and utilized the learned trust model to solve for the optimal policy for human-robot collaboration. However, only a binary trust state was considered, which may be oversimplified. Similar considerations were made for other elements in the POMDP model. Therefore, it is unclear if the proposed framework and POMDP solver would efficiently accommodate larger or even continuous state and action spaces.

Figure 8. The IOHMM-based trust model.

Mahani et al. developed a DBN-based trust model for multi-robot systems [188]. Fig. 9 shows the structure of the trust DBN model for human-multi-robot teams. This model treated a set of degrees of human trust in a robot team \mathcal {R}=\lbrace 1,2,\ldots,\bar{r}\rbrace during each time window k as a vector \bm {T}_{k} of random variables T_{k}^{r}, r\in \mathcal {R}. The model deduced belief distribution \lbrace {bel}_{f}(T^{r}_{k}), \forall r \in \mathcal {R}\rbrace from factors including robot task performance P^{r}_{k}, human intervention I_{k} (the operational mode), and human absolute trust feedback F^{r}_{k} along with the time evolution of trust. The DBN model was partitioned into three layers (\bm {P}_{k},\bm {T}_{k},\bm {Y}_{k}) to represent the input \bm {P}_{k}=(P_{k}^{1},\ldots,P_{k}^{\bar{r}})^{T}, hidden \bm {T}_{k}=(T_{k}^{1},\ldots,T_{k}^{\bar{r}})^{T} and output \bm {Y}_{k}= (\bm {F}_{k}^{T}, I_{k})^{T}=(F_{k}^{1},\ldots,F_{k}^{\bar{r}},I_{k})^{T} variables, respectively. Based on the network structure as shown in Fig. 9, the joint distribution over these three layers of variables was found as, \begin{align*} &Pr(\bm {T}_{1:K},\bm {Y}_{1:K} \mid \bm {P}_{1:K}) \\ =&Pr(\bm {T}_{\bm {1}}) \prod _{k=1}^{K}Pr(\bm {T}_{k} \mid \bm {T}_{k-1}, \bm {P}_{k})Pr(\bm {Y}_{k} \mid \bm {T}_{k}), \\ =&Pr(\bm {T}_{\bm {1}})\prod _{k=1}^{K}\prod _{r=1}^{\bar{r}}{Pr(T_{k}^{r} \mid T_{k-1}^{r}, P_{k}^{r})} \\ &\cdot Pr(I_{k} \mid \bm {T}_{k})\prod _{r=1}^{\bar{r}}{Pr(F_{k}^{r} \mid T_{k}^{r})}, \end{align*} View SourceRight-click on figure for MathML and additional features.where Pr(\bm {T}_{1}) is the initial trust distribution, Pr(\bm {T}_{k}\mid \bm {T}_{k-1}, \bm {P}_{k}) is the transition distribution, and Pr(\bm {Y}_{k} \mid \bm {T}_{k}) is the emission distribution. This paper considered a scenario with human-UAVs collaborative searching for survivors after a fire disaster in an urban environment. Hence, the performance model of robot r was given as,

Figure 9. DBN-based trust model for the human-multi-robot teams.

\begin{equation*} P^{r}_{k}=w_{b}+w_{v}v^{r}_{k}+w_{d}d^{r}_{k}+w_{s}s^{r}_{k}, \end{equation*} View SourceRight-click on figure for MathML and additional features.where v^{r}_{k}\in [0,1] is the normalized value for the robot velocity, d^{r}_{k}\in [0,1] is the ratio of the travelled distance to the total distance (progress), and s^{r}_{k}\in [0,1] is the robot detection accuracy during time window k. The parameters w_{b}, w_{v}, w_{d}, and w_{s} are constant weights. To capture the conditional dependencies in this Bayesian network, categorical Boltzmann machine (CBM)15 was used due to its higher performance in modeling human trust. First, the Boltzmann machine is a multinomial distribution that supports the EM algorithm for its computational efficiency. Next, the CBM provides the flexibility to incorporate more suitable initial distributions than other relevant CPDs. For example, the neural network for the transition distributions incorporated three nodes \bm {X}\triangleq (T^{r}_{k}, T^{r}_{k-1},P^{r}_{k}), \begin{equation*} Pr(T^{r}_{k}=j \mid T^{r}_{k-1},P^{r}_{k})\triangleq \frac{e^{E^{r}_{k}(j)}}{\sum _{l=1}^{\bar{m}_{T}}e^{E^{r}_{k}(l)}}, \tag{10} \end{equation*} View SourceRight-click on figure for MathML and additional features.where E^{r}_{k}(j) is the common neural network value known as the net input to j: \begin{equation*} E^{r}_{k}(j)\triangleq \sum _{i=1}^{\bar{m}_{T}}a_{2i}\omega _{1j,2i}+\sum _{l=1}^{\bar{m}_{P}}a_{\text{3}\;l}\omega _{1j,3\;l}. \end{equation*} View SourceRight-click on figure for MathML and additional features.The term \omega ^{r}_{1j,2i} is the weight for the connecting link between T^{r}_{k}=j and T^{r}_{k-1}=i. The term \omega _{1j,3\;l} is the weight for the connecting link between T^{r}_{k}=j and P^{r}_{k}=l. The terms a_{2i} and a_{\text{3}\;l} are the associated activation indicators. Fig. 10 illustrates the network structure for the transition distribution given by (10). Similarly, the conditional probability of I_{k}=j given the random vector \bm {T}_{k}, i.e., Pr(I_{k} = j \mid T^{1}_{k},\ldots,T^{\bar{r}}_{k}), and the conditional probability of human's absolute trust feedback F_{k}^{r} = j given T_{k}^{r}, i.e., Pr(F^{r}_{k}=j \mid T^{r}_{k}), can be described by their respective CBMs. A factorial form of the EM algorithm was used to find the parameters of the distributions for the multi-robot system. After the model parameters were identified, Bayes rule was used to update the trust belief {bel_{f}}(\bm {T}_{k})\triangleq Pr(\bm {T}_{k} \mid \bm {P}_{1:k}, \bm {Y}_{1:k}) and predict human intervention Pred(I_{k})\triangleq Pr(I_{k}\mid \bm {P}_{1:k}, \bm {Y}_{1:k-1}). Human subject tests were conducted for data collection and performance validation. The role a participant played was a supervisor. Her duties included monitoring situations and robot performance and helping a UAV with visual detection. Data from the performance training session were analyzed with a t-test by dividing the participants into two groups, one containing those with w_{b}< 0 (i.e., pessimistic operators) and the other with w_{b}>0 (i.e., optimistic operators). The robot's detection accuracy parameter w_{s} of the performance model for the two groups was then compared finding significant differences (2-tailed t=2.76, p=0.01) where presumably the pessimistic operators with w_{b}< 0 felt more sensitivity towards the robot's sensor detection accuracy s^{r}_{k}. 
The experimental results showed that the Bayesian trust inference model can infer the degrees of human trust in multiple mobile robots and predict human trust feedback with relatively high accuracy (72.2\%). These findings confirmed the effectiveness of DBNs in modeling human trust toward multi-robot systems. However, only a relatively small set of discrete trust values was assumed in the model to accommodate the EM algorithm, and the result is highly dependent on the choice of the initial distributions in the CBMs.
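The CBM transition distribution in (10) is essentially a softmax over net inputs computed from one-hot encodings of the parent variables; a minimal sketch under assumed weights and cardinalities is shown below.

```python
import numpy as np

rng = np.random.default_rng(5)
m_T, m_P = 3, 3                 # number of discrete trust and performance levels

# Assumed CBM weights: W_T[j, i] links T_k = j with T_{k-1} = i,
# and W_P[j, l] links T_k = j with P_k = l
W_T = rng.normal(0.0, 1.0, size=(m_T, m_T))
W_P = rng.normal(0.0, 1.0, size=(m_T, m_P))

def transition_distribution(t_prev, p_now):
    """Pr(T_k = j | T_{k-1} = t_prev, P_k = p_now) via the softmax in (10)."""
    a_T = np.zeros(m_T); a_T[t_prev] = 1.0     # one-hot activation of T_{k-1}
    a_P = np.zeros(m_P); a_P[p_now] = 1.0      # one-hot activation of P_k
    net = W_T @ a_T + W_P @ a_P                # net input E_k(j) for each value j of T_k
    exp_net = np.exp(net - net.max())          # softmax, shifted for numerical stability
    return exp_net / exp_net.sum()

print(np.round(transition_distribution(t_prev=1, p_now=2), 3))
```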

Figure 10. A 2-layer neural network representing the three variables T^{r}_{k-1}, P^{r}_{k}, and T^{r}_{k}, in which T^{r}_{k-1} and P^{r}_{k} are connected to T^{r}_{k}. The variables T^{r}_{k-1} and T^{r}_{k} can take on \bar{m}_{T} values, and P^{r}_{k} can take on \bar{m}_{P} values.
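To make the structure of (10) and Fig. 10 concrete, the following minimal Python sketch computes the transition distribution as a softmax over the net inputs. The weight matrices and the numbers of trust and performance levels are illustrative placeholders, not the parameters identified in the original study.

```python
import numpy as np

# Illustrative sizes (placeholders, not the values used in the original study)
M_T = 5   # number of discrete trust levels       (\bar{m}_T)
M_P = 3   # number of discrete performance levels (\bar{m}_P)

rng = np.random.default_rng(0)
W_prev = rng.normal(size=(M_T, M_T))   # weights linking T_k = j to T_{k-1} = i  (\omega_{1j,2i})
W_perf = rng.normal(size=(M_T, M_P))   # weights linking T_k = j to P_k = l      (\omega_{1j,3l})

def transition_distribution(t_prev: int, p_now: int) -> np.ndarray:
    """Return Pr(T_k = j | T_{k-1} = t_prev, P_k = p_now) for all j, as in (10).

    The activation indicators a_{2i} and a_{3l} are one-hot encodings of the
    previous trust level and the current performance level.
    """
    a_prev = np.zeros(M_T); a_prev[t_prev] = 1.0   # a_{2i}
    a_perf = np.zeros(M_P); a_perf[p_now] = 1.0    # a_{3l}
    net_input = W_prev @ a_prev + W_perf @ a_perf  # E_k(j) for every j
    exp_net = np.exp(net_input - net_input.max())  # numerically stable softmax
    return exp_net / exp_net.sum()

# Example: distribution over current trust given T_{k-1}=2 and P_k=1
print(transition_distribution(2, 1))
```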

DBN-based trust models are able to capture the dynamic property of trust and characterize the time-based trust evolution through the state transition distributions. Although DBNs can accommodate both discrete and continuous random variables with their corresponding parameter learning approaches, most existing DBN-based trust models assumed discrete levels of trust. DBNs with continuous random variables allow trust to be modeled as a continuous dynamic process, which is more consistent with human subject studies and can be further explored. Furthermore, since DBN-based trust models use graphs to represent the causal relations among trust impacting factors and trust measurements/feedback, they are more interpretable. DBN trust models can also describe asymmetry in trust in multi-robot systems by formulating different likelihood functions to represent human trust in each robot. However, the accuracy of DBN models is highly subject to the choice of initial probability distributions: they can only capture human trust updates well under the prerequisite that the initial bias is accurately described, and modeling a suitable initial bias is challenging without a good understanding of the problem and prior experience. Provided a reliable estimate of the initial bias is available, the DBN-based trust model offers high interpretability and can be a preferable choice for scenarios with identified trust evolution dynamics and causal relations between trust impacting factors, the hidden trust state, and trust observations/feedback.
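As a generic illustration of how such a DBN-based model propagates a discrete trust belief over time, the sketch below implements a standard forward (Bayes) filter: a prediction step through an assumed transition distribution followed by a correction step using an assumed likelihood of the trust feedback. The matrices are illustrative placeholders rather than parameters from any model surveyed here.

```python
import numpy as np

# Placeholder model components (illustrative only)
M_T = 3                                   # discrete trust levels: low / medium / high
transition = np.array([[0.7, 0.25, 0.05], # Pr(T_k | T_{k-1}), rows indexed by T_{k-1}
                       [0.15, 0.7, 0.15],
                       [0.05, 0.25, 0.7]])
likelihood = np.array([[0.8, 0.15, 0.05], # Pr(F_k | T_k), rows indexed by T_k
                       [0.1, 0.8, 0.1],
                       [0.05, 0.15, 0.8]])

def bayes_filter_step(belief: np.ndarray, feedback: int) -> np.ndarray:
    """One predict-update cycle of the trust belief bel(T_k)."""
    predicted = transition.T @ belief                # prediction through the trust dynamics
    posterior = likelihood[:, feedback] * predicted  # weight by the feedback likelihood
    return posterior / posterior.sum()               # normalize (Bayes rule)

belief = np.full(M_T, 1.0 / M_T)                     # uniform initial trust belief
for f in [2, 2, 1, 0]:                               # a hypothetical sequence of trust feedback
    belief = bayes_filter_step(belief, f)
    print(belief)
```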

SECTION IV.

Limitations and Discussions on Future Research Directions

This paper is by no means exhaustive; we only intend to present some of the most representative categories of trust models in the literature. It can be observed from existing work that quantitative and computational trust modeling is an interdisciplinary area that integrates knowledge from system modeling, machine learning, statistics, human factors, and psychology across various HRI applications. For different HRI tasks, different performance models were considered, which led to different metrics or calculations of the trust values even when similar trust dynamics were followed. Among the various models, the time-series, DBN, and MDP/POMDP trust models seem to have received the most attention, resulting in further development and extensions. On the other hand, although a growing number of studies on trust measures and modeling can be found thanks to the increasing attention in this field and its wide span of applications across disciplines, the current literature is still lacking in several aspects. In this section, we discuss the shortcomings of the current trust modeling literature based on our observations and, correspondingly, highlight a few directions worth pursuing in future research.

A. Missing Pieces in Current Trust Models

We believe there are several missing components in current trust modeling work. Although many factors impacting human trust in robots have been identified in the literature, as summarized in Section II-B, only a small portion of these factors (e.g., robot performance, false alarm rate, human performance, fault, task difficulty) have been adopted in these trust models. These models may be considered trust-like rather than actually modeling trust, in that they capture certain psychological qualities of trust rather than seeking to model all aspects of trust development accurately. Many works also restricted trust to a discrete random variable that can only take values from a finite set, or even just binary values. The main reason is probably the computation required to handle a larger or continuous trust state space and the corresponding parameter learning algorithms. This calls for a more general model that can accommodate more important trust-impacting factors, together with efficient parameter regression/estimation, approximation, or model learning algorithms that can handle large state spaces.

As pointed out in [10], [29], trust implies a decision involving risk-taking. Only when a trustor is willing to make himself/herself vulnerable by taking risks does a situation arise that requires trust; that is, risk is requisite to trust. Furthermore, risk propensity, i.e., an individual's tendency to be either risk-seeking or risk-averse, also affects his/her trust, introducing individual differences. When a trustor's risk-taking decision leads to a positive outcome, the trustor's trust in the trustee is enhanced, and vice versa. Hence, it does not make much sense to model trust in isolation from the perceived risk level. However, very few trust models have considered risk in HRI and taken it into account in the modeling. On the other hand, the relationship between trust and risk has already been well studied in business and psychology [10], [190]. Risk quantification and risk-based decision-making have also been well adopted in the robotics literature [191], and there are works focusing on quantifying human risk attitudes in HRI [192]. These may shed some light on computational trust modeling from the risk perspective.
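As one purely hypothetical illustration of how perceived risk could enter a computational trust update (this is our own sketch, not a model from the surveyed literature), an otherwise standard outcome-driven update could be attenuated by the perceived risk level \rho _{k}\in [0,1] and an individual risk-sensitivity parameter \lambda: \begin{equation*} T_{k+1}=T_{k}+\eta \,(1-\lambda \rho _{k})\,(o_{k}-T_{k}), \end{equation*} where o_{k} is the (normalized) outcome of the risk-taking decision at step k and \eta is a learning rate. A more risk-averse trustor (larger \lambda) would then update trust more cautiously in high-risk situations.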

Also, almost all current work models human trust in robots under the assumption that the robot has goodwill and that the main impacting factors are robot reliability and performance. In practice, however, a robot may be subject to cybersecurity attacks and may not be benevolent. As discussed in [22], [23], intention-based trust should also be considered. A malicious robot may be associated with a high risk level, which should cause a significant decrease in trust if detected. Even under the assumption of well-intentioned robots, the performance of a robot may degrade due to its capability limitations, which in turn may cause a sudden drop in trust that recovers only slowly as performance improves; current works fail to capture such dynamics. On the other hand, there is a vast literature on modeling agent reputation and trustworthiness using probabilistic measures such as the Beta and Dirichlet distributions [60], [62], [193], [194], [195] for detecting unreliable members in a multi-agent system. Although these models have not been used in human-robot trust modeling, they may have great potential in evaluating an agent whose performance degrades or which may even be malicious.

Most importantly, trust should be calibrated and maintained over time instead of maximized. Trust calibration is the correspondence between a human's trust in a robot and the robot's capabilities [34], [110], [196]. In the HRI context, trust calibration refers to the alignment between one's perception of the robot and the actual competency and capability of the robot [197]. Overtrust (over-reliance) or undertrust (under-reliance) in robots leads to either misuse or disuse of the robots and hence negatively affects performance [198], [199]. It was further experimentally verified in [113] that maximization of trust does not lead to optimal performance of the human-robot team. Despite the importance of calibrated trust, only a few trust models [109], [113], [167] have considered how to adjust subjective trust to match the robot's trustworthiness. Shafi formulated the trust calibration problem as the minimization of the difference between human trust in the robot and the robot's trustworthiness [109]. Chen et al. embedded trust in a POMDP framework, where the human's trust in the robot can be adjusted by the actions taken by the robot [113], [167].
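To restate this calibration objective schematically (in our own notation, intended only as an illustrative sketch rather than the exact formulation of [109]), the robot's action sequence a_{1:N} could be chosen to minimize the accumulated mismatch between the predicted human trust and the robot's trustworthiness: \begin{equation*} \min _{a_{1:N}} \sum _{k=1}^{N}\big (T_{k}(a_{1:k})-W_{k}\big)^{2}, \end{equation*} where T_{k}(a_{1:k}) denotes the trust predicted by a computational trust model under the actions taken so far and W_{k} denotes the robot's trustworthiness at step k.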

Furthermore, humans may be unable to quantify the level of trust in their mental states accurately and in a timely manner [122]. In such cases, neither trust questionnaires nor direct human trust feedback can reveal the human's actual internal state. Correspondingly, human-robot trust models that rely on such data for model parameter learning may produce results that deviate from the ground truth and provide inaccurate predictions. On the other hand, as discussed in Section II-C, EEG and other psychophysiological signals may offer a complementary measure to human feedback and can be integrated into existing models to improve estimation and prediction accuracy.

Moreover, trust can be transferred from one context to another. The idea of trust transferability has been adopted in many business settings [200], and human-subjects studies in organization science have motivated the investigation of trust differences across contexts, e.g., across tasks, environments, or situational factors [200]. More recently, Soh et al. [28] began investigating how human trust in robot capabilities transfers across tasks, modeling trust in robots as a context-dependent latent dynamic function. Their human-subjects study showed that perceived task similarity, task difficulty, and observed robot performance influence trust transfer across tasks, and that a trust model accounting for trust transfer yielded better trust predictions on unseen participants than existing approaches. We expect that studies of trust transferability can make a computational trust model applicable to different tasks, domains, and participants, thereby improving the generality of current models.

Trust models for multi-human multi-robot systems are also becoming increasingly necessary to better understand human-robot teaming. Although many multi-agent trustworthiness models are available in the literature [201], very few works extend this paradigm to model trust involving humans. Except for some initiatives in modeling an individual's trust in multi-robot systems [127], [147], [188] and swarms [94], [95], there does not yet seem to be any effort on modeling a group's trust in a robot or in multiple robots. As a future research direction, it could also be interesting to explore a model that seamlessly integrates human-multi-robot trust and inter-human trust.

Last but not least, data acquisition is a common challenge in all trust modeling. Because trust is a psychological construct, data gathered from human subject tests are necessary for training and parameter estimation of trust models. Given that human subject tests are expensive to run, in terms of both money and time, it is always difficult to acquire sufficient human data for trust modeling. First of all, the required sample size of human subject tests is a critical question. In statistical testing, power analysis [202], [203] and Monte Carlo analysis [204] can be used to obtain statistically defensible numbers of required human subjects. However, no consensus has emerged so far on the sample size needed to obtain satisfactory training and parameter estimation results for trust modeling. Lessons may be learned from related research areas; for example, in computer-human-interaction (CHI) research, the local standard for the sample size of human subjects is around 12 [203]. Second, the appropriate number of trials per human subject is another important issue. For example, within-subject designs require more trials per subject than between-subject designs but are more statistically powerful [205]. However, the selection of the number of trials per subject largely depends on the specific human-robot task and the desired accuracy of the trust model, and there is a lack of standards and metrics regarding the optimal number of trials. Nevertheless, data scarcity remains a great challenge for trust modeling. We believe potential ways to circumvent these challenges are to build standard datasets for trust measurement (please see the discussion on ImageNet in Section IV-C) and to explore strategies for constructing an identifiable model with limited human data.
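As a brief illustration of how an a priori power analysis can yield a statistically defensible sample size, the sketch below uses statsmodels to solve for the number of participants per group in a two-sample t-test; the effect size, significance level, and power are assumptions chosen for illustration, not recommendations for trust studies.

```python
# A minimal sketch of an a priori power analysis for a between-subject comparison.
# Effect size, alpha, and power below are illustrative assumptions only.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.8,   # assumed (large) Cohen's d
    alpha=0.05,        # significance level
    power=0.8,         # desired statistical power
    ratio=1.0,         # equal group sizes
)
print(f"Required participants per group: {n_per_group:.1f}")  # roughly 26 per group
```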

B. Control & Robotics Applications of Computational Trust Models

Despite a handful of studies on trust modeling in the literature, the majority of the work focuses on building and learning the model itself and does not necessarily explain the utilization of the computational trust models in various application domains, especially how trust quantification can be utilized in robot control designs. Some initial attempts have been made to integrate trust models into robot velocity and force control. For example, Sadrfaridpour and Wang implemented an NMPC for manipulator end-effector velocity control that takes trust constraints into account to improve the human operator's interaction experience with the robot [140]. Considering trust as a hint of the human operator's intent, Sadrfaridpour et al. also studied trust-based impedance control for human-robot co-manipulation, switching the robot behavior and the corresponding impedance control laws depending on the trust level [177]. To enhance human acceptance of a haptic teleoperation system, Saeidi et al. developed trust-based shared control for mobile robots, which dynamically adjusts the control authority granted to the robot based on the trust computation [206]. To guarantee appropriate human utilization of the proposed interface, Vinod et al. formulated a human-automation system as a multiple-input-multiple-output (MIMO) LTI system and cast the user interface design problem as selecting an output matrix, where the level of human trust was used as a constraint [207]; however, no explicit trust model was considered in this work. To design more flexible safe control accounting for different individuals, Ejaz and Inoue [208] modeled the trust of an ego vehicle towards surrounding pedestrians as a safety margin indicator and used the trust computation to influence control barrier functions designed to constrain the ego vehicle's states in a safe set while the ego vehicle ran an MPC algorithm. Note that this work considered robot-to-human trust instead of human-to-robot trust.

However, we believe there is much room in control system theory and technology to be explored in relation to computational trust models. For instance, both time-series models with inputs, states, and outputs and value-based (rather than symbol-based) DBN models that describe a time-series effect can be converted into state-space form [129], [210], which can further enable system analysis and controller design in control theory. The trust dynamics may also be coupled with the human-robot physical interaction dynamics in an augmented state-space form, where control theory can be leveraged to analyze the overall human-robot system. Moreover, trust may be integrated into the reachability and safety analysis of the HRI system using approaches such as control barrier functions (CBF) [209] and Hamilton-Jacobi (HJ) reachability [211]. On the one hand, by conceptualizing trust as a state of interest, reachability and safety analysis can proactively safeguard against overtrust and/or undertrust. On the other hand, by relating trust to the human's perception of the uncertainty and reliability of the interacting robot, we may adjust the rigidity of the safety/reachability control laws to achieve higher flexibility and efficiency [208]. Various robot control designs can be developed to regulate or optimize the system's performance, and the rich literature on control designs for physical HRI [212], [213], [214], [215] can be leveraged and extended. In addition, we think the trust calibration process may be cast as a set-point regulation (assuming static trustworthiness), tracking (dynamic trustworthiness), or MPC (extending the framework in [109] by adding robot dynamics constraints) problem by controlling the robot inputs, as sketched below.
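To make the set-point-regulation view concrete, the following sketch assumes a hypothetical first-order linear trust dynamics driven by a scalar robot behavior input and applies a simple PI controller to regulate trust toward a trustworthiness set point. The dynamics parameters, gains, and set point are illustrative assumptions, not identified from data.

```python
import numpy as np

# Hypothetical first-order trust dynamics: T_{k+1} = a*T_k + b*u_k, where u_k is a
# scalar robot behavior input (e.g., how conservatively/transparently the robot acts).
a, b = 0.9, 0.2            # assumed dynamics parameters (illustrative)
T_ref = 0.7                # set point: the robot's (assumed static) trustworthiness
K_p, K_i = 1.0, 0.5        # PI gains (illustrative)

T, integral = 0.3, 0.0     # initial trust estimate and integrator state
for k in range(60):
    error = T_ref - T
    integral += error
    u = float(np.clip(K_p * error + K_i * integral, 0.0, 1.0))  # saturated robot input
    T = a * T + b * u                                           # propagate trust dynamics

print(round(T, 3))   # trust is regulated to (approximately) the trustworthiness set point
```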

Furthermore, many works on trust measures and qualitative analysis suggested meaningful ways to understand and utilize trust for autonomous systems and robots. We expect these works to benefit even more from a computational trust model with predictive ability. For example, the quantified trust value can be used to determine what information is presented through the human-machine interface (HMI). One objective of an HMI is to calibrate users' trust to its appropriate level, and knowing the dynamic evolution of trust is crucial because it determines the useful information to be presented through the HMI [64]. Trust can also be used to determine the control allocation and motion planning of robots. Based on the study of Muir et al. [40], human trust is directly related to the use of automation systems, and much of the literature quantifies the level of human operator trust in experiments based on this argument [96]. In such experiments, the automation/robot provides at least autonomous and semi-autonomous modes, and the trust level of the human operator is considered to decrease if he/she switches away from the autonomous mode, and vice versa. A possible solution is to adjust robot behaviors based on the predicted trust level: whenever trust is too low, robot behaviors can be changed to regain trust; whenever trust is too high, the robot may deliberately perform untrustworthy behaviors to check whether the human is overtrusting. A schematic version of such a policy is sketched below.
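The snippet below is a schematic threshold policy for this idea; it is not drawn from any cited work, and the thresholds and behavior labels are hypothetical.

```python
def select_robot_behavior(predicted_trust: float,
                          low: float = 0.3, high: float = 0.9) -> str:
    """Schematic trust-calibrating behavior selection (thresholds are illustrative).

    Low predicted trust  -> trust-repairing behavior (e.g., explain actions, slow down).
    Very high predicted trust -> probe for overtrust (e.g., announce uncertainty,
    request confirmation) so that the human's reliance stays calibrated.
    """
    if predicted_trust < low:
        return "trust_repair"        # transparent, conservative behavior to regain trust
    if predicted_trust > high:
        return "overtrust_check"     # deliberately solicit human verification
    return "nominal_autonomy"        # continue in the current autonomy mode

print(select_robot_behavior(0.2), select_robot_behavior(0.95), select_robot_behavior(0.6))
```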

C. The Role of AI in Human-Robot Trust

Despite the prevailing attention to AI and deep neural networks in recent years and previous work on neural network-based human trust models in automation [32], neural network-based human-robot trust models are surprisingly rare in the literature. Soh et al. proposed a recurrent neural network (RNN)-based model to capture task-specific trust dynamics [28]; this is the only neural network-based trust model we could find in the literature. The human subject experiment in this work showed that the RNN-based model performed better at predicting trust on unseen tasks than the GP models summarized in Section III-D (from [28]) in automated driving scenarios. The RNN model is highly flexible in processing task and performance representations and, compared with the GP models, can generate more accurate trust predictions.
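As a hedged, generic sketch of this class of models (not the architecture of [28]), the snippet below wires a GRU over a sequence of task/performance features to a per-step trust prediction; the feature dimension, hidden size, and single-layer structure are placeholders.

```python
import torch
import torch.nn as nn

class TrustRNN(nn.Module):
    """Generic RNN-based trust predictor (illustrative, not the model of [28])."""
    def __init__(self, feature_dim: int = 8, hidden_dim: int = 32):
        super().__init__()
        self.gru = nn.GRU(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, time, feature_dim) task/performance representations
        hidden_states, _ = self.gru(features)
        return torch.sigmoid(self.head(hidden_states)).squeeze(-1)  # trust in [0, 1] per step

# Example: predicted trust trajectories for 2 interaction sequences of 10 time steps
model = TrustRNN()
trust_trajectory = model(torch.randn(2, 10, 8))
print(trust_trajectory.shape)  # torch.Size([2, 10])
```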

We think this may be a promising direction for the future, especially where enough data are available for model training and learning. Neural network-based trust models offer a powerful approach to utilizing data to model complicated nonlinear relations between trust impacting factors and trust variables. Due to the strong computational capability of deep learning, a neural network-based trust model can potentially include a wide range of trust impacting factors as feature inputs, far beyond what other types of models in the literature can handle. Furthermore, RNNs can capture the temporal dynamics of the trust evolution process. But what has so far limited the adoption of neural networks in the domain of trust models?

In general, data-driven AI models are black-box models. Compared with time-series and DBN trust models, they lack the interpretability needed for decision-making based on observations and the transparency needed for HRI. In addition, there is a lack of systematic collections of trust data. The current situation in trust modeling is very similar to the era of computer vision before ImageNet [216], when diverse hand-crafted image processing algorithms dominated the field and neural networks remained silent. No standardized datasets are available for trust modeling; measurement metrics vary across laboratories and projects; and trust measurements are small in quantity, expensive to collect, and seldom shared or reused in the community. Creating an online platform to host, administer, and redistribute such data might benefit the entire research community. Furthermore, the data may be biased or skewed. To fully leverage the strengths of data and machine learning, neural network-based trust models may be combined with the above models and human knowledge.

Lastly, human trust in AI-enhanced robots may be another new research direction. Recent advancements in artificial intelligence and deep learning enable rapid developments of robots as embodied AI agents. On the one hand, the powerful performance of AI-enhanced robots will likely improve human trust and hence the willingness to use and collaborate with robot partners. On the other hand, due to the lack of interpretability, AI-enabled robot behaviors may be unpredictable, making it challenging to build trust; improving the transparency and explainability of AI-enhanced robots may mitigate this issue [217]. Furthermore, human trust in AI, in robots, and in robots as embodied AI may differ, which deserves scrutiny in order to realize trustworthy AI. One may also consider developing artificial robot-to-human trust. For example, Azevedo-Sa et al. created a bidirectional human-robot trust model in which the use of non-performance-based metrics allows the model to predict a robot's trust in humans [170]. Freedy et al. developed a measure of “goodness” for a specific operator, establishing a relative expected loss (REL) score based on the observed human task allocation decisions, risk, and observed robot performance [66]. Ejaz and Inoue [208] modeled a vehicle's trust in pedestrians, where the momentary trust was a linear combination of pedestrians' phone usage, eye contact, and pose fluctuation, with the values of these impacting factors extracted from images using neural network tools. Since trust is mutual, we may study the dyadic relation between human-to-robot trust and artificial robot-to-human trust [218]. Some initiatives on human-robot mutual trust modeling in the literature include [170], [206], [219], [220].

SECTION V.

Conclusion

With the fast and widespread adoption of robots in almost every sector of our daily lives, it is pivotal to understand and analyze how humans feel about robots, especially the level of belief an individual places in a robot, in order to design robots and control their behaviors in a way that makes us willing to utilize and collaborate with them. Therefore, in this paper, we surveyed some of the most representative computational models of human trust in robots. We first examined the definition of human-robot trust in comparison to interpersonal, social, human-automation, and agent-agent trust, as well as trustworthiness. We also summarized the trust-impacting factors, current quantitative trust measurement approaches, and their corresponding trust scales. We categorized the existing computational trust models into five types: performance-centric algebraic, time-series, (PO)MDP-based, Gaussian-based, and DBN-based trust models. Detailed formulations of these trust models were elucidated, along with an analysis of their advantages and shortcomings. Despite their respective advantages and limitations, these trust models may be applied to robot control, task allocation, motion planning, etc., in human-centric frameworks with enhanced HRI and situational awareness. We also provided our observations on the limitations of the extant trust models and discussed possible future research directions to fill these gaps. Given that the field of computational trust modeling is still in its early stages of development, the primary aim of this paper has been to offer a comprehensive overview of the current state of the art in computational trust models. As future work, a rigorous guideline for trust model design/selection based on the specific task would be very promising.
