Trust in Robot–Robot Scaffolding

The study of robot trust in humans and other agents has not been explored widely despite its importance for near-future human–robot symbiotic societies. Here, we propose that robots should trust partners that tend to reduce their computational load, which is analogous to human cognitive load. We test this idea by adopting an interactive visual recalling task. In the first set of experiments, the robot can get help from online instructors with different guiding strategies to decide which one it should trust based on the computational load it experiences during the experiments. The second set of experiments involves robot–robot interactions. Akin to the robot–online instructor case, the Pepper robot is asked to scaffold the learning of a less capable “infant” robot (Nao), with or without being equipped with the cognitive abilities of theory of mind and task experience memory, to assess the contribution of these cognitive abilities to scaffolding performance. Overall, the results show that robot trust based on computational/cognitive load within a sequential decision-making framework leads to effective partner selection and robot–robot scaffolding. Thus, the computational load incurred by the cognitive processing of a robot may serve as an internal signal for assessing the trustworthiness of interaction partners.

Murat Kirtay, Verena V. Hafner, Minoru Asada, Life Fellow, IEEE, and Erhan Oztop, Member, IEEE

Index Terms: Cognitive load, decision making, robot trust, scaffolding, visual recalling.

I. INTRODUCTION
TRUST is essential for facilitating social harmony, in simple physical interactions such as carrying furniture or dancing as well as in making long-term financial or social bonding decisions. It is indispensable for a healthy society. As humans and artificial systems are expected to share our social life in the near future, trust must be studied not only between human partners but also between robot-human and robot-robot partners. Indeed, research addressing trust among heterogeneous agents has gained considerable momentum in recent years, especially in social robotics and human-robot interaction [2], [3].
Trust has been a fertile topic in various disciplines, including sociology, philosophy, economics, and psychology [4]. Each field defines and theorizes trust distinctly, although there is some overlap. For example, in sociology, the definition of trust concerns the subjective probability of useful actions, whereas, in psychology, the emphasis is on negative or positive social experiences [4]. Moreover, the literature offers trust models following sociocognitive approaches, which include mental ingredients such as beliefs, goals, and motivations, as well as human cognitive development approaches [5], [6]. Although the literature presents rich perspectives on defining and modeling trust, in this study, we follow the definition given in [7] to formalize a cognitive model of trust. We note that the same definition is introduced in [6] as a "balanced" definition of trust, which is implemented on a humanoid robot. According to this definition, x is a cognitive agent with explicit goals (in our setting, reducing cognitive load), and y is an agent trusted by x, to which x delegates some actions. In this setting, x relies on (or feels trust in) y if the actions performed by y are useful to x. In our setting, x is a humanoid robot, and y is an online interaction partner to which x delegates its actions, aiming to reduce the cognitive load (i.e., the cost of perceptual processing) experienced during a sequential pattern recall task. We note that the Pepper humanoid robot interacts with three interaction partners and evaluates their trustworthiness based on the cognitive load experienced in performing the task.
Human-robot trust is usually investigated in a single direction, with human trust in robots receiving more attention. Some recent attempts at investigating reciprocal trust between humans and robots have started to appear [8], [9]. The studies on human trust in robots, the dominant direction, focus on determining the environmental, cognitive, and design factors that play an essential role in developing trust in artificial systems (e.g., [10] and [11]). The less explored direction aims at modeling robot trust in interaction partners, who may be humans, robots, or simulated agents (e.g., [6] and [12]).
In this study, we aim to contribute to the second direction by focusing on robot-online instructor interaction and
robot-robot scaffolding. To be concrete, by proposing a robot trust model and running a set of robot-online instructor and robot-robot social interaction experiments, we aim to answer two critical questions that are often overlooked: (i) can robot trust be modeled based on signals obtained from the internal workings of the robot's cognitive system and (ii) how could a robot deploy the knowledge it obtained through interacting with a trustworthy partner to scaffold a less capable robot?
To answer the first question, we leverage the link between cognitive load and trust in humans [13], [14]. It has been repeatedly found that humans form trust in biological or artificial interaction partners who reduce their cognitive load during the interaction. For example, Khawaji et al. [15] showed that humans form trust in interaction partners when they experience a low cognitive load, measured via mouse movements in a text-chat environment while accomplishing a cooperative game. The same connection between cognitive load and trust was also found when humans interact with artificial interaction partners [16], [17]. Overall, based on these findings and the definition of trust given above, we put forward the following proposal: a robot should form a high degree of trust in the interaction partners who help reduce the cognitive load of the robot in a given task.
To test this proposal, we envision a sequential visual recall task where an agent must decide which visual patterns (image patches) to process among a set of alternatives. Processing a visual pattern is modeled as running an auto-associative memory (AAM) that recovers a previously learned memory item based on the selected pattern. The computational cost of the recall is taken as the number of steps it takes for convergence. Thus, for an image patch that is close to an earlier learned one, the incurred cost is low. We consider this cost as the cognitive load of the agent and use it to form internal rewards (IRs) that the agent uses for computational cost-guided reinforcement learning. The experiments conducted for question (i) employed the Pepper humanoid robot, which selects the pattern to be processed (i.e., the action) by itself until the selected action leads to negative reinforcement, which happens when the cognitive load incurred by the selected action is higher than the previous one. In this case, the robot requests help from an online instructor that it believes to be an oracle for the task to make a decision. Here, the robot aims to minimize the cost of perceptual processing by relying on its cognitive modules and the instructor's guidance. In this setting, three distinct experimental sessions are conducted with different types of instructors, which can be labeled as reliable (a true oracle), less reliable, and random. At the end of these experimental sessions, the robot is given a free choice to select the trustworthy instructor based on the cumulative reward collected with each instructor.
To address the second question, we recruit the Pepper robot from the previous experiment as the caregiver and a less capable "infant" robot (Nao) to perform the same task as in question (i). Both the caregiver and infant robots are equipped with the same mechanism to perform the sequential visual recall task. The infant robot aims to minimize the cost of perceptual processing by making sequential decisions, whereas the caregiver robot provides scaffolding for the infant agent. As in question (i), when the infant robot selects a visual pattern associated with a high cognitive load compared to the last processed pattern, it requests help from the caregiver robot to make a decision. To this end, three caregiver agents with different cognitive abilities are designed to assess the effect of these abilities on the performance of the infant robot. To be specific, caregiver robot 1 is endowed with a simple Theory of Mind (ToM) module that enables the robot to build an internal representation of the infant robot's behavior, which it can use when help is requested. Additionally, caregiver robot 1 is equipped with a task memory system that keeps the past experiences formed in the robot-online instructor experiments and allows the robot to query it to help the infant robot. Caregiver robot 2 is endowed with a task memory system without a ToM module. Finally, caregiver robot 3 has neither the ToM module nor the task memory to guide the infant robot. Thus, it can only help the infant robot based on its own experience, built up from the infant's action decisions.
Based on the results obtained from the experiments in response to questions (i) and (ii), this study offers the following contributions. First, we show that robot trust can be modeled based on the cost of perceptual processing, akin to cognitive load in humans, by generating an IR signal that facilitates assessing the trustworthiness of interaction partners. Second, we show that robot trust, together with a simple ToM skill, can be employed in a robot-robot scaffolding scenario to guide a perceptually limited infant robot to perform the same task, i.e., visual recall, effectively. Finally, we achieve these contributions using actual humanoid robot platforms, showing that our approach can be implemented in a real-life task where environmental constraints (e.g., noise) and hardware constraints (e.g., camera resolutions) are nonnegligible.
The remainder of this article is organized as follows. In Section II, we review related work on human trust in robots and robot trust in humans. In the same section, we also give an overview of scaffolding studies in robotics. Section III introduces the modules implemented on the caregiver (i.e., the Pepper robot) and the infant (i.e., the Nao robot) robots. The experimental setups for robot-online instructor and robot-robot interactions are described in Section IV. The experimental results and the online repository for reproducing them are presented in Sections V and VI, respectively. Finally, we present the discussion in Section VII and the conclusion of the study in Section VIII.

II. RELATED WORK
In this section, we review the factors that affect human trust in artificial systems, computational models of robot trust, and scaffolding studies in robotics.

A. Factors of Human Trust in Artificial Systems
The factors that affect human trust in robots and artificial systems have been studied along various dimensions, including robot morphology, environmental context, and human factors [10]. For example, some determinants of trust have been found to be the predictability and reliability of the robot, as well as the human's cultural background and workload, the latter of which can be linked to the cognitive load of the human partner.
Daronnat et al. [18] addressed the link between a virtual agent's predictability and its effect on human trust. The assigned task in this study was a collaborative missile command game played with five agents with different performance and predictability skills. By combining the game statistics, such as the number of hits and total shots, with the outcomes of surveys, the authors suggested that interactions with highly predictable, low cognitive load, and more reliable agents led to high trust in virtual interaction partners. Gupta et al. [19] found that a low cognitive load, measured through an electroencephalogram (EEG), indicates the trustworthiness of the agent in virtual reality tasks: shape selector and n-back recalling. Novitzky et al. [20] put forward the following hypothesis: reducing the cognitive load increases human trust in game partners. Although the study offers preliminary findings, the authors found support for the hypothesis by measuring the heart rate variability of the humans as an indication of cognitive load in a capture-the-flag game. Correia et al. [21] employed questionnaires to measure human trust in game partners, which could be humans or robots. The authors pointed out that behavioral response and emotions play a critical role in human-human trust. In addition, the performance of robot game partners was found to be one of the determinants of human trust in robots. Ahmad et al. [16] investigated a set of factors for human trust in robots, such as anthropomorphism and cognitive load. The authors measured cognitive load using pupillometry and carried out a matching-pair card game employing the Husky mobile robot and the Pepper humanoid robot. The authors argued that a high cognitive load of human partners led to low trust in robots. Desai et al. [17] designed a search and rescue task with iRobot ATRV-JR mobile robots. The participants guided the robot with different autonomy modes to achieve the task, i.e., avoiding obstacles and finding victims. At the end of the experiments, the participants were asked to fill out questionnaires to assess their cognitive load and its association with trust. The authors concluded that a low cognitive load indicated high trust in robots.
Here, we have introduced selected trust studies from a human perspective to show that cognitive load is one of the critical factors in forming trust in artificial interaction partners. For more general literature on human trust in artificial agents, the readers are referred to [11], [13], and [21].

B. Computational Models of Robot Trust
Although most human-robot trust studies focus on the human perception direction, some studies look at the other direction, as briefly reviewed next. Sorbello et al. [22] proposed a cognitive architecture that aims at endowing humanoid robots with cognitive modules and mechanisms to represent trust and evaluate the trustworthiness of human partners. Although the authors stressed that the architecture was designed for humanoid robots in social interaction, conceptual agents were used to explain the computational architecture formally. Patacchiola and Cangelosi [23] presented a Bayesian trust model together with a ToM module for simulated agents to categorize reliable and unreliable informants as interactive agents. The results show that the trust model with the ToM module enhances the agents' performance in detecting unreliable informants. Yang and Parasuraman [24] addressed robot trust by employing multiple heterogeneous robots to manage urban search and rescue tasks in a simulation environment. The authors proposed a multilevel trust model based on the hierarchical needs of robots and robot teams, such as battery level and the number of rescues, to derive relative entropy values (a high Kullback-Leibler distance indicates low trust) for determining trust between agents, trust between agents and groups, and trust between robot teams. In [6], Patacchiola and Cangelosi introduced a comprehensive developmental cognitive architecture for robot trust. This architecture also hosts a ToM module. Here, the authors employed the iCub humanoid robot to replicate trust-related interactive psychological experiments, namely, object naming and sticker finding. In this study, the tasks include differentiating interaction partners as reliable and unreliable informants. We note that the same authors, together with colleagues, have extended the trust model to different settings, such as performing collaborative human-robot interaction tasks [25], [26]. Chen et al. [27] presented a computational model with which a robot infers human trust in a collaborative task (i.e., cleaning a table). Here, the authors show that interacting with a robot that can infer human trust brings about better collaboration results, using cumulative reward to compare the results with a robot without a trust model.
Our trust model differs from the studies introduced in this section in the following ways. On the one hand, some of the robot trust models introduced here are implemented in a simulation environment, which might ease development but does not answer whether the same model could achieve similar performance in actual robot experiments. In contrast, we implemented the proposed trust model on a humanoid robot to show that the model can accomplish a nontrivial task and could later be employed in social and collaborative tasks in a real-world setting. On the other hand, our trust model is based on the computational cost of perceptual processing (i.e., cognitive load: the amount of cognitive resources needed to perform the task), embedded in a decision-making framework to extract the IR that enables the robot to identify the reliable interaction partner as the most trustworthy one. We note that our approach of establishing the relationship between cognitive load and trust is in consonance with the human trust studies that we introduced in Section II-A. Additionally, our trust model does not include complex cognitive concepts to form robot trust in an interaction partner; instead, we combine a simple ToM module with a trust component to perform the scaffolding experiments.

C. Scaffolding Studies in Robotics
Robotic scaffolding studies in the literature can be broadly categorized into two groups. In the first group, the employed robot is a caregiver (or teacher) agent that tutors humans, often children, in an educational setting. For instance, Jones et al. [28] used the upper body of the Nao robot to tutor children in a map reading task under different conditions. The results show that employing the robot embodiment, i.e., a Nao robot that can verbally interact with children, increases task engagement and trust. Kennedy et al. [29] used a Nao robot to interact with children for tutoring mathematics (i.e., learning prime numbers). The study shows that the children who interacted with the robot showed significant performance gains in the task. Interestingly, the authors also pointed out that the social and adaptive behavior of the robot might adversely affect the learners' performance. In the second group of studies, a human caregiver scaffolds a robot to achieve a task in a setting where the robot could not accomplish it by itself. For instance, Ugur et al. [30] employed a Motoman robot arm to accelerate the learning of grasping through human scaffolding. The authors also showed that the robot could learn to grasp novel objects autonomously. Breazeal and Thomaz [31] used a social robot, namely, Leonardo, to show that the learning progress of the robot can be increased by receiving social scaffolding from humans to solve a puzzle box task. The results indicated that the robot's performance improved by interacting with humans compared to solving the same task by itself. For a more general exposition of scaffolding that touches on constructive perspectives, the philosophical foundations of affective scaffolding, and cognitive developmental robotics, the readers are referred to [32], [33], and [34].
Based on the studies presented in this section, we conclude that our approach to scaffolding is unique; there is virtually no similar attempt in the literature that addresses robot trust in the context of robot-robot scaffolding.

III. METHODS
This section gives the implementation details of the cognitive modules of the robots that were employed in the experiments conducted. The details of the experiments are given in the following section.

A. Visual Memory Recall System
As a model of the cognitive processing of a robot, we use the visual memory recall function. For this, we adopt the high-order Hopfield network [35], which acts as an AAM to recover stored patterns from an initial memory pattern, as in our earlier studies [12], [36]. This module, besides performing recall based on an initial visual pattern, returns the cognitive load associated with the pattern. The cognitive load is then converted into a reward signal that is used by the reinforcement learning system to find a policy for processing the visual patterns in sequence so as to minimize the total cognitive load.
Fig. 1 shows the visual patterns that were used to construct the associative memories of the robots. Before starting the experiments, we allow the robots to capture these patterns through their cameras. The Pepper and Nao robots perceive the same visual pattern at different resolutions, i.e., 640 × 480 (Pepper) versus 320 × 240 (Nao). After capturing the visual patterns, an image preprocessing pipeline (i.e., extracting a region of interest, grayscaling, binarizing, and downsizing the patterns) was performed to obtain a bipolar encoding of the patterns with a size of 32 × 32. To employ the auto-associative network on the bipolarized (each pixel is ±1) patterns, the activation (i.e., output representation) of each unit i (i.e., a neuron) is given by the weighted sum of the products of the activations of all possible pairs of units as

$$x_i = \operatorname{sgn}\Big(\sum_{j}\sum_{k} W_{ijk}\, x_j x_k\Big). \tag{1}$$

In (1), the sgn() function yields −1 for negative arguments and +1 for arguments that are greater than or equal to zero. The weight matrix of the network, W, can be computed by using

$$W_{ijk} = \sum_{p} \xi_i^p\, \xi_j^p\, \xi_k^p. \tag{2}$$

In this equation, ξ_i^p, ξ_j^p, and ξ_k^p are the ith, jth, and kth bits of the pth pattern ξ^p, where p runs through the number of patterns to be stored (in the experiments reported, p = 1, 2, ..., 5).
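To make the pipeline concrete, the following Python sketch (our own illustration, not the authors' code) turns a captured grayscale region of interest into the 32 × 32 bipolar vector that the network stores; the binarization threshold and the placeholder images are assumptions for self-containment.

```python
import numpy as np

def preprocess(gray_roi, size=(32, 32), threshold=128):
    """Grayscale ROI -> bipolar (+/-1) vector of length 32 * 32 = 1024.

    The 32 x 32 target size follows the paper; the binarization
    threshold of 128 is an illustrative assumption.
    """
    rows = np.linspace(0, gray_roi.shape[0] - 1, size[0]).astype(int)
    cols = np.linspace(0, gray_roi.shape[1] - 1, size[1]).astype(int)
    small = gray_roi[np.ix_(rows, cols)]           # nearest-neighbor downsizing
    return np.where(small >= threshold, 1, -1).astype(np.int8).ravel()

# Stand-ins for the five captured patterns of Fig. 1 (random images here).
rng = np.random.default_rng(0)
pattern_images = [rng.integers(0, 256, size=(480, 640)) for _ in range(5)]

# "Training" the high-order network amounts to storing the bipolar patterns:
# the weight tensor of (2) never needs to be materialized, since its
# contraction with the state factorizes (see the recall sketch below).
stored = np.stack([preprocess(img) for img in pattern_images])  # shape (5, 1024)
```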
After constructing the weight matrices for each robot, the visual recall task can be performed on the images by processing the patterns located in the visual scene. As can be seen in Fig. 2, in the experiments conducted, the visual scene is composed of 20 patterns in the form of a 4 × 5 grid, where each cell in the grid contains one of the trained patterns, a noisy version, or a randomly cropped version of the stored patterns.
Throughout the interactive experiments, the robots select the id number associated with a specific cell in the grid through a voice command (such as "Open 15"), and the pattern in that cell is displayed in full-screen mode. Then, the agent captures
the displayed image (denoted as ξ) and performs the preprocessing steps given above to obtain the bipolar version of the image. The preprocessed image becomes the input for the memory recall system, and the network dynamics are run for visual recall by asynchronously updating the units until a steady state (i.e., the convergence state), denoted as ξ′, is reached. After performing the above steps, the network might converge to one of the stored patterns, the inverse of a stored pattern, or a combination of the stored patterns in the associative memory [37].
In this setting, we define the neural computational cost of recalling a memory pattern from an initial pattern ξ as the number of bits changed to reach the steady, that is, converged, state of the network. Here, we link the number of changed bits to the computational energy, denoted as E(ξ), required to perform the visual recall. This energy value for a visual pattern is then used to obtain an IR signal that guides computational cost-aware decision making.
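A minimal recall sketch under the same assumptions, continuing the names from the preprocessing sketch above: the third-order sum in (1) factorizes as a sum over pattern overlaps, so the update never needs the full weight tensor, and the cognitive load E is simply the count of bit flips until convergence. The `max_sweeps` safety bound is our own addition.

```python
def local_field(x, stored, i):
    """Input to unit i under (1): sum_{j,k} W_ijk x_j x_k = sum_p xi_i^p (xi^p . x)^2."""
    overlaps = stored @ x                     # one overlap per stored pattern
    return float(stored[:, i] @ (overlaps.astype(np.float64) ** 2))

def recall(x0, stored, max_sweeps=50, seed=None):
    """Asynchronous updates until a fixed point.

    Returns the converged pattern xi' and the cognitive load E, i.e., the
    number of bits changed on the way to the steady state.
    """
    rng = np.random.default_rng(seed)
    x, flips = x0.astype(np.int64), 0
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(x.size):     # random asynchronous order
            new = 1 if local_field(x, stored, i) >= 0 else -1   # sgn(0) = +1
            if new != x[i]:
                x[i], flips, changed = new, flips + 1, True
        if not changed:                       # steady state reached
            break
    return x, flips

noisy = stored[0].copy()
noisy[:40] *= -1                              # corrupt 40 bits of pattern (a)
recalled, energy = recall(noisy, stored)      # low energy: close to a stored item
```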
To build the visual recall system for the robots, we used the Hopfield network, which has a limited capacity to retrieve patterns from the AAM. However, it is possible to employ alternatives (such as deep belief networks and their variants [38], [39]) to create the same system with more storage capacity.
For the remainder of this article, we refer to this energy value as the cognitive load, i.e., the amount of computation required to perform the visual recall.

B. Internal Reward Module
This section presents the IR module that enables the robots to perform cognitive load-guided decision making for perceptual processing, i.e., visual recall. To implement this module, we adopt a model-free reinforcement learning algorithm, namely, SARSA, where the reward is generated internally based on the cognitive load incurred during the visual recall of a memory pattern, as in our preliminary work [40], [41].
The decision-making system of the robots is defined within the standard Markov decision process (MDP) framework [42] as a tuple (S, A, P, R), where S indicates the state space, A indicates a set of actions, P indicates a transition function, and, finally, R indicates an immediate reward function whose value is returned after each action of the system. The solution of an MDP is an optimal policy (π∗) that describes what action to take in each state so that the agent maximizes the sum of the discounted rewards it can collect in the long run. In this study, the robots perform on-policy learning to reach a near-optimal policy, namely, they use the SARSA algorithm [42] with an ε-greedy strategy for exploration (in the experiments reported in this article, ε = 0.3). The state s ∈ S is defined as the index of the grid cell where the miniaturized version of a visual pattern is placed (see Fig. 2). Thus, S consists of n_s = 20 states in our experiments, i.e., S = {s_i : i = 0, 1, 2, ..., 19}. During the experiments, the state patterns are perceived through the camera of the robot and then become inputs to the memory recall module. After the memory recall is complete, the cognitive load associated with the recall is returned. An action (a ∈ A) is performed by generating a voice command to select the next state. Here, the number of actions, n_a, is equal to the number of states, n_s; thus, A = {a_i : i = 0, 1, 2, ..., 19} in our experiments.
The SARSA algorithm learns a state-action value function Q through two-step state-action experience as given by

$$Q(s, a) \leftarrow Q(s, a) + \mu\big[R(s, s') + \gamma\, Q(s', a') - Q(s, a)\big] \tag{3}$$

where Q(s, a) refers to the value of the current state, s, and action, a, pair. Similarly, Q(s′, a′) denotes the value of the action a′ in the next state s′. The μ variable is the step size (i.e., learning rate) parameter, and γ is an adjustment factor that discounts expected future rewards. The μ and γ values are set to 0.7 and 0.4, respectively. We obtained these values in our previous study [43] through a grid search. The agent internally generates a reward value R(s, s′) for an (s, s′) pair, which is defined as the IR based on the relative cognitive loads required to process s and s′ as

$$R(s, s') = \begin{cases} +1, & \text{if } E(\xi_{s'}) < E(\xi_{s}) \\ -1, & \text{otherwise} \end{cases} \tag{4}$$

where ξ_s and ξ_{s′} indicate the current and next state patterns corresponding to states s and s′. The cognitive load values (i.e., the number of changed bits to reach a stable state) incurred for the visual recall operations based on these patterns are denoted by E(ξ_s) and E(ξ_{s′}), respectively. The IR module positively reinforces the agent if it moves from the state associated with the higher cognitive load to a lower one. In the opposite case, the IR module generates a negative reinforcement for the agent. It is worth noting that, in the sequential visual recall task considered in this study, there is no final goal state to be reached; thus, the agent performs reinforcement learning in an infinite-horizon setting, although we limit the number of iterations to report the performance of learning for computational tractability.
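The following sketch puts (3) and (4) together in the experiment's terms (20 states, actions indexing the next state, μ = 0.7, γ = 0.4, ε = 0.3 as stated above). `cognitive_load` and `ask_partner` are dummy stand-ins of our own for the capture-recall pipeline and the help request, so their names and bodies are illustrative assumptions, not the authors' code.

```python
import numpy as np

N_STATES, MU, GAMMA, EPS = 20, 0.7, 0.4, 0.3
Q = np.zeros((N_STATES, N_STATES))            # an action indexes the next state
rng = np.random.default_rng(1)

def cognitive_load(s):
    """Dummy stand-in for capture + preprocessing + AAM recall (returns E)."""
    return float((7 * s + 3) % 20)            # arbitrary deterministic loads

def ask_partner(s):
    """Dummy stand-in for the instructor's (or caregiver's) suggestion."""
    return int(rng.integers(N_STATES))

def eps_greedy(s):
    if rng.random() < EPS:
        return int(rng.integers(N_STATES))    # explore
    return int(np.argmax(Q[s]))               # exploit

s = int(rng.integers(N_STATES))
E_cur, a = cognitive_load(s), eps_greedy(s)
for _ in range(500):                          # one 500-iteration run
    s_next = a                                # the action opens the next cell
    E_next = cognitive_load(s_next)
    r = 1 if E_next < E_cur else -1           # IR of (4)
    a_next = ask_partner(s_next) if r < 0 else eps_greedy(s_next)
    Q[s, a] += MU * (r + GAMMA * Q[s_next, a_next] - Q[s, a])  # SARSA update (3)
    s, a, E_cur = s_next, a_next, E_next
```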

C. Theory of Mind Module
The ToM module is implemented on caregiver robot 1, which is supposed to guide a less capable infant robot. The ToM module aims to capture the action preferences of the infant robot by forming state-action-reward associations based on the actions of the infant robot. Note that we implement this module in a pragmatic way; a more biologically realistic implementation of the ToM module could also be employed in the same experiment to scaffold an infant agent. To be concrete, caregiver robot 1 observes the infant robot's behavior (i.e., state and action selection) to form a reward table, i.e., the ToM matrix. Then, during the interaction, caregiver robot 1 first checks the ToM matrix to guide the infant robot. If the corresponding state-action pair is populated, the caregiver agent makes a decision for the infant robot based on the highest reward value associated with the action. Otherwise, the caregiver robot makes a decision based on the trust component. This procedure is repeated each time the infant robot requests help from caregiver robot 1.
More formally, we define this module as a matrix T ∈ Z^{m×m}, where m is the number of states, i.e., 20. In the experiment, the caregiver robot populates the T matrix, whose initial values are zeros. Throughout the experiment, caregiver robot 1 updates the elements of the T matrix (T_{s,a}) by observing the infant robot's decisions, i.e., an action that enables the infant robot to visit a state.

Fig. 3. Illustration of the data flow and interaction among the modules. Here, the robot perceives one of the selected patterns, shown in Fig. 2, and performs perceptual processing to form a bipolarized input vector. Then, the preprocessed visual inputs are used by the visual memory recall system to recall a visual pattern from the AAM while deriving the cost (i.e., cognitive load) associated with the visual recall. Next, the cognitive load values of two consecutive steps are compared to compute IRs. If the IR module yields −1, the robot asks for help from the interaction partner to select an action to process the next visual pattern. In case the reward is +1, the robot chooses an action by itself. This iterative process is repeated until the termination condition is reached. We note that this data flow diagram is valid for both the robot-online instructor and robot-robot interaction experiments.
In this setting, for instance, if the infant robot requests help from the Pepper robot after visiting the state-action pair (s = 1, a = 10), the value of that element in the ToM matrix is negatively reinforced, i.e., T_{1,10} ← T_{1,10} − 1. We note that requesting help from the caregiver robot indicates that the infant robot has moved from a low cognitive load state to a higher one. In the case of the infant robot performing the visual recall task without asking for help from the caregiver, the state-action pair is positively reinforced, i.e., T_{s,a} ← T_{s,a} + 1.
This module enables the caregiver robot to simulate the infant robot's behavior for decision making. Here, the caregiver robot forms state-action-energy associations to guide the infant robot toward minimizing the cognitive load required for perceptual processing.
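The T-matrix bookkeeping described above can be sketched as follows; the function names are ours, not the authors'.

```python
import numpy as np

N_STATES = 20
T = np.zeros((N_STATES, N_STATES), dtype=int)  # ToM matrix, initially all zeros

def observe_infant(s, a, asked_for_help):
    """Update T from one observed infant decision: a help request signals a
    move toward higher cognitive load, so the pair is reinforced negatively."""
    T[s, a] += -1 if asked_for_help else 1

def tom_advice(s):
    """Best action observed so far for state s, or None if the row carries
    no evidence yet (the caregiver then falls back on its trust component)."""
    if not T[s].any():
        return None
    return int(np.argmax(T[s]))
```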

D. Trust Component
The trust component was constructed during the robot-online instructor experiments and captures the knowledge of the robot about the task. Recall that the Pepper robot interacts with three online instructors having different guiding strategies. At the end of the experiments, the Pepper robot is given a free choice to select the trustworthy instructor. In this case, the Pepper robot chooses the online instructor with the reliable guiding strategy as the trustworthy one. To be more concrete, the Q matrix that was formed by interacting with the reliable instructor, which yields the highest cumulative reward, was deemed the trust component, i.e., a matrix that hosts reliable decision-making policies that help the robot decrease the cognitive load of perceptual processing. We illustrate the trust component in Fig. 7 as a heatmap. We then employ the same robot with the trust component as caregiver robot 1 and caregiver robot 2 to guide the perceptually limited infant robot.
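Selecting the trust component then reduces to keeping the Q matrix of the best-rewarded run; in the sketch below, `runs` is a hypothetical record of the instructor sessions, not a structure from the authors' code.

```python
def select_trust_component(runs):
    """runs: iterable of (instructor_label, q_matrix, cumulative_reward)
    records, one per run; returns the label and Q matrix of the run with
    the highest cumulative reward, which becomes the trust component."""
    label, q_trust, _ = max(runs, key=lambda rec: rec[2])
    return label, q_trust
```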

IV. EXPERIMENTAL SETUPS
The experimental setups were grouped into two settings, namely, robot-online instructor and robot-robot interactions, introduced in the following sections. The Pepper robot was employed to interact with three instructors with distinct strategies in the robot-online instructor interaction setting. In the robot-robot interaction setting, the Pepper and Nao robots were recruited as the caregiver and infant robots, respectively. An overview of these experiments is depicted in Fig. 4.

A. Experiment Flow
Before starting the experiment, the robot (the Pepper robot in experiment 1 and the Nao robot in experiment 2) was placed in front of the monitor to capture the visual patterns shown in Fig. 1 and form an AAM by following the steps introduced in Section III-A. Note that this procedure can be considered the training phase of the experimental flow.
The experimental flow for a single iteration of the interactive experiment is illustrated in Fig. 3.
After the training phase, the interactive experiments start with perceiving a visual scene, illustrated in Fig. 2, with enumerated states in the form of a 4 × 5 grid. Here, each cell in the grid hosts a distinct visual pattern. The online instructors, the Pepper robot, and the Nao robot generate voice commands to select a state in the scene and interact with the other agents in the experiment. For example, the voice commands generated by the Pepper and Nao robots were used to trigger certain behaviors to request help from the online instructors or the caregiver robots, respectively.
After requesting help or selecting a specific cell in the visual scene, the monitor displays the pattern in that cell in full-screen mode. The figure in the second row of Fig. 4 shows the presented visual scene and a selected state pattern from one of the robot-robot interaction experiments. Once the pattern is displayed in full-screen mode, the robots capture the image and then preprocess it to become an input to the AAM. Then, the cognitive load associated with the perceived pattern is extracted to generate an IR value. The steps that the robots perform to extract the cognitive load value for a visual pattern can be seen as the testing phase of the experiment. Based on the IR, the robots decide to perform the visual recall by themselves or request help from their interaction partners. For example, in the case of negative reinforcement, the Pepper robot will request help from an online instructor, whereas the Nao robot will request help from the caregiver robot.

B. Robot-Online Instructor Interactions
In these experiments, we designed an interactive scenario in which the Pepper robot interactively performs visual recall. To this end, the Pepper robot interacts with preprogrammed online instructors that have different guiding strategies: reliable, less reliable, and random. The same setting as in [41] is presented in this section to introduce its relation to the robot-robot interactions. Fig. 5 shows the Pepper robot, the monitor for displaying patterns, and a small monitor that guides the Pepper robot as an online instructor.
Here, the reliable instructor directs the Pepper robot to process the visual patterns associated with less cognitive load, i.e., the trained patterns that were used to form the robot's associative memories. The less reliable instructor, however, guides the robot to process the patterns with a high cognitive load, such as noisy or cropped patterns. Finally, the random instructor provides random guidance without considering the cognitive load and visual pattern associations.
The Pepper robot performs the interactive visual recall task for 500 iterations, which constitutes a run, and we carried out ten runs with each instructor. After carrying out these interactive experiments, we give the Pepper robot a free choice to select the trustworthy instructor for performing the next visual recall tasks. The Pepper robot selects the instructor that allowed it to obtain the highest cumulative reward, or spend the least cognitive effort, as the trustworthy one. Additionally, the Q matrix formed with the trustworthy instructor is used as the trust component (see Section III-D).

C. Robot-Robot Interactions
The interactive experiments introduced in this section were designed as robot-robot scaffolding experiments in which a perceptually limited agent, a Nao robot with a low-resolution camera, is guided by three caregivers with different cognitive modules and trust components. The caregiver types are shown in Table I. Here, caregiver robot 1 has the AAM, IR, and ToM modules with the additional trust component, i.e., the Q matrix that was formed by interacting with the trustworthy instructor in the robot-online instructor experiment. Caregiver robot 2 also has the AAM and IR modules with the additional trust component. Finally, caregiver robot 3 has only the AAM and IR modules. The infant robot in these interactive experiments was endowed with only the AAM and IR modules. We designed these settings to benchmark the infant robot's performance in reducing the cognitive load while performing the visual recall task with different caregivers.
We note that the same experimental setting was introduced in our previous study [12]. However, in this study, we repeated the experiment with caregivers 1 and 2 for five more runs with 300 iterations (ten runs in total), and we carried out the same experiments with caregiver robot 3. Additionally, we performed the visual recall task on the infant robot by directly transferring the trust component to assess the added value of robot-robot scaffolding (see Section VII for a detailed discussion).
Fig. 6 shows one of the caregiver agents with the infant robot in the experiment. In this setting, the infant and caregiver robots perform visual recall by employing their cognitive modules and trust components. As described above, each caregiver robot has a different set of modules or components to perform the task. The infant robot makes decisions (i.e., selecting state-action pairs) based on its Q matrix. If the infant robot moves from a state associated with a low cognitive load to a higher one, it generates a voice command to request help from the assigned caregiver agent to make a decision. Depending on the caregiver robot type, the infant robot receives different decisions. For example, caregiver robot 1 will first check its ToM matrix to retrieve the best action according to the observed behavior of the infant. If there is no best option for the current state, it will guide the infant robot based on its Q matrix, extracted at the end of the interactions with the trustworthy instructor and updated during the experiment. Caregiver robot 2 will make a decision based on the trust component, which is updated throughout the experiment. Finally, caregiver robot 3 performs the task by starting from scratch together with the infant robot, making decisions based on a Q matrix that was randomly initialized before the experiment.
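The caregiver-side decision logic of Table I can be summarized as the sketch below (reusing `tom_advice` from the ToM sketch in Section III-C; `caregiver_decision` and its arguments are our own naming, not the authors' implementation).

```python
import numpy as np

def caregiver_decision(caregiver_type, s, q_trust, q_own):
    """Action returned to the infant's help request, per caregiver type."""
    if caregiver_type == 1:                   # ToM module + trust component
        advice = tom_advice(s)                # best action observed so far
        if advice is not None:
            return advice
        return int(np.argmax(q_trust[s]))    # fall back on the trust component
    if caregiver_type == 2:                   # trust component only
        return int(np.argmax(q_trust[s]))
    return int(np.argmax(q_own[s]))           # type 3: own Q, learned from scratch
```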

V. RESULTS
In this section, we present the results of the two experimental settings: 1) robot-online instructor interaction to form trust in a reliable instructor and 2) robot-robot interaction in a scaffolding setting. The two experimental settings are connected by the fact that the Pepper robot uses the know-how it gained in the first set of experiments to scaffold the learning of a less capable infant robot (Nao) in the second set.

A. Robot-Online Instructor Interactions
In these experiments, the Pepper robot interacts with three preprogrammed online instructors with different guiding strategies, namely, reliable, less reliable, and random, that guide the Pepper robot in case of help requests. Recall that the reliable instructor provides the Pepper robot with decisions associated with less cognitive load, i.e., the patterns in Fig. 2 with state ids 0, 1, 9, 11, and 13. The less reliable instructor, however, guides the robot to process any state except 0, 1, 9, 11, and 13. Finally, the random instructor provides random guidance without considering the cognitive load and visual pattern associations.
The results presented here were adapted from our previous study [41] to show how the trust component is derived and embedded in the robot-robot interaction experiments. The Pepper robot performs the interactive visual recall for 500 iterations (that is, a run), and we repeat ten runs with each instructor. Fig. 5 depicts the Pepper robot, a large monitor for presenting visual patterns, and a small monitor that displays guidance (i.e., the partner's decisions) to the Pepper robot as an online instructor.
After the interactive visual recall experiments with the online instructors, we collected the Pepper robot's statistics about each instructor so that it could make a free choice in deciding which online instructor is more trustworthy than the others. This choice is made by considering the entries in Table II. The columns of this table present the minimum, maximum, standard deviation, and average values of the collected rewards, grouped by instructor type. Here, the last column hosts the number of help requests (i.e., interactions) made by the Pepper robot.
The entries in Table II show that the Pepper robot achieves the highest cumulative reward and average cumulative reward while interacting with the reliable instructor. Additionally, these statistics were obtained with fewer help requests to the instructor, which indicates that the Pepper robot successfully formed state-action-cognitive load associations to perform the sequential visual recall more efficiently than in the experiments with the instructors that provide the less reliable and random strategies. We note that the results presented in Table II are in line with our trust definition: the cognitive agent (i.e., the Pepper robot) with the explicit goal of performing a sequential visual recall task forms a degree of trust in the interaction partner that reduces the cognitive load incurred during the task.
To transfer the trust component to caregiver robot 1 in the robot-robot interaction experiments, we used the Q matrix of the run with the reliable instructor that yielded the highest cumulative reward. Fig. 7 illustrates the trust component by displaying the Q matrix in a heatmap format. Here, the darker blue elements indicate more valuable state-action pairs with regard to the cognitive load of the Pepper robot.
Based on the Q matrix shown in Fig. 7, the most valuable action (or next state) can be extracted for a given current state. The best state-action pairs will often enable the robot to generate positive IRs by moving from a high-energy state to a lower one. For instance, if the Pepper robot enters state 19, it will take the action that leads to state 13.

1) Link Between Q Matrix and Trust: Q tables are formed through interacting with partners with different strategies: reliable, less reliable, and random. Overall, these tables are populated under the direct impact of the interaction partners' decisions (i.e., action suggestions followed by the learner robot) on updating the Q(s, a) function. To be concrete, with the Q table that was formed with the reliable interaction partner, the learner robot (i.e., the Pepper robot in experiment 1) collects more rewards and experiences less cognitive load during the experiment. Since, at the end of the experiment, a free choice is given to the robot to select the most trustworthy instructor, we deem that this Q table, formed through interacting with the reliable online instructor, can be seen as the trust component of the agent. Since the Pepper robot outsources some of its actions to perform the task with less cognitive load, we emphasize that this approach aligns with our definition of trust introduced in Section I.

B. Robot-Robot Interactions
In this section, we introduce the results for the robot-robot interactions. We note that the experiments for the robot-robot interactions were designed in a scaffolding setting where the caregiver agent, i.e., the Pepper robot, guides a perceptually limited infant robot, i.e., the Nao robot with a low-resolution camera. Both the caregiver robot and the infant robot were endowed with the AAM and IR modules. We also selectively equip the caregiver robots with a simple ToM module and the trust component; see Table I. We note that the trust component is the Q matrix of the Pepper robot (shown in Fig. 7) that was formed in the robot-online instructor interaction experiments by interacting with the trustworthy instructor.
Table III presents the statistics of the infant robot's performance in terms of the collected cumulative reward and the number of interactions (i.e., the number of help requests made by the infant to the caregiver robot). The columns of this table provide the minimum, maximum, standard deviation, and average of the reward values, respectively. The last column, labeled Reqs., indicates the total number of interactions (or help requests) with each caregiver. We note that the entries of this table were constructed after experiments of 300 iterations, which constitutes a run, and we repeat ten runs with each caregiver robot. According to the entries in this table, the infant robot collects the highest average cumulative reward with caregiver robot 1 while making fewer help requests. In this vein, the highest reward was also collected while interacting with caregiver robot 1. The standard deviation values for each caregiver are high because the experiments were carried out with real robots in an uncontrolled environment (e.g., noisy camera readings) and because of the stochasticity in the employed methods, such as random action selection due to the ε-greedy strategy. Overall, the infant robot learned the environmental dynamics (i.e., state-action-cognitive load associations) in a useful way to increase the cumulative reward while interacting with caregiver robot 1, which has a ToM module and the trust component, compared to the other caregiver agents.
To visually illustrate the infant robot's performance and the effect of scaffolding with each caregiver, we depict the average cumulative reward curves of the infant robot in Fig. 8. Here, the blue, green, and red colors indicate the infant robot's reward curves while interacting with caregiver robots 1-3, respectively.
As can be seen in Fig. 8, the upward trend of the blue curve is steeper than the others, which shows that the infant robot learns the state-action-cognitive load associations for visual recall by frequently moving from a high cognitive load state to a lower one. Overall, the interaction with caregiver robot 1 yields a higher cumulative reward than with the other caregiver robots. We interpret this outcome as showing that endowing the caregiver robot with the ToM module and the trust component is critical for scaffolding an infant robot to perform visual recall efficiently.
To further analyze the infant robot's performance with each caregiver robot, we record the average absolute temporal difference error based on the following equation:

$$td_{error} = R(s, s') + \gamma\, Q(s', a') - Q(s, a). \tag{5}$$

In (5), td_error refers to the temporal difference error formed by using the difference between the value of the current state-action pair in the Q matrix, Q(s, a), and the value of the next state-action pair, Q(s′, a′), with the contribution of the IR value, R(s, s′), while moving from the current state to the next state. The absolute values of td_error were extracted throughout the experiments with 300 iterations and 10 runs. As shown in Fig. 9, to illustrate the increasing and decreasing trends in a single curve for each robot-robot interaction experiment, we averaged the temporal difference error curves and then smoothed the resulting curve with a window size of 150.
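For reference, the curve construction for Fig. 9 can be sketched as follows; the recorded |td_error| array is filled with placeholder data here, and the window size of 150 is the value stated above.

```python
import numpy as np

def smooth(curve, window=150):
    """Moving average with the window size used for Fig. 9."""
    kernel = np.ones(window) / window
    return np.convolve(curve, kernel, mode='valid')

# abs_td[run, t] = |R(s, s') + gamma * Q(s', a') - Q(s, a)| recorded at each
# of the 300 iterations of each of the 10 runs (placeholder values here).
abs_td = np.abs(np.random.default_rng(2).normal(size=(10, 300)))
avg_curve = smooth(abs_td.mean(axis=0))       # average over runs, then smooth
```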
The blue, green, and red curves in Fig. 9 indicate the average absolute temporal difference error of the infant robot while interacting with caregiver robots 1-3, respectively. The blue curve, associated with caregiver 1, shows a better reduction trend (i.e., attains a better policy) than the red and green curves. This behavior indicates learning progress (i.e., learning the state-action pairs associated with less cognitive load and, therefore, positive reinforcement) while interacting with caregiver robot 1. This finding also indicates that the caregiver agent with a simple ToM module and the trust component efficiently scaffolds the perceptually limited infant robot in performing a sequential visual recall task.
On the basis of the results presented from the robot-online instructor and robot-robot interactions, we draw the following conclusions. We note that the results presented here are also in line with our previous studies [12], [41].
On the one hand, in the robot-online instructor experiments, the Pepper robot can perform the sequential visual recall task more efficiently while interacting with the reliable instructor. The Pepper robot succeeds in selecting the trustworthy instructor (or informant) after interacting with all the online instructors for an equal number of runs and iterations. Additionally, we highlight that the cumulative average reward value is an appropriate metric, among other metrics such as the number of interactions, for determining the overall performance of the Pepper robot with the online instructors. We suggest using this metric because of its relationship with cognitive load (i.e., an action leading from a low cognitive load state to a higher one is negatively reinforced) and because both the Pepper robot and the online instructors contribute to obtaining positive reinforcement throughout the experiment.
On the other hand, in the robot-robot interaction experiments, we show that the trust component, i.e., the Q matrix of the Pepper robot formed through interacting with the trustworthy instructor, together with the ToM module plays a critical role in the scaffolding setting for guiding a perceptually limited infant robot to perform a visual recall task. To be concrete, interaction with a caregiver robot with the trust component alone enables the infant robot to collect more cumulative reward than with the caregiver agent without the trust component, i.e., caregiver robot 2 versus caregiver robot 3. More importantly, the interactions with the caregiver agent that has the trust component and the additional ToM module lead the infant robot to a substantially higher cumulative reward compared with the other caregiver agents, i.e., caregiver robot 1 versus the other caregiver robots.

VI. REPRODUCIBILITY OF THE STUDY
To reproduce the presented results and provide all the related data, including scripts, parameters, figures, and images, to other researchers, we use a public repository (www.github.com/muratkirtay/TCDS2022). The same repository can also be used to watch the experiment videos.

VII. DISCUSSION
Most of the work on trust between interacting agents involving robots considers mainly the human perception of robots and aims at uncovering the correlates of human trust in terms of robot design and behavior. Although these studies provide insights into the nature of human trust, they do not directly offer insights as to how trust can be modeled for or acquired by robots. In this study, we explored how a robot can use its internal computational burden, akin to human cognitive load, to assess an interaction partner's trustworthiness. Without rejecting the possibility and efficacy of other complementary mechanisms for robotic trust formation, we argue that our results offer interesting and novel research directions for robotic trust modeling. In particular, having a robot that is guided/tutored by a human tutor/guide another robot is intriguing. The generalization of this idea to arbitrary tasks may pave the road for effective robot teaching and open-ended robot-culture development. One interesting question that arises here is whether we gain anything by letting robots teach each other in a social way, similar to how humans teach each other. From an engineering perspective, one can "copy" the learned knowledge from a robot and embed it in another robot directly. Another nonhuman way of teaching can be envisioned where robots use, e.g., wireless communication to "transfer" knowledge to each other. Obviously, the latter approach would fail if the robots or their sensors differ significantly. The designer-copy approach may work, but it would require valuable human labor for each task knowledge transfer.
To understand how the nonsocial copy approach would work in the task considered in this article, we designed an additional experiment. Here, instead of allowing the infant robot to learn the task from a scaffolding caregiver robot, we transferred the knowledge of which pattern to process next to minimize the overall cognitive load, i.e., the action policy, directly to the infant robot. As this policy was tuned for the caregiver robot, there is no guarantee that it would yield the same cognitive load levels for the infant robot, even though both robots have the same visual processing system (an AAM). This is because the robots are equipped with cameras that have different hardware properties and noise levels. The experiments, with ten repetitions to account for the variability in reinforcement learning and camera noise, yielded the following ordering of the overall cognitive load on the infant robot: learning with caregiver robot 1 < learning with caregiver robot 2 < direct policy copy < learning with caregiver robot 3. So, indeed, the direct-copy approach did not perform as well as the social teaching scenario but still performed decently, since the infant and the caregiver had the same visual processing mechanisms.
Based on this preliminary finding, we suggest that, with different robots in multiple and complex task learning domains, robot-robot learning that follows a pattern similar to human-human learning may be beneficial. Thus, we argue that robot-to-robot social learning is a rich new venue waiting for interdisciplinary scientific investigation.

VIII. CONCLUSION
We have proposed that robot trust in interaction partners can be determined based on the need to minimize cognitive load and have implemented the proposal in an interactive visual recall task. To be concrete, we first conducted a set of robot-online instructor interaction experiments in which the Pepper robot interacted with instructors having three guiding strategies: 1) reliable; 2) less reliable; and 3) random. At the end of these experimental sessions, we showed that the robot attained the ability to detect the reliable instructor as the trustworthy one.
We further designed a set of robot-robot scaffolding experiments, in which the "able" robot, Pepper, acts as a caregiver that guides a perceptually limited "infant" robot, Nao, bringing in the knowledge obtained as the result of its interaction with the trustworthy instructor. To benchmark the performance of the infant robot as a function of the cognitive abilities of the guiding robot, we designed three versions of the scaffolding robot with different sets of cognitive abilities. To be concrete, caregiver robot 3 did not use the knowledge it gained from the experiments with the online instructors but learned in parallel with the infant and offered help with this knowledge. Caregiver robot 2 did bring in the knowledge to scaffold the learning of the infant robot. Finally, caregiver robot 1 was equipped with a simple ToM module on top of what caregiver robot 2 had. In addition to social scaffolding with these three types of caregivers, in a separate experiment, we bypassed the social guidance and instead directly copied the knowledge from caregiver robot 2 to the infant robot to assess the benefit of social scaffolding in robots.
Overall, the results from the experiments outlined above allow us to state the following. First, we realize an analog of the findings in humans that relate cognitive load and trust on robots, using the computational load of the internal processing of a robot as a reward signal. Second, we show that the knowledge obtained from trustworthy partners can facilitate effective social robot scaffolding, i.e., robot-to-robot teaching, which can be further improved by additional cognitive modules. Finally, we show that having robots teach each other, i.e., social robot learning, is more effective than directly transferring knowledge among robots with different properties.

Fig. 1. Visual patterns stored on the Pepper and Nao robots to form an AAM. (a) Shopping cart, (b) globe sketch, (c) wireless symbol, (d) spider web, and (e) human figure.

Fig. 2. Visual scene presented to the robots for perceptual processing.

Fig. 4. The first row illustrates the designed experiments: robot-online instructor interaction and robot-robot interaction. In the robot-online instructor interaction experiments, the Pepper robot interacts with three online instructors with different guiding strategies: reliable, less reliable, and random. These experiments were carried out to extract the trust component to be employed in the robot-robot interaction experiments. In the robot-robot interaction setting, we created three versions of the caregiver agent by selectively employing cognitive modules and the trust component to guide a perceptually limited infant agent, i.e., the Nao robot. The videos of the experiments can be found in the public repository of this article (see Section VI). In the second row, the experiment images show the initial screen and the visual pattern displayed in full screen to the robot. Here, the initial screen enables the robot (and an interaction partner) to select a state among 20. Then, the visual pattern associated with that state, shown as the selected action pattern, allows the robot to perform the sequential visual recall task. We note that these sequential pattern selection and processing steps were iteratively executed until the termination condition is reached.

Fig. 5. Experimental setup for robot-online instructor interactions. Note that the small monitor on the left displays the online interaction partner's information (e.g., action suggestions) during the experiment.

Fig. 7. Q matrix of the Pepper robot after it has interacted with the reliable instructor, which yields the highest cumulative reward. This matrix is considered the trust component for the robot-robot interaction experiment.

Fig. 8. Average cumulative reward curves of the infant robot after interacting with the caregiver robots for ten runs and 300 iterations.

Fig. 9. Average absolute temporal difference error curves of the infant robot while interacting with the caregiver robots for ten runs.

TABLE I
COGNITIVE COMPONENTS OF THE CAREGIVER AND INFANT ROBOTS. AAM, IR, AND TOM STAND FOR THE AUTO-ASSOCIATIVE MEMORY, INTERNAL REWARD, AND THEORY OF MIND MODULES, RESPECTIVELY. TC INDICATES WHETHER THE AGENT HAS A TRUST COMPONENT

Fig. 6. Experimental setup for robot-robot interactions. In this setting, the Pepper robot is a caregiver agent, and the Nao robot is an infant robot with limited perceptual processing.

TABLE II
STATISTICS OF THE PEPPER ROBOT WHILE INTERACTING WITH THE ONLINE INSTRUCTORS THAT HAVE RELIABLE, LESS RELIABLE, AND RANDOM GUIDING STRATEGIES. THE MIN, MAX, STD, AND AVG COLUMNS SHOW THE STATISTICS OF THE COLLECTED REWARDS, WHILE THE REQS. COLUMN INDICATES THE NUMBER OF HELP REQUESTS (I.E., INTERACTIONS) MADE BY THE PEPPER ROBOT TO THE ONLINE INSTRUCTORS

TABLE III
INFANT ROBOT'S STATISTICS AFTER THE EXPERIMENTS FOR TEN RUNS WITH 300 ITERATIONS. THE MIN, MAX, STD, AND AVG COLUMNS SHOW THE STATISTICS OF THE COLLECTED REWARDS, WHILE THE REQS. COLUMN INDICATES THE NUMBER OF HELP REQUESTS (I.E., INTERACTIONS) MADE BY THE INFANT ROBOT TO THE CAREGIVER ROBOT