Explaining an Intelligent Agent’s Future Motion on the Basis of Vocabulary Learning with Human Goal Inference

Intelligent agents (IAs) that use machine learning for decision-making often lack explainability about what they are going to do, which makes human-IA collaboration challenging. Previous methods of explaining IA behavior require IA developers to predefine the vocabulary that expresses motion, which becomes problematic as IA decision-making grows complex. This paper proposes Manifestor, a method for explaining an IA’s future motion with autonomous vocabulary learning. With Manifestor, an IA can learn vocabulary from a person’s instructions about how the IA should act. A notable contribution of this paper is that we formalized the communication gap between a person and an IA in the vocabulary-learning phase: the IA’s goal may differ from what the person wants the IA to achieve, and the IA needs to infer the latter to judge whether a motion matches the person’s instruction. We evaluated Manifestor by investigating whether people can accurately predict an IA’s future motion from explanations generated with Manifestor. We compared Manifestor’s vocabulary with that of optimal, acquired in a situation in which the communication-gap problem did not exist, and that of ablation, learned under the false assumption that the IA and person shared a goal. The experimental results revealed that vocabulary learned with Manifestor improved people’s prediction accuracy as much as optimal did, while ablation failed, suggesting that Manifestor enables an IA to properly learn vocabulary from people’s instructions even when a communication gap exists.


I. INTRODUCTION
The development of machine-learning methods has allowed intelligent agents (IAs) to learn complex decision-making. Deep reinforcement learning (DRL) has broadened the applicability of IAs. However, while a growing number of studies are beginning to tackle the problems that arise when IAs mix with human society, there is still much room for improving the quality of interaction between humans and IAs.
This paper focuses on explaining what an IA is going to do. It is difficult for non-experts to understand an IA's complex decision-making process in a machine-learning module [1]; as a result, people become unable to predict the IA's behavior. Unpredictable behavior can cause unintended results or accidents. Moreover, for IAs to effectively work with people, both a person and an IA should be able to understand each other's future behavior to decide which roles to take in each situation [2]. Methods have been proposed for generating explanations of motions that an IA will show. Hayes et al. proposed a natural-language question-answering system that provides an explanation of what an IA does in a particular situation [1]. Waa et al. proposed a method for explaining not a one-shot action but a sequence of actions [3].
A problem with previous methods is that they require IA designers to predefine vocabulary that expresses an IA's behavior. Predefining vocabulary by hand is relatively easy in a simple environment such as a grid world [4]. However, when an IA deals with robot motor control, for example, decision-making can be highly frequent, high-dimensional, or subject to time delay. We cannot simply map an IA action to a specific expression, which makes defining vocabulary much more complex.
We propose Manifestor, a method for explaining what an IA is going to do by enabling the IA to autonomously learn vocabulary that expresses its motions (Fig. 1). Manifestor enables an IA to learn vocabulary from instructions that people give to the IA. This setting is analogous to the instruction-following framework [5]-[7], in which an IA aims to learn a policy, or how to act, to follow a given instruction. Instruction following is typically formalized as an RL problem; that is, an IA earns more reward when its action better fits the instruction.
Beyond the difference that Manifestor explains an IA's motions rather than generates them, a significant point of Manifestor lies in what we call the communication gap. In this paper, a communication gap refers to the problem in which the goal that a person wants an IA to achieve can be different from that of the IA, and the IA does not know which goal the person has. More specifically, an IA does not know which reward function is behind a person's instructions. Unlike instruction following, in which an IA obtains reward feedback on whether its motion follows an instruction, an IA requires a meta-inference of the person's goal to learn the correspondences between its motion and the vocabulary used in an instruction. Human-human interaction typically contains a communication gap because each person has her/his own goals or intentions, and such mental states are more or less uncertain. A communication gap can also arise between an IA and a person, particularly if the person is not familiar with the design of the IA's decision-making.
Figure 1 illustrates our main idea. Manifestor solves the problem of vocabulary learning with a communication gap through two inferences: (i) inference of a human goal allows an IA to learn vocabulary in a manner similar to instruction following; (ii) by comparing a human instruction with the classification result of an IA motion by the learned vocabulary, an IA can estimate which goal the human has; that is, when an IA recognizes that its motion matches a human instruction, the human goal is likely the one with which the motion earns more rewards. Manifestor enables an IA to learn vocabulary using two loss functions, one representing each of the statements above.
This paper reports the results of experiments conducted to evaluate Manifestor. A numerical experiment focused on confirming the basis of Manifestor, and a user study experiment aimed at investigating whether explanations generated with Manifestor can improve the predictability of an IA's future motion. We compared Manifestor with two alternatives: optimal is trained in situations in which a person always gives instructions on the basis of an IA's true goal, so the IA does not need to consider the communication gap, corresponding to the instruction-following setting; ablation is trained under the false assumption that a communication gap does not exist. Manifestor showed similar performance to optimal, while ablation failed, suggesting that even if a communication gap exists, Manifestor enables an IA to correctly learn vocabulary and effectively explain its future behavior.
This paper is structured as follows. Section II describes the background of Manifestor from the perspectives of both the explainable artificial intelligence (XAI) problem and vocabulary learning. We also formalize the communication-gap problem. Section III explains the design of Manifestor for learning vocabulary in situations with a communication gap. Section IV describes the details of our implementation for evaluating Manifestor. Section V reports the results of the numerical and user study experiments. Section VI discusses the limitations and future directions of Manifestor. Finally, Section VII concludes this paper.

II. BACKGROUND

A. GOAL-ORIENTED XAI
XAI refers to an intelligent system that can explain its decision-making to people [8]. Some machine-learning models, particularly deep-learning models, are composed of variables that people cannot easily interpret, which increases the need for XAI. Adadi and Berrada pointed out that XAI stems from at least four motivations: to justify AI decision-making, make AIs controllable, improve AIs, and enable people to discover insights from machine-learning results [9].
The XAI problem is also critical for IAs that autonomously learn their policy with machine-learning methods. XAI for IAs is specifically called goal-oriented XAI [10] or explainable agency [11]. Goal-oriented XAI is necessary for people to control or improve an IA and avoid unintended behavior. It is also concerned with whether people can trust an IA [12], [13].
Puiutta and Veith applied the taxonomy of XAI [9] to goal-oriented XAI [14]. One factor is whether a method is intrinsic or post-hoc. Intrinsic methods build a machine-learning model that is constitutionally interpretable. By using a decision-tree model or an attention mechanism [15], it is easier for people to interpret an IA's decision-making process. Certain methods add constraints to a deep-learning-model structure so that decision-making models explicitly have human-interpretable variables such as goals [16], [17]. Post-hoc methods, in contrast, focus on generating explanations of incomprehensible models after training.
Although post-hoc methods are not guaranteed to explain the literal decision-making process of the original machine-learning model, they do not affect model performance.
Another factor is whether an explanation is global or local. A global explanation provides a summary of an IA's general behavior [1], [18], whereas a local explanation targets behavior in a specific situation. A major approach for local explanation is using a target model's saliency map, a visualization of the input factors that strongly affected the model's decision-making [19]-[21]. Saliency maps provide the reason an IA took a specific action and can be a clue for people to predict IA behavior [20].
Manifestor provides post-hoc and local explanations. It focuses on generating explanations of an IA whose model has little interpretability for people. We consider a specific motion that an IA is going to show so that people can correctly predict the future.

B. EXPLAINING IA FUTURE ACTION
Most XAI studies focus on explaining why a decision is made, and little consideration has been given to explaining what the decision will be. However, because an IA's action can cause unrecoverable effects, including physical changes in the environment, it is important to be able to explain an action before taking it. Explaining what an IA is going to do is also essential for cooperation with humans, because effective cooperation is based on mutual understanding of what others will do [2].
Hayes et al. proposed a question-answering system for explaining an IA's behavior [1]. It can answer a question of what an IA will do under specific circumstances. Strictly speaking, this is a global explanation based on a Markov decision-process model. Waa et al. proposed a method for explaining not a one-shot action but a sequence of actions [3].
However, they focused on an IA in a grid world, and challenges remain in applying the method to other domains. A major challenge is that an IA's action is assumed to be easily associated with a symbolic expression for explaining to people. It becomes difficult to define vocabulary as an IA's decision-making becomes complex, making autonomous learning of vocabulary more promising.

C. ENABLING IA TO LEARN VOCABULARY
A simple machine-learning approach for grounding vocabulary in motion is supervised learning using a dataset of motion-caption pairs. Methods have been proposed to classify human activity captured with RGB (red, green, blue) cameras or depth sensors into caption labels [22]-[24], and a study focused on robot-motion captioning [25].
Typical trials for an IA to interactively learn vocabulary from people are in the instruction-following framework [5]-[7], [26], in which an IA seeks a policy for following instructions given by people. Particularly in a reward-based approach [27], [28], an IA learns a policy $\pi_{\mathrm{instruction}}$ that maximizes the expected reward given the environment state $s_t$ and instruction $u_t$ with RL methods:

$$\pi_{\mathrm{instruction}} = \arg\max_{\pi} \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \,\middle|\, s_t, u_t\right],$$

where $r_t$ is a reward given at time $t$, and $\gamma$ is a discount rate for future rewards. On the boundary between XAI and instruction following, Shu et al. proposed a hierarchical RL model that improves the interpretability of learned behavior by associating sub-policies with vocabulary used in an instruction [17].
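To make the objective concrete, the discounted return that the instruction-conditioned policy maximizes can be sketched as follows. This is an illustrative sketch only; the reward sequences and discount rate are invented for the example and are not taken from the paper's setup.

```python
# Sketch of the reward-based instruction-following objective: a policy
# conditioned on state and instruction is judged by the discounted
# return it earns. All numbers below are illustrative.

def discounted_return(rewards, gamma=0.99):
    """Compute sum_k gamma^k * r_{t+k} for a reward sequence starting at t."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total

# Two hypothetical rollouts under the same instruction; instruction
# following prefers the policy with the higher discounted return.
rewards_follow = [1.0, 1.0, 1.0]   # motion keeps matching the instruction
rewards_ignore = [1.0, 0.0, 0.0]   # motion drifts away from it

assert discounted_return(rewards_follow) > discounted_return(rewards_ignore)
```

The comparison mirrors the RL formalization: the instruction does not change the learning rule, only which state-action sequences earn reward.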
Instructions from people can also be used to boost an IA's action learning when the IA could achieve its goal on its own without instructions. Interactive RL (IRL) aims at enabling an IA to quickly learn its policy from people's symbolic feedback [29]. In this line, we consider a situation in which a person mentions how an IA should act.
In reward-based instruction following, it is assumed that the IA and the person giving instructions share a goal g ∈ G. That is, the IA's goal g_agent is the same as the person's goal g_human. Here, g_agent is a variable that specifies the reward r for an IA's action in a specific environment state:

$$r = R(s, a, g_{\mathrm{agent}}). \tag{1}$$

We call R a reward function. Similarly, g_human is a variable that is behind a person's instruction:

$$u = I(s, g_{\mathrm{human}}). \tag{2}$$

In instruction following, larger rewards are given to an IA when its action better matches a person's instructions. However, when a non-expert person attempts to communicate with an autonomous IA, s/he can mention something other than g_agent, i.e., g_agent ≠ g_human, because s/he may not know exactly which goal the IA has or may want the IA to work on another task. This paper focuses on such a communication gap so that an IA can correctly interpret instructions given by people.

Manifestor is an extension of our previously proposed prototype method called Instruction-based Behavior Explanation (IBE) [30], [31]. IBE also uses vocabulary from a person's instructions to explain an IA's future motion, and we empirically demonstrated that its explanation improves the predictability of IA behavior. The largest significance of Manifestor over IBE is that it handles the communication gap between a person and an IA, whereas IBE does not. Moreover, Manifestor quantitatively formalizes vocabulary learning with two loss functions, while IBE relies on the manual design of thresholds for determining whether an IA behavior matches a given instruction.
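The gap can be made concrete with a toy goal-conditioned reward function. The one-dimensional state and the specific goal values below are illustrative inventions for the example, not the paper's reward function.

```python
# Sketch of the communication gap: the reward function R(s, a, g) is
# parameterized by a goal, and the goal behind a person's instruction
# (g_human) may differ from the IA's own goal (g_agent).

def R(s, a, g):
    """Toy goal-conditioned reward: the closer the state s is to goal g,
    the larger the reward. The action a is unused in this toy example."""
    return -abs(s - g)

g_agent, g_human = 0.0, 2.0   # the gap: g_agent != g_human
s, a = 1.5, None

# The same motion is scored differently under the two goals, so the IA
# cannot judge whether it followed the instruction without inferring g_human.
print(R(s, a, g_agent), R(s, a, g_human))  # -1.5 -0.5
```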
This paper tackles an extreme case in which a person only provides instructions and does not provide any other feedback, such as whether a motion matches an instruction. This is not realistic for actual applications because feedback boosts the learning process, but we chose this case to explore the possibility of vocabulary acquisition with as little additional information from a person as possible.

D. LunarLander-v2 AND INSTRUCTIONS
We used LunarLander-v2, provided in OpenAI Gym [32], as a task in which an IA acts and a person gives instructions. An IA aims to land a rocket on an objective landing spot by manipulating the main and side thrusters located on the rocket's bottom and its left and right sides, respectively. A possible action a ∈ A is choosing which thruster to ignite to accelerate or decelerate the rocket. An IA can also choose no thruster, in which case the rocket moves in accordance with gravity and inertia. The environment state s ∈ S represents the rocket's location, velocity, and degree of tilt, on the basis of which an IA chooses its action. Rewards are calculated on the basis of the distance to the landing spot, deceleration, and decrease in tilt at each time step. A one-shot positive/negative reward is given when the rocket succeeds/fails in landing.
In this paper, goals correspond to the location of the landing spot. We modified LunarLander-v2 so that we could change the landing-spot location for each trial, whereas the original has a fixed landing point at the center of the ground. Compared with a grid world, an action in LunarLander-v2 is not intuitive for people [33], so it is difficult to associate an IA action with human vocabulary. One reason is that an action causes different results depending on s. For example, the effect of rocket ignition on its velocity depends on the rocket's angle. In addition, a rocket behavior that people can recognize is based not on a single action but on a sequence of actions because an action is chosen at high frequency (every 20 ms), which involves time delay.
Following previous studies [30], [31], we defined a simple rule for generating an instruction for an IA:

$$u_t = \begin{cases} u_{\mathrm{right}} & (s_t.x < g.x_{\mathrm{left}}) \\ u_{\mathrm{left}} & (s_t.x > g.x_{\mathrm{right}}) \\ u_{\mathrm{keep}} & (\text{otherwise}), \end{cases} \tag{3}$$

where s.x is the location of the rocket on the horizontal axis, and g.x_left and g.x_right represent the left and right ends of the landing spot g, respectively. An instruction is based only on the locations of the rocket and the human goal, and the inertia of the rocket is not taken into account. A person is assumed to consistently give instructions on the basis of g_human, which is fixed within an episode. In LunarLander-v2, an episode runs from the rocket beginning its descent to completing the landing.
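The rule can be sketched as a short function. The vocabulary labels ("right", "left", "keep") are placeholders, since the paper specifies only the position-based rule, not the exact wording of the instructions.

```python
# Sketch of the position-based instruction rule (Eq. 3). The instruction
# depends only on the rocket's horizontal position and the human goal's
# landing-spot interval; inertia is deliberately ignored.

def instruct(s_x, g_x_left, g_x_right):
    """Generate an instruction from the rocket's horizontal position s_x
    and the human goal's landing interval [g_x_left, g_x_right]."""
    if s_x < g_x_left:
        return "right"   # rocket is left of the landing spot
    if s_x > g_x_right:
        return "left"    # rocket is right of the landing spot
    return "keep"        # rocket is above the landing spot

assert instruct(-0.5, 0.0, 0.2) == "right"
assert instruct(0.5, 0.0, 0.2) == "left"
assert instruct(0.1, 0.0, 0.2) == "keep"
```

Because the rule ignores velocity, a rocket drifting fast toward the spot still receives "keep", which is one way the instructions can be suboptimal.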

III. MANIFESTOR

A. OVERVIEW
Manifestor is a method for explaining an IA's future motion by learning vocabulary from people. It generates a future-motion explanation by predicting the transition of environment states caused by the IA and translating it into human vocabulary. Manifestor interprets a person's instruction about how s/he wants the IA to act while inferring her/his goal and thereby grounds the vocabulary used in instructions. Therefore, it becomes unnecessary for IA designers to manually define vocabulary for explanation.

B. MODULES
Manifestor is composed of five modules: policy π_g, predictor T_N, translator f_N, instructor model M, and evaluator E (Fig. 2). π_g determines an action a on the basis of s under g in the same manner as typical RL methods. T_N predicts a sequence of actions a_{t,N} = (a_t, a_{t+1}, ..., a_{t+N}) and a transition in the environment state s_{t,N} = (s_t, s_{t+1}, ..., s_{t+N}) over N steps on the basis of π_g:

$$T_N(\pi_g, s_t) = (s_{t,N}, a_{t,N}).$$

f_N outputs a probability distribution that represents the correspondence between the transition s_{t,N} and vocabulary u:

$$f_N(u \mid s_{t,N}).$$

M infers a person's goal g_human from her/his instructions:

$$M(g_{\mathrm{human}} \mid s_{0,\tau}, u_{0,\tau}),$$

where τ is the length of an episode, and u_{0,τ} = (u_0, u_1, ..., u_τ) is the sequence of instructions in an episode. This formalization is based on the assumption that g_human is fixed within an episode. Finally, E represents to what extent a transition and a sequence of actions caused by the IA policy, (s_{t,N}, a_{t,N}), fit a goal g:

$$E(g \mid s_{t,N}, a_{t,N}).$$

We focus on the training of f_N and M and do not discuss the problems of T_N and E, such as how to train them and their prediction accuracy.
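The interplay of the five modules can be sketched as interface stubs. The bodies below are illustrative placeholders (the stub transition never changes state, and the distributions are hard-coded); only the signatures mirror the formalization.

```python
# Interface sketch of Manifestor's five modules (Fig. 2). All bodies
# are illustrative stubs; only the signatures mirror the formalization.

def policy(s, g):
    """pi_g: choose an action from state s under goal g (stub)."""
    return 0

def predictor(policy_fn, s_t, g, N):
    """T_N: roll the policy forward N steps -> (s_{t,N}, a_{t,N})."""
    states, actions = [s_t], []
    for _ in range(N):
        a = policy_fn(states[-1], g)
        actions.append(a)
        states.append(states[-1])  # stub transition: state unchanged
    return states, actions

def translator(s_seq):
    """f_N: probability distribution over vocabulary u for a transition."""
    return {"right": 0.5, "left": 0.3, "keep": 0.2}

def instructor_model(s_episode, u_episode):
    """M: distribution over the person's goal g_human for an episode."""
    return {g: 1.0 / 3 for g in ("left", "center", "right")}

def evaluator(s_seq, a_seq, g):
    """E: how well the transition fits goal g (e.g., reward-based)."""
    return 1.0

states, actions = predictor(policy, 0, "center", 5)
assert len(actions) == 5 and len(states) == 6
```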

C. LOSS FUNCTIONS
We define loss functions for training f_N and M. These loss functions represent the two ideas shown in Fig. 1: we can associate instruction vocabulary with a motion in a manner similar to instruction following if the goal behind the instruction is given, and the goal behind an instruction can be inferred on the basis of how much the self-classification result of an IA motion matches the instruction. These ideas are used for training f_N and M, respectively.

1) TRANSLATOR f N
Let us first consider a situation in which a person and an IA share the goal, that is, g_agent = g_human. A person provides instruction u_t on the basis of s_t and g_human (cf. Eq. 2). The IA chooses actions for the following N steps, and we obtain a sequence of actions a_{t,N} and an environment transition s_{t,N}. We define a loss function for f_N, i.e., L_{f_N}:

$$L_{f_N} = -\sum_{t} E(g_{\mathrm{human}} \mid s_{t,N}, a_{t,N}) \log f_N(u_t \mid s_{t,N}). \tag{4}$$

f_N is trained to minimize L_{f_N}. With this loss function, s_{t,N} corresponds more strongly to u_t the more (s_{t,N}, a_{t,N}) accords with g_human. E is based on the rewards R(s, a, g) that would be given under each possible goal.
Next, let us suppose a situation with a communication gap, i.e., g_agent ≠ g_human. We extend the former loss function as follows:

$$L^{+}_{f_N} = -\sum_{t} \left[\sum_{g \in G} M(g \mid s_{0,\tau}, u_{0,\tau}) \, E(g \mid s_{t,N}, a_{t,N})\right] \log f_N(u_t \mid s_{t,N}), \tag{5}$$

where L^+_{f_N} depends on a person's goal inferred by M. The (M · E) term in Equation 5 corresponds to E(g_human | s_{t,N}, a_{t,N}) in Equation 4. It takes into account all g ∈ G as possible human-goal candidates because g_human is hidden from the IA. The sum of M · E is the expected value of E(g_human | s_{t,N}, a_{t,N}) when we consider g_human as a random variable.
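A minimal numeric sketch of the two translator losses follows, assuming the evaluator-weighted cross-entropy reading described in this subsection. All probabilities and evaluator values are illustrative.

```python
import math

# Sketch of the evaluator-weighted translator losses (Eqs. 4 and 5).
# In each step tuple, f_prob plays the role of f_N(u_t | s_{t,N}) and
# E_vals[g] the role of E(g | s_{t,N}, a_{t,N}); M_probs[g] plays the
# role of M(g | s_{0,tau}, u_{0,tau}). Values are illustrative.

def loss_f(steps, g_human):
    """L_{f_N}: the known human goal weights the per-step cross-entropy."""
    return -sum(E_vals[g_human] * math.log(f_prob)
                for f_prob, E_vals in steps)

def loss_f_plus(steps, M_probs):
    """L^+_{f_N}: an expectation over inferred goals replaces the known goal."""
    return -sum(sum(M_probs[g] * E_vals[g] for g in M_probs) * math.log(f_prob)
                for f_prob, E_vals in steps)

steps = [(0.9, {"A": 1.0, "B": 0.1}),
         (0.8, {"A": 0.7, "B": 0.2})]

# When M is certain about the human goal, Eq. 5 reduces to Eq. 4.
certain_M = {"A": 1.0, "B": 0.0}
assert abs(loss_f(steps, "A") - loss_f_plus(steps, certain_M)) < 1e-9
```

The final assertion illustrates the consistency between the two losses: the communication-gap loss degenerates to the shared-goal loss when the goal inference is exact.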

2) INSTRUCTOR MODEL M
Equation 6 shows the loss function for training M, i.e., L_M:

$$L_M = -\sum_{t} f_N(u_t \mid s_{t,N}) \sum_{g \in G} E(g \mid s_{t,N}, a_{t,N}) \log M(g \mid s_{0,\tau}, u_{0,\tau}), \tag{6}$$

which expresses our idea that (a) when self-recognition of an agent's motion matches an instruction, (b) the human goal should be the one with which the motion earns large rewards. The terms f_N and (E · log M) represent (a) and (b), respectively. We focus on the co-occurrence of (a) and (b); that is, (E · log M) should be maximized (or −(E · log M) should be minimized) when f_N is large. Equation 6 expresses the co-occurrence but has a loophole: it can also be minimized by decreasing the two terms individually. Therefore, we add a penalty factor β to avoid this loophole and focus on the co-occurrence:

$$L_M = -\sum_{t} \left(f_N(u_t \mid s_{t,N}) - \beta\right) \sum_{g \in G} E(g \mid s_{t,N}, a_{t,N}) \log M(g \mid s_{0,\tau}, u_{0,\tau}).$$

We do not consider β = 0 because this case cannot theoretically occur when we use the softmax function for the outputs of f_N, E, or M.

IV. IMPLEMENTATION

A. TRAINING PROCEDURE
f_N and M require each other's inference for training (Eqs. 5 and 6). Namely, their training is interdependent, and we could not stabilize the learning results when training the two simultaneously in our trials. To focus on validating our formalization of L^+_{f_N} and L_M, we simplified the learning process by introducing assumptions and splitting the training process into three phases.
In the first phase, instructions are divided into n groups with an unsupervised learning method, independently of L_M. Specifically, we used an encoder-decoder model:

$$\hat{g} \sim \mathrm{Encoder}(\hat{g} \mid s_{0,\tau}, u_{0,\tau}), \tag{7}$$
$$u_t \approx \mathrm{Decoder}(s_t, \hat{g}), \tag{8}$$

where ĝ ∈ Ĝ is the result of the unsupervised learning method.
The second phase is training f_N on the basis of ĝ. The unsupervised learning method does not provide the relationship between ĝ ∈ Ĝ and g ∈ G, so there can be multiple combinations. Let us consider a mapping m : Ĝ → G, with which we can build M:

$$M_m(g \mid s_{0,\tau}, u_{0,\tau}) = \sum_{\hat{g} \in \hat{G}} \delta(g, m(\hat{g})) \cdot \mathrm{Encoder}(\hat{g} \mid s_{0,\tau}, u_{0,\tau}), \tag{9}$$

where δ(a, b) is 1 if a = b and 0 otherwise. When we assume |G| = |Ĝ| = 3 and a one-to-one correspondence between ĝ and g, we can consider 3! = 6 mappings. In this phase, we trained f_N using Eq. 5 for all possible Ms, one for each mapping.
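The mapping enumeration and the construction of M_m (Eq. 9) can be sketched as follows. The encoder distribution and goal labels are illustrative inventions for the example.

```python
from itertools import permutations

# Sketch of the second phase: enumerate the 3! = 6 one-to-one mappings
# m: G_hat -> G and build M_m from the encoder output via the Kronecker
# delta (Eq. 9). The encoder distribution below is illustrative.

G_hat = ["g0", "g1", "g2"]          # unsupervised clusters
G = ["left", "center", "right"]     # the IA's possible goals
encoder_probs = {"g0": 0.7, "g1": 0.2, "g2": 0.1}

def M_m(mapping):
    """M_m(g | .) = sum_{g_hat} delta(g, m(g_hat)) * Encoder(g_hat | .)."""
    probs = {g: 0.0 for g in G}
    for g_hat, g in mapping.items():
        probs[g] += encoder_probs[g_hat]   # delta picks the mapped goal
    return probs

# One candidate mapping per permutation of G.
mappings = [dict(zip(G_hat, perm)) for perm in permutations(G)]
assert len(mappings) == 6

# Each mapping relabels the same encoder distribution over goals,
# so every M_m remains a valid probability distribution.
first = M_m(mappings[0])
assert abs(sum(first.values()) - 1.0) < 1e-9
```

In the third phase, each of these six candidate Ms would then be scored with L_M, and the best-scoring mapping is kept.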
In the third phase, we evaluate all the Ms with Eq. 6. The final training result is the f_N trained with the best mapping, i.e., the one that minimizes Eq. 6. The training of M is thus simplified into the problem of choosing the best mapping m.

B. MODELS
In this paper, f_N is a Transformer-Encoder model [34] that handles time-series data. The model can be trained with a gradient method on the basis of Eq. 5. We inserted a [CLS] token [35] at the beginning of the input and transformed the output into a probability distribution over u with a multi-layer perceptron and the softmax function.
The Encoder of M (Eq. 7) is implemented with a model similar to f_N, but it receives a sequence of both s and u. The output of the Encoder expresses a probability distribution over ĝ behind the instructions. The decoder (Eq. 8) is a multi-layer perceptron.

V. EVALUATION

A. OVERVIEW
Two experiments were conducted to evaluate Manifestor: a numerical experiment for confirming the basis of Manifestor and a user study experiment investigating whether explanations generated with Manifestor can contribute to improving the predictability of an IA's future motion for people.
We compared Manifestor with optimal and ablation to investigate its performance against a communication gap. optimal is trained with instructions based on g_agent and L_{f_N}. It does not need to take a communication gap into account and thus should provide optimal results. Manifestor is trained with instructions based on g_human, as is ablation, which falsely ignores the communication gap by using L_{f_N}. optimal and ablation simulate the results produced by contemporary instruction-following methods [27], [28]. They demonstrate what occurs when we introduce current methods in situations without or with a communication gap, respectively.

B. NUMERICAL EXPERIMENT

1) AIMS
The numerical experiment was conducted to validate Manifestor by investigating the following two questions: i) Can we choose the best mapping m* for M with L_M? ii) Do the training results acquired on the basis of L^+_{f_N} in a situation with a communication gap match the optimal vocabulary acquired with L_{f_N} in a situation without a communication gap? These questions validate our ideas expressed with L_M and L^+_{f_N}, respectively.

2) PROCEDURE
An IA policy was trained with Advantage Actor-Critic (A2C), one of the most representative algorithms for DRL. From this policy, we created datasets for training and evaluating Manifestor. The datasets were composed of tuples of a state s, an instruction based on a g_human, and an instruction based on a g_agent. A g_human was randomly chosen for each episode. We prepared two datasets, unskilled and skilled, on the basis of policies trained for 500,000 and 150 million time steps, respectively, to supplementally investigate the effects of IA-policy performance on vocabulary learning.
We set N = 100 (five seconds), which we determined for the subsequent user study experiment, considering the balance between the difficulty of predicting where a rocket lands and the time left for letting people understand the context for prediction, on the basis of our pilot experiment. A dataset has 3,200 episodes; we used half for training and the other half for evaluation.
For question i), we executed the procedure shown in Subsection IV-A with 100 different random seeds. Training with each random seed produces six Ms and f_Ns, and there is a mapping m* with which the corresponding M most accurately predicts the ground truth of g_human. We calculated the ratio of L_M between the best alternative mapping and m*. A ratio of more than 1 means that the loss function can select the best mapping.
For question ii), we calculated the accuracy of Manifestor, where accuracy means how closely the outputs of Manifestor's f_N match those of optimal. We used the f_N with the ground-truth mapping to focus on L^+_{f_N} and remove the effects of L_M. For comparison, we also calculated the accuracy of ablation.

3) RESULTS
Figure 3 shows the results for question i). Similar results, except for the breadth of the distributions, were obtained with both the unskilled and skilled datasets. From 500 samples, we successfully distinguished the best mapping in 485 and 487 samples (97.0 and 97.4%). The mean ratios were 1.22 (95% confidence interval (CI) [1.01, 1.42]) and 1.14 (95% CI [0.99, 1.29]), respectively.
Figure 4 illustrates the accuracy of Manifestor and ablation. With both datasets, Manifestor was significantly more accurate than ablation (Mann-Whitney U test). The median accuracy values of Manifestor were .700 and .870, whereas those of ablation were .303 and .522, with the unskilled and skilled datasets, respectively. A possible reason for the lower accuracy with the unskilled dataset is that it has very few successful examples and many only relatively better ones for human instructions, so it was difficult to clearly ground vocabulary to motions.
The numerical experiment results answered both questions affirmatively, so we conclude that the two loss functions of Manifestor can effectively handle the communication-gap problem.

C. USER STUDY EXPERIMENT

1) AIMS
The user study experiment was conducted to evaluate Manifestor in a more practical situation. We investigated whether future-motion explanations generated with Manifestor can improve the predictability of IA behavior for people.

2) PROCEDURE
Participants were asked to predict where a rocket would land. Figure 5 illustrates the interface shown to the participants. It showed the rocket's movement until five seconds (100 frames) before it landed, along with an explanation of its future motion generated with Manifestor or one of the other methods used in Subsection V-B (optimal and ablation). We also prepared a baseline condition in which only the rocket movement and no explanation was provided. The explanation was shown as a bar graph.
We recruited 100 participants (26 female and 74 male; aged 21-67, M = 41.3, SD = 8.6) with compensation of 120 JPY through Lancers (https://www.lancers.jp/), a crowdsourcing platform in Japan. The experiment was conducted on a web site. The participants were first provided pertinent information, and all of them consented to participation. Before the main task, we asked four simple questions to test the participants' comprehension of the task and removed 14 participants from evaluation. The experiment was conducted in a between-participants design, and 18, 20, 22, and 26 participants were assigned to the Manifestor, optimal, ablation, and baseline conditions, respectively. In the main task, the participants were requested to answer where the rocket would land by indicating the index written on the moon's surface (Fig. 5). Twenty episodes were shown in random order.

3) HYPOTHESES
We made two hypotheses to validate whether Manifestor can effectively explain an IA's future motion by managing the communication-gap problem: (H1) Manifestor improves predictability as much as optimal; (H2) ablation, which ignores the communication gap, fails to improve predictability.

4) RESULTS
Figure 6 illustrates the absolute errors of the participants' predictions with statistical results. One tick of error in Fig. 5 equals 0.2. The Kruskal-Wallis test revealed significant differences among the three methods and the baseline condition (p < .001). For a post-hoc analysis, we applied the Mann-Whitney U test with Bonferroni correction to the results. We found significant differences among all combinations except for that between Manifestor and optimal.
Both Manifestor and optimal reduced the error compared with the baseline condition. The mean absolute errors were 0.172, 0.176, and 0.260 for Manifestor, optimal, and the baseline condition, respectively. The effect size r was .25 between Manifestor and the baseline condition and .22 between optimal and the baseline condition. We found no significant difference between Manifestor and optimal; the r between the two was .06. These results support H1, meaning that even though a communication gap exists, Manifestor enables an IA to learn vocabulary and generate future-motion explanations that improve the predictability of IA motion as much as vocabulary learned in the ideal situation in which a communication gap does not exist.
ablation failed to improve the predictability of IA motion and rather misled participants. The mean absolute error of ablation was 0.346. The r was .16 between the baseline condition and ablation and .37 between Manifestor and ablation. These results support H2, confirming that the communication-gap problem needs to be solved in our settings to properly learn vocabulary from people.
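For reference, the Mann-Whitney U statistic underlying the post-hoc comparisons above can be sketched in a few lines (a full test with p-values is available as scipy.stats.mannwhitneyu); the error values below are illustrative, not the experimental data.

```python
# Sketch of the Mann-Whitney U statistic used in the post-hoc
# comparisons: a rank-based statistic that needs no normality
# assumption, which suits bounded prediction errors.

def mann_whitney_u(x, y):
    """U for sample x: count pairs (xi, yj) with xi < yj; ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# Illustrative error samples: one group with uniformly smaller errors.
errors_a = [0.1, 0.2, 0.15]
errors_b = [0.3, 0.25, 0.4]

# Smaller errors in group a push U toward its maximum, len(x) * len(y).
assert mann_whitney_u(errors_a, errors_b) == 9.0
```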

VI. LIMITATIONS AND FUTURE WORK
We empirically demonstrated that Manifestor enables an IA to properly learn vocabulary in situations with a communication gap and contributes to improving the predictability of IA motion with the learned vocabulary. However, the implementation of Manifestor and the experimental settings mainly focused on validating our ideas formalized as loss functions (Eqs. 4-6), and further consideration is required to apply them to actual human-IA interaction.
We defined a rule for generating human instructions (Eq. 3), but a previous study on IRL revealed that human feedback signals are infrequent, inconsistent, or suboptimal [36]. It would be promising to improve Manifestor on the basis of IRL models that can handle such characteristics of human instructors.
As mentioned in Subsection II-C, we assumed that a person only gives instructions and never provides feedback such as whether an IA's motion followed what s/he said. However, using feedback from people and Manifestor are complementary: feedback boosts the acquisition of Manifestor's M, while an M reduces the need for feedback. For future work, we are planning to integrate other information provided by an instructor to both accelerate the training process of Manifestor and generate IA motions.
Contextual information is also helpful for developing an M. A g_human was randomly chosen for each episode in this paper, but g_human can depend on context, and if so, context can be a hint for inferring it. In particular, the behavior of a person who gives instructions provides plenty of information about her/his goal. An IA motion can also affect g_human because a person sometimes tries to infer an artificial agent's mental states on the basis of mere observations of IA behavior [37], [38]. When a person tries to infer g_agent to provide instructions, an IA may need to infer how its own motion is perceived by people.
Another direction for improving Manifestor is to refer to semantics to interpret human instructions. Manifestor learns vocabulary without prior knowledge about what it means. However, an IA can more efficiently and properly interpret vocabulary by using a lexicon or a language model trained with corpus data [39], [40]. Combining such information with Manifestor is promising for reducing the human cost of interacting with an IA for vocabulary learning.
Our implementation of Manifestor assumes g human ∈ G, that is, the g human is drawn from the set of an IA's possible goals; thus, the IA's evaluator can evaluate whether a motion matches the g human. However, a non-expert may ask an IA to work on a task that was overlooked in design or is beyond the capabilities of the IA. Applying inverse RL methods [41] to a person may be a promising means of specifying a g human ∉ G and building an evaluator that evaluates an IA motion on the basis of the inferred reward function.
Manifestor relies on the predictor, but predicting the transition of the environment is still an important research domain. It is particularly challenging for the real world, which tends to be nondeterministic and highly complex. An actively researched area is video prediction [42]-[44]. However, an IA needs to handle action-conditional prediction because its actions affect the environment [45]. Model-based RL, which attempts to integrate a world-dynamics model into an IA's decision-making, can provide a direction for implementing an action-conditional prediction model for more complex environments [46]. Prediction accuracy depends on N, the length of prediction. We need to further investigate how long a horizon Manifestor affords for generating future-motion explanations.

VII. CONCLUSION
We proposed Manifestor, a method for explaining an IA's future motion on the basis of vocabulary learning. Manifestor enables IAs to learn vocabulary that expresses their motions from a person's instructions about how they should act. By inferring the goals behind instructions, Manifestor can manage the communication-gap problem, in which a person and an IA do not share their goals. The numerical and user study experiments demonstrated that Manifestor can generate future-motion explanations of an IA with learned vocabulary and improve the predictability of IA behavior even if communication gaps exist.

D. TRAINING MANIFESTOR
The Encoder (Eq. 7) is based on the Transformer-Encoder implementation from the PyTorch library. Both s t and u t are embedded into 64-dimensional vectors and concatenated before entering the Encoder. A sequence of (s, u) pairs with a [CLS] token at the beginning is input to the Transformer-Encoder model, which outputs a vector for each sequence element. The output vector for [CLS] is transformed into a three-dimensional vector with a perceptron and the softmax function. This is the output of the Encoder and expresses the probability of ĝ ∈ Ĝ. Table 3 lists the hyperparameters for the Encoder and Decoder.
The f N is implemented with a model similar to the Encoder. The differences are that the input is only s t,N and that the output vector is interpreted as the probability that s t,N is expressed with vocabulary u ∈ U.
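The Encoder architecture described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the state dimension, vocabulary size, number of layers, and heads are assumed placeholders, and only the [CLS]-based goal head is shown.

```python
import torch
import torch.nn as nn

class GoalEncoder(nn.Module):
    """Sketch of the Encoder (Eq. 7): embed (s_t, u_t) pairs, prepend a
    [CLS] token, run a Transformer encoder, and map the [CLS] output to
    a probability over the goal candidates in Ĝ."""

    def __init__(self, state_dim=8, vocab_size=10, d_model=128, n_goals=3):
        super().__init__()
        self.state_emb = nn.Linear(state_dim, 64)      # embed s_t → 64-dim
        self.vocab_emb = nn.Embedding(vocab_size, 64)  # embed u_t → 64-dim
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))  # learned [CLS]
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_goals)        # perceptron over [CLS]

    def forward(self, states, words):
        # states: (B, T, state_dim) float, words: (B, T) vocabulary ids.
        x = torch.cat(
            [self.state_emb(states), self.vocab_emb(words)], dim=-1)  # (B, T, 128)
        x = torch.cat([self.cls.expand(x.size(0), -1, -1), x], dim=1)  # prepend [CLS]
        h = self.encoder(x)[:, 0]                      # output vector for [CLS]
        return torch.softmax(self.head(h), dim=-1)     # probability of ĝ ∈ Ĝ

enc = GoalEncoder()
probs = enc(torch.randn(2, 5, 8), torch.randint(0, 10, (2, 5)))  # (2, 3)
```

The f N variant would reuse the same backbone with only s t,N as input and a vocabulary-sized output head instead of the goal head.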

APPENDIX B STATISTICS DETAILS
See Tables 5-8.

FIGURE 1. Manifestor enables an IA to learn vocabulary used in a person's instructions and apply it to explaining the IA's future motion.

FIGURE 3. Histograms of ratio distributions. Blue lines show the density of ratios greater than 1; red lines, otherwise.

FIGURE 5. Interface for the user study experiment.

FIGURE 6. Absolute errors of participants' predictions. See Appendix B for statistical details.

15:   f N,Manifestor,m ← Train f N with (L f N + L M m /L M m* , M m , s, a, u) for each m (≠ m*)
      f N,optimal,m* ← Train f N with (L f N , M m* , s, a, u′)
22:   f N,ablation,m* ← Train f N with (L f N , M m* , s, a, u)
23:   Calculate how much the outputs of f N,Manifestor,m* match those of f N,optimal,m*
24:   Calculate how much the outputs of f N,ablation,m* match those of f N,optimal,m*
25: end for

C. TRAINING A2C
An A2C agent was trained for the modified LunarLander-v2 (Subsection II-D) using pfrl, a DRL library [47]. Table 2 lists the hyperparameters for the training.
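One A2C update for such an agent can be sketched in plain PyTorch. This is an illustrative torch-only sketch, not the pfrl implementation, and the network sizes, learning rate, and loss coefficients are assumptions rather than the values in Table 2 (LunarLander-v2 has an 8-dimensional observation and 4 discrete actions).

```python
import torch
import torch.nn as nn

# Separate actor (policy) and critic (value) heads; sizes are illustrative.
policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 4))
value = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(
    list(policy.parameters()) + list(value.parameters()), lr=7e-4)

def a2c_update(obs, actions, returns, value_coef=0.5, entropy_coef=0.01):
    """One synchronous advantage actor-critic update on a rollout batch."""
    dist = torch.distributions.Categorical(logits=policy(obs))
    v = value(obs).squeeze(-1)
    adv = (returns - v).detach()              # advantage; no critic gradient
    policy_loss = -(dist.log_prob(actions) * adv).mean()
    value_loss = (returns - v).pow(2).mean()  # critic regression to returns
    entropy = dist.entropy().mean()           # exploration bonus
    loss = policy_loss + value_coef * value_loss - entropy_coef * entropy
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage on a dummy rollout batch of 16 transitions.
loss = a2c_update(torch.randn(16, 8),
                  torch.randint(0, 4, (16,)),
                  torch.randn(16))
```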

Table 1 lists the software used in the experiments.

TABLE 4. Hyperparameters of f N.
Table 4 lists the hyperparameters for f N.

TABLE 8. User study experiment: Mann-Whitney U test with Bonferroni correction.