Exophora Resolution of Linguistic Instructions with a Demonstrative based on Real-World Multimodal Information


Abstract:

To enable a robot to provide support in a home environment through human-robot interaction, exophora resolution is crucial for accurately identifying the target of ambiguous linguistic instructions, which may include a demonstrative, such as “Take that one”. Unlike endophora resolution, which involves predicting the corresponding word from given sentences, exophora resolution necessitates comprehensive utilization of external real-world information to identify and disambiguate the target from the on-site environment. This study aims to resolve ambiguity in language instructions containing a demonstrative through exophora resolution, utilizing real-world multimodal information. The robot accomplishes this by using three types of information: 1) object categories, 2) demonstratives, and 3) pointing, as well as knowledge about objects obtained from the robot’s pre-exploration of the environment. We evaluated the accuracy of object identification under multiple conditions by identifying a user-indicated object in a field that mimics a home environment. Our results demonstrate that our proposed method of exophora resolution using multimodal information can identify the target with two to three times higher accuracy than baseline methods in cases where information is missing.
Date of Conference: 28-31 August 2023
Date Added to IEEE Xplore: 13 November 2023
Conference Location: Busan, Republic of Korea

I. Introduction

Comprehending a user’s ambiguous verbal instructions, such as “Take that one,” and conducting specific tasks based on information obtained from the surroundings is one of the most significant challenges in the application of human support robots [1]. The term “that” in this context is a demonstrative: it refers to an object that is not explicitly named in the verbal directive. To understand the referent of a demonstrative, it is essential to perform anaphora resolution. Past research on anaphora resolution [2], [3] in the domain of natural language processing has predominantly focused on linguistic information as context. Anaphora resolution entails identifying the object to which a pronoun or a demonstrative refers, and it takes two forms: “endophora resolution” and “exophora resolution.” Endophora resolution determines the word corresponding to a linguistic expression within a context, considering the text both preceding and following the target expression. In contrast, exophora resolution predicts the object that corresponds to the target expression in the on-site environment. Endophora resolution can solve problems where the answer or a clue is present in the text. However, to comprehend verbal instructions containing a demonstrative such as “Get that one,” it is essential to perform exophora resolution, which relies on non-verbal information [4]–[6].
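As a rough illustration of the kind of multimodal fusion described in the abstract (a minimal sketch, not the authors’ implementation), the Python snippet below scores candidate objects from a pre-explored object map by combining three cues: a category match from the spoken noun, a proximal/distal prior derived from the demonstrative, and alignment with an estimated pointing direction. All names, score functions, and constants here are hypothetical assumptions introduced only for illustration.

import math
from dataclasses import dataclass

@dataclass
class MapObject:
    """An object recorded during the robot's pre-exploration of the environment."""
    name: str
    category: str
    position: tuple  # (x, y) in the room frame, metres

def category_score(obj, requested_category):
    """1.0 if the spoken category matches; a small floor value otherwise,
    so a missing or mismatched category does not zero the candidate out."""
    if requested_category is None:
        return 1.0
    return 1.0 if obj.category == requested_category else 0.1

def demonstrative_score(obj, speaker_pos, demonstrative):
    """Proximal demonstratives ("this") favour nearby objects,
    distal ones ("that") favour objects farther from the speaker."""
    d = math.dist(obj.position, speaker_pos)
    if demonstrative in ("this", "these"):
        return math.exp(-d)          # decays with distance
    if demonstrative in ("that", "those"):
        return 1.0 - math.exp(-d)    # grows with distance
    return 1.0                       # no demonstrative information

def pointing_score(obj, speaker_pos, pointing_dir):
    """Cosine alignment between the pointing direction and the direction
    from the speaker to the object (clipped to be non-negative)."""
    if pointing_dir is None:
        return 1.0
    to_obj = (obj.position[0] - speaker_pos[0], obj.position[1] - speaker_pos[1])
    norm = math.hypot(*to_obj) * math.hypot(*pointing_dir)
    if norm == 0:
        return 1.0
    cos = (to_obj[0] * pointing_dir[0] + to_obj[1] * pointing_dir[1]) / norm
    return max(cos, 0.0)

def resolve_exophora(objects, speaker_pos, category=None,
                     demonstrative=None, pointing_dir=None):
    """Multiply the per-cue scores and return the best-scoring candidate.
    Cues that are unavailable fall back to a neutral score of 1.0."""
    def fused(obj):
        return (category_score(obj, category)
                * demonstrative_score(obj, speaker_pos, demonstrative)
                * pointing_score(obj, speaker_pos, pointing_dir))
    return max(objects, key=fused)

if __name__ == "__main__":
    scene = [
        MapObject("mug_kitchen", "cup", (4.0, 1.0)),
        MapObject("mug_table", "cup", (1.0, 0.5)),
        MapObject("remote", "remote", (3.5, 1.2)),
    ]
    # "Take that cup" while pointing roughly towards the kitchen counter.
    target = resolve_exophora(scene, speaker_pos=(0.0, 0.0),
                              category="cup", demonstrative="that",
                              pointing_dir=(1.0, 0.25))
    print(target.name)  # expected: mug_kitchen

The multiplicative fusion and the neutral fallback of 1.0 for an absent cue are design assumptions of this sketch; they simply illustrate how the remaining modalities can still disambiguate the target when one information source is missing.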
