Generalizing objects in an action context, for example answering the question "Which items can be cut with which tools?", is a difficult and unresolved problem in robotics. Answering such a question defines a complete action class, and robots cannot do this so far. We use a bootstrapping mechanism similar to that known from human language acquisition and combine language with image analysis to create action classes built around the verb (action) in an utterance. A human teaches the robot a sentence, for example "Cut a sausage with a knife", from which the machine generalizes the arguments (nouns) that the verb takes and searches for possible alternative nouns. Then, by way of an internet-based image search and a classification algorithm, image classes for the alternative nouns are extracted, creating a large "picture book" of the possible objects involved in an action. This concludes the generalization step. Using the same classifier, the machine can now also perform recognition. Without having seen the objects before, it can analyze a visual scene and discover, for example, a cucumber and a mandolin, which match the previously found nouns, allowing it to suggest actions such as "I could cut a cucumber with a mandolin". The algorithm for generalizing objects by analyzing language (GOAL) presented here thus allows generalization and recognition of objects in an action context. It can then be combined with methods for action execution (e.g. action generation based on human demonstration) to execute previously unknown actions.
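As a rough illustration of the recognition step described above, the following sketch matches labels found in a scene against an action class and produces action suggestions. It is not the GOAL implementation: the `ActionClass` structure, the `suggest_actions` function, and the noun lists are all hypothetical, and the image search and classifier are replaced by a hard-coded list of labels.

```python
from dataclasses import dataclass, field


@dataclass
class ActionClass:
    """A verb plus the generalized nouns that may fill its argument slots."""
    verb: str
    objects: set = field(default_factory=set)  # things the action is applied to
    tools: set = field(default_factory=set)    # things the action is performed with


def suggest_actions(action, scene_labels):
    """Return one sentence per object/tool pair of the action found in the scene."""
    seen = set(scene_labels)
    found_objects = sorted(action.objects & seen)
    found_tools = sorted(action.tools & seen)
    return [
        f"I could {action.verb} a {obj} with a {tool}"
        for obj in found_objects
        for tool in found_tools
    ]


# Hypothetical outcome of the generalization step for the taught sentence
# "Cut a sausage with a knife"; the noun lists are illustrative only.
cut = ActionClass(
    verb="cut",
    objects={"sausage", "cucumber", "bread"},
    tools={"knife", "mandolin", "scissors"},
)

# Hypothetical labels that a classifier might return for one visual scene.
print(suggest_actions(cut, ["cucumber", "mandolin", "plate"]))
```

Labels that belong to neither slot, such as "plate" here, are simply ignored, and a scene that lacks either an object or a tool yields no suggestion.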