
Knowledge is Never Enough: Towards Web Aided Deep Open World Recognition


Abstract:

While today's robots are able to perform sophisticated tasks, they can only act on objects they have been trained to recognize. This is a severe limitation: any robot will inevitably see new objects in unconstrained settings, and thus will always have visual knowledge gaps. However, standard visual modules are usually built on a limited set of classes and are based on the strong prior that an object must belong to one of those classes. Identifying whether an instance does not belong to the set of known categories (i.e. open set recognition) only partially tackles this problem, as a truly autonomous agent should be able not only to detect what it does not know, but also to dynamically extend its knowledge about the world. We contribute to this challenge with a deep learning architecture that can dynamically update its known classes in an end-to-end fashion. The proposed deep network, based on a deep extension of a non-parametric model, detects whether a perceived object belongs to the set of categories known by the system and, if not, learns the new category without the need to retrain the whole system from scratch. Annotated images of the new category can be provided by an 'oracle' (i.e. human supervision) or by autonomous mining of the Web. Experiments on two different databases and on a robot platform demonstrate the promise of our approach.
Date of Conference: 20-24 May 2019
Date Added to IEEE Xplore: 12 August 2019
Conference Location: Montreal, QC, Canada

I. Introduction

For robots to perform intelligent, autonomous behaviors, it is crucial that they understand what they see. The applications requiring visual abilities are countless: self-driving cars, service robots detecting and handling objects in homes, kitting in industrial workshops, robots filling shelves and shopping baskets in supermarkets, and so on. All of them imply interacting with a wide variety of objects, which in turn requires a deep understanding of what these objects look like, their visual properties and their associated functionalities. Still, the best vision systems we have today are not yet up to the needs of artificial autonomous systems in the wild. There are examples of robots performing complex tasks such as loading a dishwasher [1] or flipping pancakes [2]. However, the visual knowledge about the objects involved in these tasks is manually encoded within the robots' control programs or knowledge bases, limiting them to operating only on the objects they have been programmed to understand. More generally, the current mainstream approach to visual recognition, based on convolutional neural networks [3], [4], makes the so-called closed world assumption, i.e. it assumes that the number and type of objects a robot will encounter in its activities is fixed and known a priori. Hence, the big challenge becomes making these visual algorithms robust to illumination, scale and categorical variations, as well as to clutter and occlusions.

