Humans, like many other living organisms, are gifted with the power of “seeing” and “understanding” the environment around them using their eyes. The ease with which humans process and understand the visual world is deceptive and often leads us to underestimate the effort and methods needed to build practical, effective, and inexpensive computer vision systems. In essence, humans have a 500-million-year head start due to evolution; it is extremely difficult at this point to build a computer vision system that has the abilities of a three-year-old child. However, by confining ourselves to particular domains, we can often find shortcuts to solve particular problems. This paper illustrates a number of such solutions in various areas developed by our group at IBM. These include object finding for video surveillance, person identification via biometrics, inspection of manufactured items along railways, scene understanding for driver assistance, and object recognition and motion interpretation for retail stores. We discuss the real-world constraints for each system and describe how we overcame the irksome variability inherent in each task. By further analyzing such successful systems and comparing them to each other, we can come to understand the common underlying problems and thus start to extend our initially limited areas of competence into a more general-purpose vision toolkit. This paper concludes with a set of challenging unresolved problems that, if solved, could spur great progress in practical computer vision.
Note: The Institute of Electrical and Electronics Engineers, Incorporated is distributing this Article with permission of the International Business Machines Corporation (IBM) who is the exclusive owner. The recipient of this Article may not assign, sublicense, lease, rent or otherwise transfer, reproduce, prepare derivative works, publicly display or perform, or distribute the Article.