Video-based surveillance and monitoring of indoor spaces such as offices, airports and convenience stores has attracted increasing interest in recent years. While video is useful for inferring information about identities and activities, it generates large volumes of data. Motion sensors, on the other hand, are far more data-efficient and far less expensive, but have limited recognition capabilities. In this paper, we describe a system that integrates a large number of wireless motion sensors with a few strategically placed cameras, and we present its application to real-time monitoring of indoor spaces. The system responds to events as they happen and provides visual evidence of where each event takes place. In doing so, it maintains awareness of events across the entire monitored location and tells the user “when,” “where,” and “what” happens in the space as events unfold. The system is designed to maximize the utility of the video data recorded at a location. It achieves this by following a minimal commitment strategy: no data is discarded, and no particular hypothesis is pursued until an interpretation is actually required. Additionally, we employ an alternative modality to index the video data, both for real-time use and for possible later use in forensic mode. We use the motion sensor data to specify camera control policies. These policies make it simple to apply machine learning and computer vision techniques to on-line surveillance tasks in a fast, accurate, and scalable way.
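To make the idea of sensor-driven camera control and indexing more concrete, the following is a minimal sketch under stated assumptions, not the system described in this paper. It assumes a hypothetical `coverage` map from each motion sensor to the camera that views its area, a hypothetical `padding` of a few seconds around each event, and events that arrive in time order. Every name here (`MotionEvent`, `SensorVideoIndex`, `on_event`, `query`) is illustrative. In keeping with the minimal commitment strategy, the sketch never deletes video. It only records which camera and time window each sensor event points to, so that footage can be retrieved later in real-time or forensic mode.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MotionEvent:
    sensor_id: str
    timestamp: float  # seconds since some reference time


@dataclass
class SensorVideoIndex:
    """Hypothetical index linking motion-sensor events to camera time windows."""

    # Assumed installation-time mapping: sensor id -> id of the camera covering it.
    coverage: dict[str, str]
    # Assumed amount of video (seconds) to show on each side of an event.
    padding: float = 2.0
    # Per-camera sorted list of event timestamps (kept sorted because events
    # are assumed to arrive in time order).
    _times: dict[str, list[float]] = field(
        default_factory=lambda: defaultdict(list), init=False, repr=False
    )

    def on_event(self, event: MotionEvent) -> tuple[str, float, float] | None:
        """Toy camera-control policy: select the covering camera and a clip window.

        Returns (camera_id, window_start, window_end), or None if no camera
        covers the sensor. No video is discarded; the event is only indexed.
        """
        camera = self.coverage.get(event.sensor_id)
        if camera is None:
            return None
        self._times[camera].append(event.timestamp)
        return camera, event.timestamp - self.padding, event.timestamp + self.padding

    def query(self, camera: str, start: float, end: float) -> list[float]:
        """Forensic-style lookup: indexed event times for `camera` in [start, end]."""
        times = self._times.get(camera, [])
        return times[bisect_left(times, start):bisect_right(times, end)]


if __name__ == "__main__":
    index = SensorVideoIndex(
        coverage={"hall-1": "cam-A", "hall-2": "cam-A", "door-3": "cam-B"}
    )
    for ev in [
        MotionEvent("hall-1", 10.0),
        MotionEvent("door-3", 12.5),
        MotionEvent("hall-2", 30.0),
    ]:
        print(index.on_event(ev))
    print(index.query("cam-A", 0.0, 20.0))
```

The binary search over per-camera timestamp lists keeps lookups logarithmic in the number of indexed events. This is one simple choice for illustration; a deployed system would likely store such an index persistently alongside the recorded video.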