Multidomain data acquisition provides a means of environment perception that can be used to detect unusual and possibly dangerous situations. When automated, this approach can simplify the surveillance tasks required in, for example, airports or other security-sensitive infrastructure. This paper describes a novel architecture for surveillance networks based on combining multimodal sensor information. Compared to previous methodologies that use only video information, the proposed approach also uses audio data, thus increasing its ability to obtain valuable information about the sensed environment. A hierarchical processing architecture for observation and surveillance systems is proposed that recognizes a set of predefined behaviors and learns what constitutes normal behavior. Deviations from “normality” are reported in a way that is understandable even to staff without special training. The processing architecture, including the physical sensor nodes, is called the smart embedded network of sensing entities (SENSE).