In the last decades, many efforts have been devoted to develop methods for automatic scene understanding in the context of video surveillance applications. This paper presents a novel nonobject centric approach for complex scene analysis. Similarly to previous methods, we use low-level cues to individuate atomic activities and create clip histograms. Differently from recent works, the task of discovering high-level activity patterns is formulated as a convex prototype learning problem. This problem results in a simple linear program that can be solved efficiently with standard solvers. The main advantage of our approach is that, using as the objective function the Earth Mover's Distance (EMD), the similarity among elementary activities is taken into account in the learning phase. To improve scalability we also consider some variants of EMD adopting L1 as ground distance for 1D and 2D, linear and circular histograms. In these cases, only the similarity between neighboring atomic activities, corresponding to adjacent histogram bins, is taken into account. Therefore, we also propose an automatic strategy for sorting atomic activities. Experimental results on publicly available datasets show that our method compares favorably with state-of-the-art approaches, often outperforming them.