Automatically understanding human actions from motion trajectories derived from video sequences is a very challenging problem. An action takes place in 3-D but is projected onto a 2-D image, so the projected 2-D trajectory varies with the viewpoint of the camera. Consequently, the same action may produce very different trajectories, and trajectories of different actions may look the same, which makes the higher-level interpretation of trajectories ambiguous. However, if the representation of actions captures only view-invariant characteristics, the higher-level interpretation can proceed without this ambiguity. Most current work on action recognition ignores the issue of view invariance, so the proposed methods do not succeed in more general situations. In this paper, we first present a view-invariant representation of action consisting of dynamic instants and intervals, which is computed using the spatiotemporal curvature of a trajectory. Our system then uses this representation to learn human actions without any training. The system is able to learn different actions incrementally, starting with no model. It can discover instances of the same action performed by different people and from different viewpoints.
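To make the idea of detecting dynamic instants from spatiotemporal curvature concrete, the following is a minimal sketch and not the implementation described in the paper. It assumes that the spatiotemporal curvature is the ordinary curvature of the space curve (x(t), y(t), t), that frames are evenly spaced with a unit time step, and that a dynamic instant can be approximated as a local maximum of curvature above a threshold. The function names, the finite-difference scheme, and the threshold value are all hypothetical choices made for illustration.

```python
import math


def central_diff(values):
    """Approximate the first derivative with central differences
    (one-sided at the two ends), assuming a unit time step."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    out = []
    for i in range(n):
        if i == 0:
            out.append(values[1] - values[0])
        elif i == n - 1:
            out.append(values[-1] - values[-2])
        else:
            out.append((values[i + 1] - values[i - 1]) / 2.0)
    return out


def spatiotemporal_curvature(xs, ys):
    """Curvature of the space curve r(t) = (x(t), y(t), t).

    Uses the standard formula |r' x r''| / |r'|^3 with
    r' = (x', y', 1) and r'' = (x'', y'', 0).
    """
    dx, dy = central_diff(xs), central_diff(ys)
    ddx, ddy = central_diff(dx), central_diff(dy)
    kappa = []
    for x1, y1, x2, y2 in zip(dx, dy, ddx, ddy):
        cross_z = x1 * y2 - x2 * y1
        num = math.sqrt(x2 * x2 + y2 * y2 + cross_z * cross_z)
        den = (x1 * x1 + y1 * y1 + 1.0) ** 1.5
        kappa.append(num / den)
    return kappa


def dynamic_instants(kappa, threshold):
    """Indices of local curvature maxima above an illustrative threshold
    (an assumed stand-in for the paper's detection rule)."""
    return [
        i
        for i in range(1, len(kappa) - 1)
        if kappa[i] > threshold
        and kappa[i] >= kappa[i - 1]
        and kappa[i] > kappa[i + 1]
    ]


# Toy trajectory: constant motion to the right, then an abrupt turn upward.
xs = [0, 1, 2, 3, 4, 5, 5, 5, 5, 5, 5]
ys = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5]
kappa = spatiotemporal_curvature(xs, ys)
instants = dynamic_instants(kappa, threshold=0.2)
```

On this toy trajectory the curvature is zero along both straight segments and peaks at the frame where the direction changes, so the single detected instant splits the trajectory into two intervals. The sketch runs on one 2-D projection only and does not by itself demonstrate view invariance.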