For a grammar-based approach to the recognition of visual events, there are two major limitations that prevent it from real application. One is that the event rules are predefined by domain experts, which means huge manual cost. The other is that the commonly used grammar can only handle sequential relations between subevents, which is inadequate to recognize more complex events involving parallel subevents. To solve these problems, we propose an extended grammar approach to modeling and recognizing complex visual events. First, motion trajectories as original features are transformed into a set of basic motion patterns of a single moving object, namely, primitives (terminals) in the grammar system. Then, a Minimum Description Length (MDL) based rule induction algorithm is performed to discover the hidden temporal structures in primitive stream, where Stochastic Context-Free Grammar (SCFG) is extended by Allen's temporal logic to model the complex temporal relations between subevents. Finally, a Multithread Parsing (MTP) algorithm is adopted to recognize interesting complex events in a given primitive stream, where a Viterbi-like error recovery strategy is also proposed to handle large-scale errors, e.g., insertion and deletion errors. Extensive experiments, including gymnastic exercises, traffic light events, and multi-agent interactions, have been executed to validate the effectiveness of the proposed approach.