Multi-camera people localization in crowded scenes has been an active research topic in recent years. Localization methods based on multi-view geometry are vulnerable to foreground segmentation errors caused by occlusion or unfavorable observation conditions. Alternatively, the problem can be solved by inference on a Markov random field (MRF) or conditional random field (CRF) model defined on the discretized ground plane. The main difficulties with this kind of method are the combinatorial explosion of the occupancy state space and the complex structure of the underlying graphical model. In this paper, we formulate the localization problem as a maximum a posteriori (MAP) problem on a higher-order MRF with a complicated dependency structure. Instead of optimizing the original model directly, we solve the problem by cascaded optimization on a sequence of MRFs with increasing clique size. The optimization problem at each stage, i.e., for each MRF with a specific clique size, is solved using the pattern-based optimization algorithm. The optimizer of the lower-order MRF is then used to constrain the state space of the higher-order MRF at the next level, which keeps the computation tractable. The proposed method is evaluated on public data sets and shows superior performance compared with state-of-the-art multi-camera people detection algorithms.
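
The cascade idea can be illustrated with a minimal toy sketch. It is not the method of this paper: exhaustive search stands in for the pattern-based optimizer, the clique potential (a penalty when every cell in a clique is occupied) is hypothetical, and the pruning rule (cells labeled empty at one stage stay empty at the next) is only one assumed way of letting a lower-order solution constrain the higher-order state space. All names and costs are illustrative.

```python
from itertools import product

def energy(x, unary, cliques):
    """Energy of a binary occupancy labeling x (1 = cell occupied)."""
    e = sum(unary[i][xi] for i, xi in enumerate(x))
    for cells, penalty in cliques:
        # Hypothetical potential: pay `penalty` if all cells in the clique are occupied.
        if all(x[i] for i in cells):
            e += penalty
    return e

def solve_stage(unary, cliques, free):
    """Exhaustive MAP over the cells in `free`; every other cell is fixed to 0.

    Stand-in for a real optimizer; only feasible for a handful of cells.
    """
    best, best_e = None, float("inf")
    for bits in product((0, 1), repeat=len(free)):
        x = [0] * len(unary)
        for i, b in zip(free, bits):
            x[i] = b
        e = energy(x, unary, cliques)
        if e < best_e:
            best, best_e = x, e
    return best, best_e

def cascade(unary, cliques, max_size):
    """Solve a sequence of models with clique size 1, 2, ..., max_size."""
    free = list(range(len(unary)))
    x, e = None, None
    for k in range(1, max_size + 1):
        # Stage k uses only the cliques with at most k cells.
        stage_cliques = [c for c in cliques if len(c[0]) <= k]
        x, e = solve_stage(unary, stage_cliques, free)
        # Assumed pruning rule: cells found empty are removed from the search.
        free = [i for i in free if x[i] == 1]
    return x, e

# unary[i] = (cost if cell i is empty, cost if cell i is occupied)
unary = [(0, -10), (0, 5), (0, -8), (0, -6)]
cliques = [((0, 2), 15), ((0, 2, 3), 4)]
labeling, value = cascade(unary, cliques, max_size=3)
print(labeling, value)
```

In this sketch the search space shrinks from 16 labelings at the first stage to 8 and then 4 at the later stages. Such pruning is a heuristic: a cell discarded early cannot be recovered, so the result is not guaranteed to be the global optimum of the full model in general.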