In this paper, the problem of human detection in crowded scenes is formulated as a maximum a posteriori problem, in which, given a set of candidates, predefined 3-D human shape models are matched with image evidence, provided by foreground extraction and probability of boundary, to estimate the human configuration. The optimal solution is obtained by decomposing the mutually related candidates into unoccluded and occluded ones in each iteration according to a graph description of the candidate relations and then only matching models for the unoccluded candidates. A candidate validation and rejection process based on minimum description length and local occlusion reasoning is carried out after each iteration of model matching. The advantage of the proposed optimization procedure is that its computational cost is much smaller than that of global optimization methods, while its performance is comparable to them. The proposed method achieves a detection rate of about 2% higher on a subset of images of the Caviar data set than the best result reported by previous works. We also demonstrate the performance of the proposed method using another challenging data set.