This paper presents a highly parallel, macro-block-level layered LDPC decoder architecture for quasi-cyclic low-density parity-check (QC-LDPC) codes with multiple code rates and code lengths. The LDPC codes defined in the WiMAX standard, with 6 code rates and 19 code lengths, are chosen to demonstrate the architecture. Using the proposed dedicated matrix reordering strategy, the decoder takes 12 to 24 clock cycles per iteration, depending on the code rate. Compared with state-of-the-art work, the decoder substantially reduces the total number of memory bits and achieves 2x to 4.3x higher parallelism at 1.2x the hardware cost. Synthesis results show the low-power potential of the architecture.
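For orientation, the WiMAX code family that the decoder must support can be enumerated in a few lines. The following is a minimal sketch, not part of the proposed architecture: it assumes the IEEE 802.16e parameterization (24 base-matrix columns, code lengths from 576 to 2304 bits in steps of 96), and the names `CODE_LENGTHS`, `RATES`, `expansion_factor`, and `base_rows` are hypothetical helpers introduced here for illustration only.

```python
from fractions import Fraction

# Assumption: IEEE 802.16e (WiMAX) QC-LDPC parameterization.
BASE_COLUMNS = 24  # every base matrix has 24 block columns

# 19 code lengths: 576, 672, ..., 2304 bits (steps of 96).
CODE_LENGTHS = list(range(576, 2304 + 1, 96))

# 6 code rate classes; the A/B variants share a rate but use different base matrices.
RATES = {
    "1/2": Fraction(1, 2),
    "2/3A": Fraction(2, 3),
    "2/3B": Fraction(2, 3),
    "3/4A": Fraction(3, 4),
    "3/4B": Fraction(3, 4),
    "5/6": Fraction(5, 6),
}


def expansion_factor(n: int) -> int:
    """Size z of each z-by-z circulant sub-block for code length n."""
    if n not in CODE_LENGTHS:
        raise ValueError(f"unsupported code length: {n}")
    return n // BASE_COLUMNS


def base_rows(rate: Fraction) -> int:
    """Number of block rows (layers) in the base matrix for a given rate."""
    return int(BASE_COLUMNS * (1 - rate))


if __name__ == "__main__":
    for name, rate in RATES.items():
        print(f"rate {name}: {base_rows(rate)} block rows")
    for n in CODE_LENGTHS:
        print(f"n = {n}: z = {expansion_factor(n)}")
```

Under these assumptions the expansion factor ranges from 24 to 96 and the base matrices have between 4 and 12 block rows, which is the range of configurations a multi-rate, multi-length decoder has to cover.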