Implementation of an iterative QR decomposition (QRD) (IQRD) architecture based on the modified Gram-Schmidt (MGS) algorithm is proposed in this paper. A QRD is extensively adopted by the detection of multiple-input-multiple-output systems. In order to achieve computational efficiency with robust numerical stability, a triangular systolic array (TSA) for QRD of large-size matrices is presented. In addition, the TSA architecture can be modified into an iterative architecture that is called IQRD for reducing hardware cost. The IQRD hardware is constructed by the diagonal and the triangular process with fewer gate counts and lower power consumption than TSAQRD. For a 4 × 4 matrix, the hardware area of the proposed IQRD can reduce about 41% of the gate counts in TSAQRD. For a generic square matrix of order m IQRD, the latency required is 2m - 1 time units, which is based on the MGS algorithm. Thus, the total clock latency is only 10 m - 5 cycles.