A programmable turbo decoder is designed to support multiple third-generation wireless communication standards. We propose a hybrid architecture of hardware and software, which has small size, low power, and high performance like hardware implementations, as well as the flexibility and programmability of software. It mainly consists of a configurable hardware soft-input-soft-output (SISO) decoder and a 16-b single-instruction multiple- data processor, which is equipped with five processing elements and special instructions customized for interleaving in order to provide interleaved data at the speed of the hardware SISO. A fast and flexible software implementation of the block interleaving algorithm is also proposed. The interleaver generation is split into two parts, preprocessing and on-the-fly generation, to reduce the timing overhead of changing the interleaver structure. We present detailed descriptions of the interleaving implementation applied to the W-CDMA and cdma2000 standard turbo codes. The decoder occupies 8.90 mm2 in a 0.25-mum CMOS with five metal layers and exhibits the maximum decoding rate of 5.48 Mb/s.