A multi-core co-processor for mobile application processors is introduced. It provides low-power, high-throughput, fully software-based acceleration of multimedia processing. The test chip, fabricated in a 65 nm CMOS technology, consumes 620 mW in H.264 720p 60 fps decoding and 9.7 mW in MPEG-4 AAC decoding. At the maximum workload of H.264 decoding, symmetrical parallelization achieves a 7.5× performance improvement with 8 cores. The shared L2 cache reduces the required main-memory access rate to 310 MB/s. At the minimum workload of AAC decoding, three low-power circuit techniques reduce leakage by 98%. On-chip regulators, which also work as power-gating switches, lower the supply voltage of the processing cores. An embedded forward body-biasing circuit reduces Vt variations. A low-power and fast data-mapping F/F relaxes the timing constraint, which enables a reduction in the number of low-Vt transistors.
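
To put the reported 8-core speedup of 7.5 in perspective, the short sketch below computes the parallel efficiency and the serial fraction that would be implied if the workload followed Amdahl's law. Amdahl's law is an assumption introduced here for illustration only; the abstract does not state which scaling model, if any, the authors used, and the function names are hypothetical.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / cores


def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Amdahl's law: S = 1 / (s + (1 - s) / N)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


def amdahl_serial_fraction(speedup: float, cores: int) -> float:
    """Invert Amdahl's law for s, given a measured speedup S on N cores.

    Assumes the workload obeys Amdahl's law (an illustrative assumption,
    not a claim taken from the abstract).
    """
    return (cores / speedup - 1.0) / (cores - 1.0)


if __name__ == "__main__":
    speedup, cores = 7.5, 8  # figures quoted in the abstract
    eff = parallel_efficiency(speedup, cores)
    s = amdahl_serial_fraction(speedup, cores)
    print(f"parallel efficiency: {eff:.4f}")
    print(f"implied serial fraction (Amdahl assumption): {s:.4f}")
```

Under these assumptions the efficiency is about 94%, and the implied serial fraction is roughly 1%, which is consistent with the abstract's description of a symmetrical parallelization that spreads work evenly across the cores.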