Future video decoders will need to support high resolutions such as Quad Full HD (QFHD, 4096 × 2160) and fast frame rates (e.g., 120 fps). Many of these decoders will also reside in portable devices. Parallel processing can be used to increase throughput (i.e., processing speed), and the added performance can be traded off for lower power through voltage scaling. The next-generation standard, High Efficiency Video Coding (HEVC), which is being developed as a successor to H.264/AVC, seeks not only to improve coding efficiency but also to account for implementation complexity and leverage parallelism to meet future power and performance demands. This paper presents a silicon prototype for a pre-standard algorithm developed for HEVC (“H.265”) called Massively Parallel CABAC (MP-CABAC) that addresses a key video decoder bottleneck. A scalable test chip is implemented in a 65-nm process and achieves a throughput of 24.11 bins/cycle, which enables it to decode the maximum H.264/AVC bit-rate (300 Mb/s) with only an 18 MHz clock at 0.7 V, while consuming 12.3 pJ/bin. At 1.0 V, it decodes a peak of 3026 Mbins/s for a bit-rate of 2.3 Gb/s, enough for QFHD at 186 fps. Both architecture and joint algorithm-architecture optimizations used to reduce critical path delay, area cost, and memory size are discussed.
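The throughput figures quoted above are related by simple arithmetic, which the following minimal Python sketch works through. It uses only the numbers stated in the abstract. The derived quantities (bin rate and power at 0.7 V, the clock implied at 1.0 V, and the bins-per-bit ratio) are back-of-envelope estimates and not measurements reported here. In particular, the sketch assumes that the 24.11 bins/cycle figure also applies at the 1.0 V operating point and that the 12.3 pJ/bin figure applies at the full 0.7 V bin rate. All function and variable names are illustrative.

```python
# Figures quoted in the abstract.
BINS_PER_CYCLE = 24.11        # decoder throughput, bins per clock cycle
CLOCK_AT_0V7_HZ = 18e6        # clock frequency at 0.7 V
ENERGY_PER_BIN_J = 12.3e-12   # energy per bin at 0.7 V
PEAK_BIN_RATE = 3026e6        # bins per second at 1.0 V
PEAK_BIT_RATE = 2.3e9         # bits per second at 1.0 V


def bin_rate(bins_per_cycle: float, clock_hz: float) -> float:
    """Bins decoded per second at a given clock frequency."""
    return bins_per_cycle * clock_hz


def implied_clock(bins_per_second: float, bins_per_cycle: float) -> float:
    """Clock frequency needed to sustain a given bin rate.

    Assumption: bins/cycle is the same at every operating point.
    """
    return bins_per_second / bins_per_cycle


def power_watts(bins_per_second: float, energy_per_bin_j: float) -> float:
    """Power estimate: energy per bin times bins per second."""
    return bins_per_second * energy_per_bin_j


# Derived (not reported) quantities.
bin_rate_at_0v7 = bin_rate(BINS_PER_CYCLE, CLOCK_AT_0V7_HZ)
power_at_0v7 = power_watts(bin_rate_at_0v7, ENERGY_PER_BIN_J)
clock_at_1v0 = implied_clock(PEAK_BIN_RATE, BINS_PER_CYCLE)
bins_per_bit_at_peak = PEAK_BIN_RATE / PEAK_BIT_RATE

print(f"Bin rate at 0.7 V:     {bin_rate_at_0v7 / 1e6:.1f} Mbins/s")
print(f"Power at 0.7 V:        {power_at_0v7 * 1e3:.2f} mW")
print(f"Implied clock at 1.0 V: {clock_at_1v0 / 1e6:.1f} MHz")
print(f"Bins per bit at peak:  {bins_per_bit_at_peak:.2f}")
```

Under these assumptions the 0.7 V operating point works out to roughly 434 Mbins/s at a little over 5 mW, and the 1.0 V peak implies a clock of about 126 MHz with about 1.3 bins per bit.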