Skip to Main Content
This paper presents a baseline residual encoder for H.264/AVC on a programmable fine-grained many-core processing array that utilizes no application-specific hardware. The software encoder contains integer transform, quantization, and context-based adaptive variable length coding functions. By exploiting fine-grained data and task-level parallelism, the residual encoder is partitioned and mapped to an array of 25 small processors. The proposed encoder encodes video sequences with variable frame sizes and can encode 1080p high-definition television at 30 f/s with 293 mW average power consumption by adjusting each processor to workload-based optimal clock frequencies and dual supply voltages-a 38.4% power reduction compared to operation with only one clock frequency and supply voltage. In comparison to published implementations on the TI C642 digital signal processing platform, the design has approximately 2.9-3.7 times higher scaled throughput, 11.2-15.0 times higher throughput per chip area, and 4.5-5.8 times lower energy per pixel. Compared to a heterogeneous single instruction, multiple data architecture customized for H.264, the presented design has 2.8-3.6 times greater throughput, 4.5-5.9 times higher area efficiency, and similar energy efficiency. The proposed fine-grained parallelization methodology provides a new approach to program a large number of simple processors allowing for a higher level of parallelization and energy-efficiency for video encoding than conventional processors while avoiding the cost and design time of implementing an application specific integrated circuit or other application-specific hardware.