Data-intensive processing in embedded systems is receiving much attention in multimedia computing and high-speed telecommunications. The memory bandwidth problem of traditional von Neumann architectures, however, is impairing processor efficiency. On the other hand, ASIC designs suffer from skyrocketing manufacturing costs and long development cycles. This results in an increasing need for postfabrication programmability at both software and hardware levels. FPGAs provide maximum flexibility with their fine-grained architecture but bring severe overhead in timing, area, and power consumption. Wordor subword-oriented runtime reconfigurable architectures offer highly parallel, scalable solutions combining hardware performance with software flexibility.1 Their coarser granularity reduces area, delay, power consumption, and reconfiguration time, but they introduce trade-offs in processing-element design.