This paper presents a systematic approach to the design of application-specific instruction-set processors for high speed computation of local neighborhood functions and intra-field deinterlacing. The intended application is real-time processing of high definition video. The approach aims at an efficient utilization of the available memory bandwidth by fully exploiting the data parallelism inherent to the target algorithm class. An appropriate choice of custom instructions and application-specific registers is used together with a very long instruction word architecture in order to mimic a pipelined systolic array. This leads to a processing speed close to the limit imposed by memory bandwidth constraints. For three intra-field deinterlacing algorithms and 2-D convolution with four kernel sizes, the design approach yields speedup factors between 36 and 1330, Area-Time (AT) product improvements between 12× and 243×, and energy consumption reduction factors between 13 and 262.