This paper proposes a novel flexible VLSI architecture for the implementation of variable block size motion estimation (VBSME). The architecture is able to perform a full motion search on integral multiples of 4×4 blocks sizes. To use the architecture, each 16×16 macroblock of the source frames should be partitioned into sixteen 4×4 non-overlapping subblocks, called primitive subblocks. The architecture contains sixteen modules and one VBSME processor. Each module, realized by cascading ID systolic arrays, is responsible for the block-matching operations of a different primitive subblock. The realization has the advantages of high throughput, high flexibility and 100 % processing element (PE) utilization. The motion estimation of all the primitive subblocks is performed in parallel. Because these primitive subblocks can be used to form the 41 subblocks of different sizes specified by the H.264, the VBSME processor is employed to concurrently compute the sums of absolute differences (SADs) of all the 41 subblocks from the SADs of the primitive subblocks. This new architecture has lower latency and higher throughput over other exiting VBSME architectures for the hardware implementation of H.264 encoders.