This paper proposes a parallel architecture for a fast three-step search (FTSS) algorithm for motion estimation. The FTSS algorithm evaluates fewer search points than the standard three-step search (TSS) algorithm and is therefore less computationally expensive. When applied to several standard images, FTSS shows insignificant performance degradation relative to standard TSS. The proposed architecture uses only three processing elements, together with an intelligent data arrangement and memory configuration. A technique for reducing external memory accesses has also been developed. Because it requires less area and power than an implementation of TSS, the proposed architecture offers an efficient solution for applications requiring real-time motion estimation, in particular low bit-rate video applications such as video telephony and teleconferencing.
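For orientation, the standard TSS baseline against which FTSS is compared can be sketched as follows. This is a minimal software sketch of the textbook TSS, not the FTSS algorithm or the proposed architecture. The function names, the sum-of-absolute-differences (SAD) matching cost, the 8x8 block size, and the initial step of 4 (a search range of ±7 pixels) are all assumptions made for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in
    `cur` and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def three_step_search(cur, ref, bx, by, n=8, step=4):
    """Textbook TSS (illustrative; names and defaults are assumptions).

    Returns ((dx, dy), cost, evaluated): the best displacement found, its
    SAD, and the number of search points actually evaluated.
    """
    h, w = len(ref), len(ref[0])
    best_dx, best_dy = 0, 0
    best_cost = sad(cur, ref, bx, by, 0, 0, n)  # centre point
    evaluated = 1
    while step >= 1:
        cx, cy = best_dx, best_dy  # centre of this step's 3 x 3 pattern
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                if ox == 0 and oy == 0:
                    continue  # centre cost is already known
                dx, dy = cx + ox, cy + oy
                # Skip candidates whose block would fall outside the frame.
                if bx + dx < 0 or by + dy < 0:
                    continue
                if bx + dx + n > w or by + dy + n > h:
                    continue
                cost = sad(cur, ref, bx, by, dx, dy, n)
                evaluated += 1
                if cost < best_cost:
                    best_cost, best_dx, best_dy = cost, dx, dy
        step //= 2  # halve the step: 4 -> 2 -> 1
    return (best_dx, best_dy), best_cost, evaluated
```

With these defaults the sketch evaluates at most 25 points per block (the centre plus 8 new points in each of three steps), against 225 for a full search over the same ±7 range. Because each step commits to the best of only nine candidates, TSS can settle on a local minimum of the cost rather than the true displacement.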