In this letter, we present the architecture and implementation of a novel three-stage processing engine suitable for deep packet processing in high-speed networks. The engine, which has been fabricated as part of a network processor, comprises a typical RISC core and programmable hardware. To assess its performance, experiments with packets of various lengths were performed and the results compared against the IXP1200 network processor. The comparison shows that, for the case study presented in this letter, the proposed packet-processing engine is up to three times faster. Moreover, the engine is simple to fabricate, less expensive than the corresponding hardware cores of the IXP1200, and easily programmed for different networking applications.