In this work, a co-design flow is proposed for processor-centric embedded systems with FPGA-based hardware acceleration. The flow reduces design effort by raising the abstraction level without requiring engineers to learn new languages and tools. The whole system is designed using well-established high-level modeling techniques, languages and tools from the software domain, namely an object-oriented programming (OOP) design approach expressed in UML and implemented in C++. Software coding effort is reduced because the C++ implementation not only provides a golden reference model but can also be reused as part of the final embedded software. Hardware coding effort is reduced as well. The modular OOP design lets the engineer use profiling tools to pinpoint exactly which methods need hardware acceleration, avoiding needless translations to hardware. Moreover, the two-process structured VHDL design method used for the hardware implementation has been shown to reduce man-years, lines of code and bugs in many major developments. A real-time image processing application for multiple-robot localization is presented as a case study. The overall speedup from the original software solution to the final hardware-accelerated solution is 9.7×, with only a 4% increase in area (143 extra slices). The embedded solution obtained with the proposed methodology runs 17% faster than on a standard PC, while being much smaller, cheaper and less power-hungry.