Single-chip multiprocessor embedded systems have become a feasible and very interesting option. What is needed, however, is an environment that supports the designer in transforming an algorithmic specification into a suitable parallel implementation. We present and demonstrate an important component of such an environment: an efficient design space exploration algorithm. The algorithm can be used to semi-automatically find the best parallelization of a given embedded application. It employs functional pipelining and data set partitioning simultaneously with source-to-source program transformations to obtain the most advantageous hierarchical parallelizations.