The last years have shown that there is no way to disregard the advantages provided by multiprocessor System-on-Chip (MPSoC) architectures in the embedded systems domain. Using multiple cores in a single system enables to close the gap between energy consumption, problems concerning heat dissipation, and computational power. Nevertheless, these benefits do not come for free. New challenges arise, if existing applications have to be ported to these multiprocessor platforms. One of the most ambitious tasks is to extract efficient parallelism from these existing sequential applications. Hence, many parallelization tools have been developed, most of them are extracting as much parallelism as possible, which is in general not the best choice for embedded systems with their limitations in hardware and software support. In contrast to previous approaches, we present a new automatic parallelization tool, tailored to the particular requirements of the resource constrained embedded systems. Therefore, this paper presents an algorithm which automatically steers the granularity of the generated tasks, with respect to architectural requirements and the overall execution time reduction. For this purpose, we exploit hierarchical task graphs to simplify a new integer linear programming based approach in order to split up sequential programs in an efficient way. Results on real-life benchmarks have shown that the presented approach is able to speed sequential applications up by a factor of up to 3.7 on a four core MPSoC architecture.