A general strategy for automatically decomposing and dynamically distributing a functional program is discussed. The strategy is suitable for parallel execution on multiprocessor architectures with no shared memory. It borrows ideas from data flow and reduction machine research on the one hand, and from conventional compiler technology for sequential machines on the other. One of the more troublesome issues in such a system is choosing the right granularity for the parallel tasks. As a solution, the authors describe a program transformation technique based on serial combinators that offers, in some sense, just the right granularity for this style of computing, and that can be fine-tuned for particular multiprocessor architectures. Simulation results demonstrate the success of this approach.