This paper concerns the scheduling of task graphs of parallel programs for a parallel architecture based on dynamic SMP processor clusters with data transmissions on the fly. The assumed executive architecture consists of a set of NoC modules, each containing a set of processors and memory blocks connected via a local interconnection network. The NoC modules are in turn connected via a global interconnection network. An algorithm for scheduling parallel program graphs is presented. It first decomposes the initial program graph into subgraphs and maps them to NoC modules so as to reduce global communication between modules. The subgraphs are then structured inside the modules to include reads on the fly and processor switching. Reads on the fly reduce program execution time by eliminating read operations from the linear program execution time.
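
The decomposition step can be illustrated with a minimal sketch. This is not the algorithm presented in the paper, only a generic greedy heuristic under stated assumptions: tasks arrive in some fixed order, each module holds at most `capacity` tasks, and edge weights stand for communication volume. The names `map_tasks_to_modules` and `global_communication` are hypothetical and introduced here for illustration only.

```python
from collections import defaultdict


def map_tasks_to_modules(tasks, edges, num_modules, capacity):
    """Greedily place each task in the module it communicates with most.

    tasks: iterable of task names, processed in the given order (assumed).
    edges: dict mapping (u, v) pairs to a communication weight.
    capacity: maximum number of tasks per module (an assumed constraint).
    """
    # Build an undirected adjacency view of the communication weights.
    neighbours = defaultdict(dict)
    for (u, v), weight in edges.items():
        neighbours[u][v] = weight
        neighbours[v][u] = weight

    assignment = {}
    load = [0] * num_modules
    for task in tasks:
        # Communication volume this task would keep local in each module.
        gain = [0] * num_modules
        for other, weight in neighbours[task].items():
            if other in assignment:
                gain[assignment[other]] += weight

        candidates = [m for m in range(num_modules) if load[m] < capacity]
        if not candidates:
            raise ValueError("not enough module capacity for all tasks")

        # Prefer the highest local gain, then the least loaded module,
        # then the lowest module index.
        best = max(candidates, key=lambda m: (gain[m], -load[m], -m))
        assignment[task] = best
        load[best] += 1
    return assignment


def global_communication(assignment, edges):
    """Sum the weights of edges whose endpoints lie in different modules."""
    return sum(w for (u, v), w in edges.items()
               if assignment[u] != assignment[v])


# Hypothetical four-task graph: two heavily coupled pairs, one light link.
example_edges = {("a", "b"): 10, ("b", "c"): 1, ("c", "d"): 10}
example_mapping = map_tasks_to_modules(["a", "b", "c", "d"], example_edges,
                                       num_modules=2, capacity=2)
print(example_mapping, global_communication(example_mapping, example_edges))
```

In this toy graph the heavily communicating pairs end up in the same module, so only the light edge between `b` and `c` crosses the global network. A real scheduler for this architecture would also have to account for task timing, reads on the fly, and processor switching, none of which this sketch models.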