This paper presents a task-clustering algorithm named GTCS (Greedy Task Clustering and Scheduling). The algorithm represents the tasks of a parallel program as the nodes of a directed acyclic task graph (DAG) and schedules them on a distributed-memory architecture, the hardware model this paper focuses on. GTCS implements task clustering by partitioning the nodes of the DAG; each cluster is placed on exactly one processor. Partitioning the DAG into clusters reduces interprocessor communication, whose overhead increases the total execution time (makespan) of the tasks, especially for parallel programs with fine-grained tasks running over a network. The GTCS algorithm produces a schedule whose makespan is at most twice that of an optimal clustering. GTCS has a time complexity of O(m(|V| · lg |V| + |E|)), where m is the number of clusters and 1 ≤ m ≤ |V|.
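The abstract does not give GTCS's actual steps, so the following is only an illustrative sketch of the general idea it describes: a greedy edge-zeroing clustering (in the style of Sarkar's algorithm) that merges DAG nodes into clusters, zeroing the communication cost of intra-cluster edges, whenever doing so does not increase the makespan. All task names, costs, and the specific merge rule are invented for the example.

```python
# Illustrative sketch only -- a generic greedy edge-zeroing clustering,
# NOT the paper's GTCS algorithm. Task names, costs, and the merge
# heuristic are invented for the example.
from collections import defaultdict

def makespan(tasks, edges, cluster):
    """Schedule length when each cluster runs on its own processor.

    tasks:   {name: computation cost}
    edges:   [(u, v, communication cost), ...] with u preceding v
    cluster: {name: cluster id}; tasks sharing a cluster are serialized
             on one processor, and an intra-cluster edge costs nothing.
    """
    succs, preds = defaultdict(list), defaultdict(list)
    indeg = {v: 0 for v in tasks}
    for u, v, c in edges:
        succs[u].append(v)
        preds[v].append((u, c))
        indeg[v] += 1
    order = [v for v in tasks if indeg[v] == 0]  # Kahn's topological sort
    finish, ready = {}, defaultdict(int)
    i = 0
    while i < len(order):
        v = order[i]; i += 1
        start = ready[cluster[v]]                # when the processor frees up
        for u, c in preds[v]:                    # when each input arrives
            start = max(start, finish[u] + (c if cluster[u] != cluster[v] else 0))
        finish[v] = start + tasks[v]
        ready[cluster[v]] = finish[v]
        for w in succs[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                order.append(w)
    return max(finish.values())

def greedy_cluster(tasks, edges):
    """Start with one task per cluster; greedily consider edges in
    decreasing communication cost and merge their endpoint clusters
    whenever the makespan does not grow."""
    cluster = {v: v for v in tasks}
    best = makespan(tasks, edges, cluster)
    for u, v, _ in sorted(edges, key=lambda e: -e[2]):
        trial = {x: (cluster[v] if cluster[x] == cluster[u] else cluster[x])
                 for x in cluster}
        m = makespan(tasks, edges, trial)
        if m <= best:
            cluster, best = trial, m
    return cluster, best

# Fine-grained fork-join DAG: communication dominates computation, so
# clustering everything onto one processor shortens the schedule.
tasks = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = [("a", "b", 5), ("a", "c", 5), ("b", "d", 5), ("c", "d", 5)]
print(makespan(tasks, edges, {v: v for v in tasks}))  # one task per processor: 13
print(greedy_cluster(tasks, edges)[1])                # after clustering: 4
```

The example shows the trade-off the abstract names: with fine-grained tasks, paying the network cost on every edge yields a makespan of 13, while merging the whole DAG into one cluster serializes the four unit tasks for a makespan of 4.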