Hardware-software co-synthesis starts with an embedded-system specification and results in an architecture consisting of hardware and software modules to meet performance, power, and cost goals. Embedded systems are generally specified in terms of a set of acyclic task graphs. In this paper, we present a co-synthesis algorithm COSYN, which starts with periodic task graphs with real-time constraints and produces a low-cost heterogeneous distributed embedded-system architecture meeting these constraints. It supports both concurrent and sequential modes of communication and computation. It employs a combination of preemptive and nonpreemptive static scheduling. It allows task graphs in which different tasks have different deadlines. It introduces the concept of an association array to tackle the problem of multirate systems. It uses a new task-clustering technique, which takes the changing nature of the critical path in the task graph into account. It supports pipelining of task graphs and a mix of various technologies to meet embedded-system constraints and minimize power dissipation. In general, embedded-system tasks are reused across multiple functions. COSYN uses the concept of architectural hints and reuse to exploit this fact. Finally, if desired, it also optimizes the architecture for power consumption. COSYN produces optimal results for the examples from the literature while providing several orders of magnitude advantage in central processing unit time over an existing optimal algorithm. The efficacy of COSYN and its low-power extension COSYN-LP is also established through their application to very large task graphs (with over 1000 tasks).