Application-specific extensions to the computational capabilities of a processor provide an efficient mechanism for meeting the growing performance and power demands of embedded applications. Hardware in the form of new function units (or coprocessors), together with the corresponding instructions, is added to a baseline processor to meet the critical computational demands of a target application. This paper presents the design of a system that automates the instruction set customization process. A design space exploration engine operating on dataflow graphs efficiently identifies computation subgraphs to implement as custom hardware, and a compiler subgraph matching framework then exploits this hardware without manual intervention. We demonstrate the effectiveness of this system on a range of application domains and study how well custom hardware designed for one application applies to an entire domain. We also present generalization techniques that allow application-specific hardware to be used more effectively across a domain.
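To make the idea of "identifying computation subgraphs" concrete, the following is a minimal Python sketch, not the exploration engine described in this paper. It assumes a hypothetical dataflow graph (`DFG`), a hypothetical live-out set (`LIVE_OUT`), and hypothetical limits on the number of register operands a custom instruction may read and write (`max_in`, `max_out`). It exhaustively enumerates connected, convex subgraphs (convex so the group can execute as one atomic instruction) whose input and output counts fit those limits. A practical engine would prune this exponential search and rank candidates by profiled execution frequency and hardware cost rather than simply preferring larger subgraphs.

```python
# Minimal sketch of custom-instruction candidate enumeration (illustrative only).
# Assumptions: the graph, live-out set, and port limits below are hypothetical.

# Dataflow graph: node -> (opcode, operand nodes). Opcode "in" marks a live-in
# value, which is never folded into a candidate subgraph.
DFG = {
    "a": ("in", []),
    "b": ("in", []),
    "c": ("in", []),
    "t1": ("add", ["a", "b"]),
    "t2": ("shl", ["t1", "c"]),
    "t3": ("xor", ["t2", "a"]),
    "t4": ("mul", ["t3", "t1"]),
}
LIVE_OUT = {"t4"}  # values still needed after this basic block


def successors(dfg):
    """Invert operand lists into producer -> consumers edges."""
    succ = {n: [] for n in dfg}
    for n, (_, srcs) in dfg.items():
        for s in srcs:
            succ[s].append(n)
    return succ


def reachable(start, succ):
    """Nodes reachable from `start` via one or more edges."""
    seen, stack = set(), list(start)
    while stack:
        for m in succ[stack.pop()]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen


def is_convex(nodes, succ):
    """True if no path leaves `nodes` and re-enters it through an outside node."""
    outside_desc = reachable(nodes, succ) - nodes
    return all(not (reachable([v], succ) & nodes) for v in outside_desc)


def enumerate_candidates(dfg, live_out, max_in=2, max_out=1):
    succ = successors(dfg)
    ops = {n for n, (op, _) in dfg.items() if op != "in"}

    def neighbours(n):
        return (set(dfg[n][1]) | set(succ[n])) & ops

    # Grow every connected subset of operations one neighbour at a time.
    seen, frontier = set(), [frozenset([n]) for n in ops]
    while frontier:
        cand = frontier.pop()
        if cand in seen:
            continue
        seen.add(cand)
        for n in cand:
            for m in neighbours(n) - cand:
                frontier.append(cand | {m})

    result = []
    for cand in seen:
        ins = {s for n in cand for s in dfg[n][1] if s not in cand}
        outs = {n for n in cand if n in live_out or set(succ[n]) - cand}
        if len(ins) <= max_in and len(outs) <= max_out and is_convex(cand, succ):
            result.append((sorted(cand), len(ins), len(outs)))
    # Prefer candidates that cover more operations (a stand-in for a real cost model).
    return sorted(result, key=lambda r: (-len(r[0]), r[0]))


if __name__ == "__main__":
    for nodes, n_in, n_out in enumerate_candidates(DFG, LIVE_OUT, max_in=3):
        print(nodes, f"{n_in} in / {n_out} out")
```

With `max_in=3` the whole four-operation chain qualifies as a single three-input, one-output candidate, whereas with the default two-input limit only individual operations survive. Subsets such as `{t1, t4}` are rejected by the convexity check, because the path `t1 -> t2 -> t3 -> t4` leaves the group and re-enters it.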