Systems-on-chip often use hardware accelerators or coprocessors to provide efficient implementations of application-specific functions. The emergence of extensible processor cores with supporting design tools has given designers another viable alternative, namely, the use of application-specific custom instructions. Coprocessors and custom instructions can be viewed as two different forms of hardware acceleration that are applicable at different levels of granularity and offer differing tradeoffs. Classical hardware/software-partitioning techniques and application-specific instruction-set design tools address the individual problems of coprocessor generation and custom-instruction addition. However, given a complex application, it is not clear which design choice (coprocessors, custom instructions, or a combination) will result in better performance, area, or power consumption. We demonstrate that a combination of custom instructions and coprocessors is the best solution in many applications, making the case for a hybrid custom-instruction and coprocessor-synthesis methodology. We propose such a methodology, which builds upon the basic observations that coprocessors are usually good for coarse-grained tasks and require minimal intervention or support from the processor, while custom instructions are usually suited to fine-grained operations that are best integrated into a processor pipeline. Our methodology uses a hierarchical task-graph representation in order to support both coarse- and fine-grained views of an application, which are necessary to make meaningful tradeoffs. We propose a hierarchical synthesis algorithm that incorporates multiobjective evolutionary optimization in order to handle different design dimensions, such as area and performance, and to provide a wide range of nondominated solutions. We have implemented the proposed methodology in the context of a commercial extensible processor-based platform (Xtensa from Tensilica). 
Our design flow uses a commercial behavioral-synthesis tool and an existing automatic-custom-instruction-generation tool. Our experiments with several applications show that simultaneous custom-instruction and coprocessor synthesis can achieve significantly better area/performance tradeoffs than using only one of them.
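
To make the notion of nondominated solutions concrete, the following minimal Python sketch filters a set of candidate designs down to its Pareto front over two objectives, area and cycle count, both minimized. It is only an illustration of the dominance test that a multiobjective optimizer relies on, not the synthesis algorithm described above; the design labels and all numbers are invented for the example and are not results from the experiments.

```python
from typing import List, Tuple

# Hypothetical candidate design: (label, area, cycles). Lower is better for both.
Design = Tuple[str, float, float]


def dominates(a: Design, b: Design) -> bool:
    """Return True if a is no worse than b in both objectives and better in one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def nondominated(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates (the Pareto front)."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


# Invented numbers, purely illustrative.
candidates: List[Design] = [
    ("A", 0.0, 100.0),
    ("B", 2.0, 70.0),
    ("C", 6.0, 55.0),
    ("D", 5.0, 40.0),
    ("E", 7.0, 60.0),
]

front = nondominated(candidates)
for label, area, cycles in front:
    print(f"{label}: area={area}, cycles={cycles}")
```

In this toy set, C and E are discarded because D is both smaller and faster; A, B, and D remain because each trades area against cycle count in a way no other candidate beats on both axes.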