Skip to Main Content
The design and evaluation of microprocessor architectures is a difficult and time-consuming task. Although small, hand-coded microbenchmarks can be used to accelerate performance evaluation, these programs lack the complexity to stress increasingly complex architecture designs. Larger and more complex real-world workloads should be employed to measure the performance of a given design or to evaluate the efficiency of various design alternatives. These applications can take days or weeks if run to completion on a detailed architecture simulator. In the past, researchers have applied machine learning and statistical sampling methods to reduce the average number of instructions required for detailed simulation. Others have proposed statistical simulation and workload synthesis techniques, which can produce programs that emulate the execution characteristics of the application from which they are derived but have a much shorter execution period than the original. However, these existing methods are difficult to apply to multi-threaded programs and can result in simplifications that miss the complex interactions between multiple, concurrently running threads. This study focuses on developing new techniques for accurate and effective multi-threaded workload synthesis, which can significantly accelerate architecture design evaluation of multi-core processors. We propose to construct synchronized statistical flow graphs that incorporate inter-thread synchronization and sharing behavior to capture the complex characteristics and interactions of multiple threads. Moreover, we develop thread-aware data reference models and wavelet-based branching models to generate accurate memory access and dynamic branch statistics. Experimental results show that a framework integrated with the aforementioned models can automatically generate synthetic programs that maintain characteristics of original workloads but have significantly reduced runtime.