Using the compute power of modern multi-core machines effectively is a challenging task for programmers. Automated extraction of shared-memory parallelism through powerful compiler transformations and optimizations is one means to that end. However, the effectiveness of such transformations depends on detailed characteristics of the target computer system. In this paper, we describe an automated system for capturing these characteristics that builds on prior work addressing individual parts of the overall problem. The characteristics measured include the number of compute elements available to run threads, multiple memory hierarchy parameters, and functional unit latencies and bandwidths. We show experimental results on a wide range of compute platforms that validate the effectiveness of the overall approach.
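To make the kind of measurement described above concrete, here is a minimal sketch of two probes: a query for the number of compute elements and a pointer-chasing loop of the sort commonly used to expose memory hierarchy latencies. It is an illustration under stated assumptions, not the system described in the paper. The function names (`available_compute_elements`, `single_cycle_permutation`, `ns_per_access`) are hypothetical, and because Python interpreter overhead dominates the timings, a real tool would likely implement the inner loop in C or assembly.

```python
import os
import random
import time
from array import array


def available_compute_elements() -> int:
    """Logical CPUs reported by the OS; falls back to 1 if unknown."""
    return os.cpu_count() or 1


def single_cycle_permutation(n: int, seed: int = 0) -> array:
    """Build nxt so that following nxt[i] visits all n slots in one cycle.

    Uses Sattolo's algorithm; a single cycle prevents the walk from
    settling into a short loop that would fit in a small cache.
    """
    rng = random.Random(seed)
    nxt = array("q", range(n))  # 8-byte signed integers
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i guarantees one cycle of length n
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def ns_per_access(n: int, steps: int = 200_000) -> float:
    """Average time per dependent load over a working set of n slots."""
    nxt = single_cycle_permutation(n)
    idx = 0
    start = time.perf_counter()
    for _ in range(steps):
        idx = nxt[idx]  # each load depends on the previous one
    elapsed = time.perf_counter() - start
    return elapsed / steps * 1e9


if __name__ == "__main__":
    print("compute elements:", available_compute_elements())
    # Sweep working-set sizes (illustrative choice: 8 KiB to 8 MiB).
    for kib in (8, 64, 512, 8192):
        slots = kib * 1024 // 8
        print(f"{kib:>6} KiB: {ns_per_access(slots):.1f} ns/access")
```

In a lower-level implementation, jumps in the per-access time as the working set grows would suggest cache capacity boundaries; in this Python sketch such steps may be faint or hidden by interpreter noise.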
Date of Conference: 26-30 Sept. 2011