As application programs grow larger and more complex every year, mapping them onto highly parallel hardware by understanding the full structure of an application is becoming increasingly difficult for programmers. In this paper, we present a technique that extracts dynamic memory dataflow across whole-program execution to give programmers hints for parallelizing and accelerating their programs. Taking a pre-compiled executable binary as input, we monitor data dependencies through memory references together with dynamic loop and call contexts. We implement our mechanism on a dynamic binary translation system and evaluate it on standard benchmark suites. The results confirm that we can successfully track data dependencies among function calls, loops, and their iterations using the LCCT+M (Loop-Call Context Tree with Memory Dataflow) representation within reasonable time and memory-space overheads. We also demonstrate that our profiling helps programmers identify loop, task, and pipeline parallelism in the actual dynamic execution.
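The core idea of dependency profiling via memory references can be illustrated with a small sketch. This is not the paper's implementation (which instruments machine code through dynamic binary translation); it is a simplified model in which a "last writer" shadow map records which context most recently wrote each address, so that a later read from a different context reveals a read-after-write dependency edge. The context labels and addresses below are hypothetical stand-ins for the loop- and call-contexts the paper tracks.

```python
# Sketch of memory-dataflow dependency detection with a last-writer
# shadow map. Each memory write records its context; each read from a
# different context produces a (producer, consumer) dependency edge.

last_writer = {}      # address -> context that last wrote it
dependencies = set()  # (producer_context, consumer_context) edges

def record_write(address, context):
    last_writer[address] = context

def record_read(address, context):
    producer = last_writer.get(address)
    if producer is not None and producer != context:
        dependencies.add((producer, context))

# Simulate a loop in which each iteration reads the value written by
# the previous iteration (a loop-carried dependency):
for i in range(3):
    ctx = f"loop1/iter{i}"
    if i > 0:
        record_read(0x1000, ctx)   # consumes the prior iteration's value
    record_write(0x1000, ctx)

# The recorded edges form a chain iter0 -> iter1 -> iter2, exposing
# the loop-carried dependency that would prevent naive loop parallelism.
```

In the same way, edges between iterations of the same loop suggest pipeline rather than DOALL parallelism, while the absence of such edges hints that iterations can run independently.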
Date of Conference: 4-6 Nov. 2012