An appropriate automatic thread decomposition approach is critical for pipelined multithreading (PMT): it must maximize pipeline performance by producing threads of balanced size for the target multi-core processor. This paper presents such an approach, which maps the pipeline thread decomposition problem onto a graph-theoretic framework and constructs an optimized DAG whose bottleneck node is as small as possible and whose node sizes are balanced, subject to a constrained number of cores. In this approach, control dependences are treated as a special kind of data dependence, and an effective mechanism is proposed to remove redundant control dependences. A heuristic decomposition algorithm then generates an optimized pipeline. The algorithm has been evaluated on a commodity multi-core processor, and experimental results show speedups ranging from 113% to 174% on several SPEC CPU 2000 benchmark programs.
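To make the optimization goal concrete, the following is a minimal sketch of one way to split a weighted dependence DAG into pipeline stages with a small bottleneck under a core limit. It is not the paper's heuristic: it simply takes a topological order and binary-searches the smallest stage capacity that fits within the core count. The names `topological_order`, `stages_needed`, and `decompose`, the integer node weights, and the assumption that cycles have already been collapsed into single nodes are all illustrative choices, not details from the paper.

```python
from collections import deque


def topological_order(weights, edges):
    """Kahn's algorithm. Assumes the dependence graph is already acyclic
    (hypothetical precondition: cycles were collapsed into single nodes)."""
    succs = {n: [] for n in weights}
    indeg = {n: 0 for n in weights}
    for src, dst in edges:
        succs[src].append(dst)
        indeg[dst] += 1
    ready = deque(n for n in weights if indeg[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in succs[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    if len(order) != len(weights):
        raise ValueError("dependence graph has a cycle; collapse it first")
    return order


def stages_needed(order, weights, cap):
    """Greedily pack consecutive nodes into stages of total weight <= cap."""
    stages, current, load = [], [], 0
    for n in order:
        if current and load + weights[n] > cap:
            stages.append(current)
            current, load = [], 0
        current.append(n)
        load += weights[n]
    if current:
        stages.append(current)
    return stages


def decompose(weights, edges, cores):
    """Return at most `cores` stages, minimizing the largest stage weight
    over contiguous splits of one topological order (illustrative only).
    Assumes a non-empty graph, positive integer weights, and cores >= 1."""
    order = topological_order(weights, edges)
    lo, hi = max(weights.values()), sum(weights.values())
    while lo < hi:
        mid = (lo + hi) // 2
        if len(stages_needed(order, weights, mid)) <= cores:
            hi = mid          # capacity is feasible; try a smaller bottleneck
        else:
            lo = mid + 1      # too many stages; capacity must grow
    return stages_needed(order, weights, lo)


# Hypothetical example: five code regions with estimated costs.
weights = {"a": 4, "b": 3, "c": 2, "d": 5, "e": 1}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
stages = decompose(weights, edges, cores=3)
```

Because each stage is a contiguous slice of a topological order, every dependence edge points to the same stage or a later one, which is the property a pipeline needs. A real decomposer would also have to weigh inter-stage communication cost and explore more than one topological order; this sketch ignores both.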