Using cycle stacks to understand scaling bottlenecks in multi-threaded workloads | IEEE Conference Publication | IEEE Xplore