With the advent of Chip Multiprocessors (CMPs), improving performance relies on programmers and compilers exposing thread-level parallelism to the underlying hardware. However, this is a difficult and error-prone process for programmers, while state-of-the-art compiler techniques are unable to provide significant benefits for many classes of applications. An alternative is offered by systems that support Thread Level Speculation (TLS), which relieve the programmer and compiler from checking for thread dependences and instead use the hardware to enforce them. Unfortunately, TLS suffers from power inefficiency because data misspeculations cause threads to roll back to the beginning of the speculative task. For this reason, intermediate checkpointing of TLS threads has been proposed. When a violation occurs, the thread rolls back only to a checkpoint preceding the violating instruction rather than to the start of the task. However, previous work omits the microarchitectural details and implementation issues that are essential for effective checkpointing. In this paper we study checkpointing on a state-of-the-art TLS system. We systematically study the costs associated with checkpointing and analyze the tradeoffs. We also propose changes to the TLS mechanism to allow effective checkpointing. Further, we establish the need for accurately identifying points in execution that are appropriate for checkpointing, and we analyze various techniques for doing so in terms of both effectiveness and viability. We propose program-counter-based and hybrid predictors and show that they outperform previous proposals. Placing checkpoints based on dependence predictors results in power improvements while maintaining the performance advantage of TLS. The proposed checkpointing system achieves an energy saving of up to 14%, with an average of 7%, over normal TLS execution.
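The rollback behavior described above can be illustrated with a toy model. This is a minimal sketch and not the mechanism evaluated in the paper: the function name `reexecuted_instructions`, its parameters, and the use of instruction counts as a stand-in for wasted energy are all assumptions made for illustration. It compares the work discarded when a violation forces a restart from the task start with the work discarded when the thread can resume from the nearest checkpoint at or before the violating instruction.

```python
from bisect import bisect_right


def reexecuted_instructions(violating_instr, detected_at, checkpoints=()):
    """Return how many instructions must be re-executed after a violation.

    Hypothetical model: positions are instruction indices within one
    speculative task. `violating_instr` is the instruction that consumed
    stale data, `detected_at` is how far the thread had progressed when the
    violation was detected, and `checkpoints` are the indices at which
    state was saved. The task start (index 0) is always a valid restart point.
    """
    if not 0 <= violating_instr <= detected_at:
        raise ValueError("expected 0 <= violating_instr <= detected_at")

    # Candidate restart points: the task start plus every checkpoint.
    restart_points = sorted({0, *checkpoints})

    # Pick the latest restart point that is not after the violating instruction.
    index = bisect_right(restart_points, violating_instr) - 1
    restart = restart_points[index]

    # Everything executed since the restart point is discarded and redone.
    return detected_at - restart


if __name__ == "__main__":
    # Plain TLS: no checkpoints, so the whole task prefix is redone.
    print(reexecuted_instructions(700, 900))
    # With checkpoints, only the work since index 600 is redone.
    print(reexecuted_instructions(700, 900, checkpoints=[250, 600, 800]))
```

In this model a checkpoint only helps if it falls between the task start and the violating instruction, which is why the choice of checkpoint locations matters: the checkpoint at index 800 above is never usable for this violation.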