Compiler-Assisted Data Distribution and Network Configuration for Chip Multiprocessors

Yong Li (Computer Engineering Program, University of Pittsburgh, Pittsburgh, PA, USA); A. Abousamra; R. Melhem; A.K. Jones

Data access latency, a limiting factor in the performance of chip multiprocessors, grows significantly with the number of cores in nonuniform cache architectures with distributed cache banks. To mitigate this effect, we use a compiler-based approach to leverage data access locality, choose an optimized data placement, and efficiently configure the on-chip network. The proposed experimental compiler framework employs novel compilation techniques to discover and represent multithreaded memory access patterns (MMAPs). At runtime, symbolic MMAPs are resolved and used by a partitioning algorithm to choose a partition of the allocated memory blocks among the forked threads in the analyzed application. This partition is used to enforce data ownership by associating each block with the core that executes the thread owning it. From the partition, the communication pattern of the application can be extracted. We demonstrate how this information can be used in an experimental architecture to accelerate applications. In particular, our compiler-assisted data partitioning approach shows a 20 percent speedup over shared caching and a 5 percent speedup over the closest runtime approximation, first touch. By leveraging the communication pattern, we achieve performance comparable to a system that uses a complex centralized network configuration mechanism at runtime. Thus, our final system avoids significant runtime complexity and achieves an additional 5.1 percent speedup through the addition of the reconfigurable network.
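A minimal sketch of the core idea, not the paper's actual framework: once symbolic MMAPs have been resolved at runtime into per-thread access counts over memory blocks, a simple partitioning heuristic can assign each block to its dominant accessor's core, and the residual non-owner accesses yield the inter-core communication pattern. The dominant-accessor rule, the block/thread counts, and all identifiers below are illustrative assumptions.

/* Hypothetical illustration: partition memory blocks among threads
 * from resolved per-thread access counts (a stand-in for a resolved
 * MMAP), then derive the inter-core communication pattern. */
#include <stdio.h>

#define NTHREADS 4   /* forked threads, assumed one per core */
#define NBLOCKS  8   /* allocated memory blocks */

/* access_cnt[t][b]: accesses by thread t to block b (made-up data) */
static const unsigned access_cnt[NTHREADS][NBLOCKS] = {
    { 9, 1, 0, 0, 2, 0, 0, 3 },
    { 0, 8, 7, 0, 0, 1, 0, 0 },
    { 1, 0, 2, 9, 0, 0, 4, 0 },
    { 0, 0, 0, 1, 6, 5, 3, 7 },
};

int main(void)
{
    int owner[NBLOCKS];
    unsigned traffic[NTHREADS][NTHREADS] = { 0 };

    /* Partition: assign each block to its most frequent accessor, so
     * its data is placed in that core's local cache bank. */
    for (int b = 0; b < NBLOCKS; b++) {
        int best = 0;
        for (int t = 1; t < NTHREADS; t++)
            if (access_cnt[t][b] > access_cnt[best][b])
                best = t;
        owner[b] = best;
    }

    /* Communication pattern: accesses by a non-owner thread become
     * traffic between that thread's core and the owner's core. */
    for (int t = 0; t < NTHREADS; t++)
        for (int b = 0; b < NBLOCKS; b++)
            if (t != owner[b] && access_cnt[t][b] > 0)
                traffic[t][owner[b]] += access_cnt[t][b];

    for (int b = 0; b < NBLOCKS; b++)
        printf("block %d -> core %d\n", b, owner[b]);
    for (int s = 0; s < NTHREADS; s++)
        for (int d = 0; d < NTHREADS; d++)
            if (traffic[s][d])
                printf("core %d -> core %d: %u accesses\n",
                       s, d, traffic[s][d]);
    return 0;
}

In the paper's setting, a traffic matrix like the one computed above is what would inform the configuration of the reconfigurable on-chip network; this sketch only shows how such a pattern falls out of the block partition.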

Published in:

IEEE Transactions on Parallel and Distributed Systems (Volume: 23, Issue: 11)