We are currently experiencing intermittent issues impacting performance. We apologize for the inconvenience.
By Topic

Last level cache (LLC) performance of data mining workloads on a CMP - a case study of parallel bioinformatics workloads

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

The purchase and pricing options are temporarily unavailable. Please try again later.
3 Author(s)
Jaleel, A. ; VSSAD, Intel Corp., USA ; Mattina, M. ; Jacob, B.

With the continuing growth in the amount of genetic data, members of the bioinformatics community are developing a variety of data-mining applications to understand the data and discover meaningful information. These applications are important in defining the design and performance decisions of future high performance microprocessors. This paper presents a detailed data-sharing analysis and chip-multiprocessor (CMP) cache study of several multithreaded data-mining bioinformatics workloads. For a CMP with a three-level cache hierarchy, we model the last-level of the cache hierarchy as either multiple private caches or a single cache shared amongst different cores of the CMP. Our experiments show that the bioinformatics workloads exhibit significant data-sharing - 50-95% of the data cache is shared by the different threads of the workload. Furthermore, regardless of the amount of data cache shared, for some workloads, as many as 98% of the accesses to the last-level cache are to shared data cache lines. Additionally, the amount of data-sharing exhibited by the workloads is a function of the total cache size available - the larger the data cache the better the sharing behavior. Thus, partitioning the available last-level cache silicon area into multiple private caches can cause applications to lose their inherent data-sharing behavior. For the workloads in this study, a shared 32 MB last-level cache is able to capture a tremendous amount of data-sharing and outperform a 32 MB private cache configuration by several orders of magnitude. Specifically, with shared last-level caches, the bandwidth demands beyond the last-level cache can be reduced by factors of 3-625 when compared to private last-level caches.

Published in:

High-Performance Computer Architecture, 2006. The Twelfth International Symposium on

Date of Conference:

11-15 Feb. 2006