Simultaneous multithreading (SMT) is emerging as an effective microarchitecture model for increasing resource utilization in modern superscalar processors. However, co-scheduled threads often compete aggressively for certain limited resources, among the most important of which is space in the cache hierarchy. Rather than requiring future systems to provide more cache resources, performance-aware scheduling techniques can adapt thread scheduling decisions to minimize this inter-thread contention. Although many processors can already summarize the activity in each cache level, systems that monitor and collect detailed information about cache access behavior can enable scheduling algorithms to fully exploit the cache characteristics of multithreaded workloads in different cache regions. This paper explores the design of a novel fine-grained hardware monitoring system in an SMT-based processor that enables improved system scheduling and throughput.