Managing a multi-level cache hierarchy is a critical and challenging task. Although many hardware- and OS-based schemes exist, they are difficult to adopt in practice because they incur non-trivial overheads and high complexity. To address this challenge efficiently, we propose MERCURY, a cost-effective and lightweight hardware support mechanism that coordinates with OS-based cache management schemes. Its basic idea is to leverage data similarity to reduce data migration costs and deliver high performance. To capture data similarity accurately and efficiently, we propose to use low-complexity Locality-Sensitive Hashing (LSH). We identify that, in addition to space inefficiency, a conventional LSH scheme also suffers from homogeneous data placement. To address these two problems, we design a novel Multi-Core-enabled LSH (MC-LSH) that accurately captures the differentiated similarity across data. The similarity-aware MERCURY hence efficiently partitions data into the L1 cache, the L2 cache and main memory based on their distinct localities, which helps optimize cache utilization and minimize pollution in the last-level cache. Experiments on real-world benchmarks further corroborate the efficacy and efficiency of MERCURY.
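To make the notion of capturing data similarity with LSH concrete, the following is a minimal Python sketch of a conventional random-hyperplane LSH, given purely as an illustration. It is not the MC-LSH design, and every name in it (`make_hyperplanes`, `lsh_signature`, `group_by_signature`, the feature vectors) is a hypothetical choice made for this example. The sketch assumes each data block is summarized by a small numeric feature vector; vectors pointing in similar directions tend to receive signatures that differ in few bits.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (normal vectors) in dim dimensions."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """Return an integer whose bits record on which side of each hyperplane the vector lies."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two signatures (smaller means more similar)."""
    return bin(a ^ b).count("1")


def group_by_signature(blocks, hyperplanes):
    """Bucket block names by signature; blocks in one bucket are treated as similar."""
    buckets = {}
    for name, vec in blocks.items():
        buckets.setdefault(lsh_signature(vec, hyperplanes), []).append(name)
    return buckets


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=8, dim=4, seed=1)
    # Hypothetical feature vectors for three data blocks.
    blocks = {
        "block_a": [1.0, 2.0, 3.0, 4.0],
        "block_b": [2.0, 4.0, 6.0, 8.0],      # same direction as block_a
        "block_c": [-1.0, -2.0, -3.0, -4.0],  # opposite direction
    }
    for signature, names in group_by_signature(blocks, planes).items():
        print(format(signature, "08b"), names)
```

In this sketch, blocks that share a signature would be candidates for placement at the same level of the hierarchy. How signatures are mapped to the L1 cache, the L2 cache, or main memory is left unspecified here, since that mapping depends on the MC-LSH design.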