Skip to Main Content
An efficient design for a distributed file system originates from a deep understanding of common access patterns and user behavior which is obtained through a deep analysis of traces and snapshots. In this paper we analyze traces for eight distributed file systems that represent a mix of workloads taken from educational, research and commercial environments. We focused on characterizing block access patterns, amount of block sharing and working set size over long periods of time, and we tried to find common behaviors for all workloads that can be generalized to other storage systems. We found that most environments shared large amounts of blocks over time, and that block sharing was significantly affected by repetitive human behavior. We also found that block lifetimes tended to be short, but there were significant amounts of blocks with long lifetimes that were accessed over many consecutive days. Lastly, we determined that most daily accesses were made to a reduced set of blocks. We strongly believe that these findings can be used to improve long-term caching policies as well as data placement algorithms, thus increasing the performance of distributed storage systems.