Efficiently operating on relevant data for users in large-scale online social network (OSN) systems is a challenging problem. Storage systems used by popular OSN systems often rely on key-value stores, where randomly partitioning user data among servers across data centers is the de facto standard. Although random partitioning with distributed hash tables (DHTs) scales well to a large number of users, it leads to costly inter-server communication across data centers because of the complex interconnections and interactions among OSN users. In this paper, we explore how to reduce inter-server communication while retaining the simple and robust nature of OSNs. We propose a data placement solution atop OSN systems that divides users among servers according to an interaction-locality-based structure. Our approach exploits a simple yet powerful principle of OSN interactions, self-similarity, which reveals an intrinsic structure under which the inter-server communication cost is minimized. Our algorithm avoids a significant amount of inter-server traffic while achieving load balance among servers across data centers. We demonstrate the existence of self-similarity in large-scale Facebook traces covering 10 million users and 24 million interaction events. We conduct comprehensive trace-driven simulations to evaluate this design, which exploits the unique feature of self-similarity. Results show that our scheme significantly reduces traffic and latency compared with existing schemes.
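To make the cost being discussed concrete, the following minimal sketch counts interaction events whose two endpoints sit on different servers, once under hash-based random placement and once under a hand-assigned locality-aware placement. It is an illustration only, not the placement algorithm proposed in the paper: the toy trace, the user IDs, and the helper names `hash_server` and `cross_server_events` are all hypothetical.

```python
import hashlib


def hash_server(user_id: str, n_servers: int) -> int:
    """Random (DHT-style) placement: hash the user ID to a server index."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_servers


def cross_server_events(interactions, placement):
    """Count interaction events whose two endpoints live on different servers."""
    return sum(1 for u, v in interactions if placement[u] != placement[v])


# Hypothetical toy trace: two tightly interacting groups and one cross-group event.
interactions = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
    ("a1", "b1"),
]
users = sorted({u for pair in interactions for u in pair})
n_servers = 2

# Placement 1: random partitioning by hash, ignoring who interacts with whom.
random_placement = {u: hash_server(u, n_servers) for u in users}

# Placement 2: locality-aware, assigned by hand for this toy trace
# (group "a" on server 0, group "b" on server 1).
locality_placement = {u: 0 if u.startswith("a") else 1 for u in users}

random_cost = cross_server_events(interactions, random_placement)
locality_cost = cross_server_events(interactions, locality_placement)

print("cross-server events, random placement:  ", random_cost)
print("cross-server events, locality placement:", locality_cost)
```

In this toy trace the locality-aware placement leaves only the single cross-group event on the inter-server path and keeps three users on each server; the count under hash placement depends on how the IDs happen to hash. How the paper actually derives such groupings from self-similarity is not specified in this abstract.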