The market for cloud backup services in the personal computing environment is growing because large volumes of valuable personal and corporate data are stored on desktops, laptops and smartphones. Source deduplication has become a mainstay of cloud backup because it saves network bandwidth and reduces storage space. However, deduplication for cloud backup service clients faces two challenges: (1) low deduplication efficiency, due to the combination of the resource-intensive nature of deduplication and the limited system resources on the PC-based client site, and (2) low data transfer efficiency, since post-deduplication data transfers from source to backup servers are typically very small but must often cross a WAN. In this paper, we present AA-Dedupe, an application-aware source deduplication scheme that significantly reduces the computational overhead, increases the deduplication throughput and improves the data transfer efficiency. The AA-Dedupe approach is motivated by our key observation that applications differ substantially in data redundancy and deduplication characteristics, and it is therefore based on an application-aware index structure that exploits these differences. Our experimental evaluations, based on an AA-Dedupe prototype implementation, show that our scheme improves deduplication efficiency over state-of-the-art source deduplication methods by a factor of 2-7, resulting in a shortened backup window, increased power efficiency and reduced cost for cloud backup services.
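As a rough illustration of what an application-aware index might look like, the following minimal sketch keeps a separate fingerprint index for each application type, so that a chunk is only looked up among chunks produced by the same kind of file. It is not the AA-Dedupe implementation: the class name `AppAwareIndex`, the use of the file extension as the application type, fixed-size chunking, and SHA-1 fingerprints are all assumptions made for the example.

```python
import hashlib
import os
from collections import defaultdict


class AppAwareIndex:
    """Hypothetical per-application chunk index (illustrative only)."""

    def __init__(self, chunk_size=4096):
        # Assumption: fixed-size chunking with a single chunk size.
        self.chunk_size = chunk_size
        # One independent set of chunk fingerprints per application type.
        self.indexes = defaultdict(set)

    @staticmethod
    def app_type(filename):
        # Assumption: the file extension stands in for the application type.
        ext = os.path.splitext(filename)[1].lower()
        return ext or "<none>"

    def backup(self, filename, data):
        """Return the chunks of `data` not yet present in this type's index."""
        index = self.indexes[self.app_type(filename)]
        new_chunks = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fingerprint = hashlib.sha1(chunk).digest()
            if fingerprint not in index:
                index.add(fingerprint)
                new_chunks.append(chunk)
        return new_chunks


if __name__ == "__main__":
    idx = AppAwareIndex(chunk_size=4)
    print(idx.backup("a.doc", b"AAAABBBBAAAA"))  # duplicate chunk inside one file
    print(idx.backup("b.doc", b"BBBBCCCC"))      # duplicate across files of one type
    print(idx.backup("c.pdf", b"AAAA"))          # different type, separate index
```

In this sketch each lookup touches only the index for one application type, which keeps the individual indexes small. The trade-off is that a chunk shared between files of different types is not detected as a duplicate.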