Skip to Main Content
Reliability in deduplication storage has not attracted much research attention yet. To provide a demanded reliability for an incoming data stream, most deduplication storage systems first carry out deduplication process by eliminating duplicates from the data stream and then apply erasure coding for the remaining (unique) chunks. A unique chunk may be shared (i.e., duplicated) at many places of the data stream and shared by other data streams. That is why deduplication can reduce the required storage capacity. However, this occasionally becomes problematic to assure certain reliability levels required from different data streams. We introduce two reliability parameters for deduplication storage: chunk reliability and chunk loss severity. The chunk reliability means each chunk's tolerance level in the face of any failures. The chunk loss severity represents an expected damage level in the event of a chunk loss, formally defined as the multiplication of actual damage by the probability of a chunk loss. We propose a reliability-aware deduplication solution that not only assures all demanded chunk reliability levels by making already existing chunks sharable only if its reliability is high enough, but also mitigates the chunk loss severity by adaptively reducing the probability of having a chunk loss. In addition, we provide future research directions following to the current study.