This paper proposes a Dynamic Bloom filter Array (DBA) that scalably represents membership for large data sets of variable size in storage systems. DBA consists of dynamically created groups of space-efficient Bloom filters (BFs) that accommodate changes in set size. Within each group, the BFs are homogeneous and their data layout is optimized at the bit level, so that they can be accessed in parallel to achieve high query performance. DBA can effectively control its query accuracy by partially adjusting the error rate used when constructing BFs. Each BF corresponds to an independent subset of the data set, which facilitates element location and membership confirmation. Further, DBA supports element deletion by introducing a lazy update policy. We prototype and evaluate our DBA scheme as a scalable, fast index in the MAD2 deduplication storage system. Experimental results show that DBA (with 64 BFs per group) is capable of maintaining 90% of the peak query performance while scaling up to 160 BFs. Theoretical analysis and further experiments on real-world data sets also show that DBA excels in performance and space efficiency.
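The growth-by-appending idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class names `BloomFilter` and `DynamicBloomArray`, the parameters `filter_capacity` and `error_rate`, and the SHA-256 double-hashing scheme are all assumptions made for illustration. The sketch also omits the grouping of BFs, the bit-level layout for parallel access, and the lazy-update deletion policy. It shows only that a new homogeneous BF is created once the current one reaches capacity, and that a query probes every BF to confirm membership and locate the subset holding an element.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter (illustrative; not the paper's bit layout)."""

    def __init__(self, capacity, error_rate):
        self.capacity = capacity
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(
            1, math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
        )
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)
        self.count = 0

    def _positions(self, item):
        # Double hashing over one SHA-256 digest (an assumed hashing scheme).
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )

    def is_full(self):
        return self.count >= self.capacity


class DynamicBloomArray:
    """Array of homogeneous Bloom filters that grows as the set grows."""

    def __init__(self, filter_capacity=1000, error_rate=0.001):
        self.filter_capacity = filter_capacity
        self.error_rate = error_rate
        self.filters = [BloomFilter(filter_capacity, error_rate)]

    def add(self, item):
        # Append a fresh filter instead of overfilling the current one.
        if self.filters[-1].is_full():
            self.filters.append(BloomFilter(self.filter_capacity, self.error_rate))
        self.filters[-1].add(item)

    def locate(self, item):
        # Indices of every filter (data subset) that reports the item.
        return [i for i, bf in enumerate(self.filters) if item in bf]

    def __contains__(self, item):
        return any(item in bf for bf in self.filters)


if __name__ == "__main__":
    dba = DynamicBloomArray(filter_capacity=1000, error_rate=0.001)
    for i in range(2500):
        dba.add(f"chunk-{i}".encode())
    print(len(dba.filters), b"chunk-42" in dba, dba.locate(b"chunk-2400"))
```

Because a query probes every filter, the overall false-positive rate with g filters, each built for error rate p, is roughly 1 - (1 - p)^g. This is one reason the per-filter error rate matters as the array grows.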