On the Security of Quotient Filters: Attacks and Potential Countermeasures

The security of probabilistic data structures is increasingly important due to their wide adoption in many computing systems and applications. In particular, the security of approximate membership check filters such as Bloom or cuckoo filters has been recently studied showing how an attacker can degrade the filter performance in some settings. In this paper, we consider for the first time the security of another popular approximate membership check filter, the Quotient Filter (QF). Our analysis and simulations show that quotient filters are vulnerable to both white and black box attackers that can cause insertion failures and degrade the filter performance very significantly. An interesting finding is that quotient filters are vulnerable to a new type of attack, not applicable to Bloom or cuckoo filters, that can degrade the speed of queries dramatically. The paper also briefly discusses and evaluates potential countermeasures to detect and protect against those attacks.

Pedro Reviriego, Senior Member, IEEE, Miguel González, Niv Dayan, Gabriel Huecas, Shanshan Liu, Senior Member, IEEE, and Fabrizio Lombardi, Life Fellow, IEEE

Index Terms: Security, approximate membership checking, quotient filters.

I. INTRODUCTION
Probabilistic data structures, also known as data sketches, are increasingly used to process big data or high-speed data streams [1]. There are, for example, probabilistic data structures to estimate the cardinality of a set [2], the frequency of elements in a set [3], and the similarity between two sets [4]. Another function commonly implemented with probabilistic data structures is checking if an element belongs to a set. In this case, the structures are commonly referred to as filters and return an approximate answer in the sense that false positives occur with a given probability [5]. The Bloom filter [6] is the most widely known filter, but many other approximate membership check filters have been proposed over the years to improve performance and cost by, for example, reducing the number of memory accesses and the memory needed to achieve a given false positive probability [7]. Those new filters include the cuckoo filter [8], the xor filter [9], the ribbon filter [10] and the quotient filter [11], [12]. These approximate membership filters are widely used in many application domains including distributed systems [13], bioinformatics [14], networking [15], database systems [16], security [17], and blockchain applications [18]. In most cases, the filters are used to accelerate the applications and are fundamental to ensure that the system achieves the desired performance.

Pedro Reviriego and Gabriel Huecas are with the ETSI de Telecomunicación, Universidad Politécnica de Madrid, 28040 Madrid, Spain (e-mail: pedro.reviriego@upm.es; gabriel.huecas@upm.es).
Shanshan Liu is with the University of Electronic Science and Technology of China, Chengdu 610054, China (e-mail: ssliu@uestc.edu.cn).
Fabrizio Lombardi is with the Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115 USA (e-mail: lombardi@ece.neu.edu).
Digital Object Identifier 10.1109/TC.2024.3371793
The widespread adoption of data sketches has put their security and privacy in the spotlight, and recent works have shown that many of them are vulnerable in some settings [19], [20]. In particular, in the case of approximate membership check filters, the security and privacy of Bloom filters has been extensively studied, showing that their performance can be degraded and that data can be extracted from the filter under certain conditions [21], [22], [23], [24], [25]. The security of cuckoo filters has also been studied, showing that an attacker can create insertion failures [26]. This suggests that other filters may also have vulnerabilities and thus their security must be studied.
To the best of our knowledge, the security of quotient filters has not been previously studied. Quotient filters have been used in software implementations for a wide range of applications that include storage systems [27], packet inspection in networks [28], detection of duplicates [29] and bioinformatics [30], [31]. They have also been implemented on GPUs [32], and many variants and derivatives of the quotient filter have been designed to improve its performance or functionality [29], [33], [34]. Moreover, the quotient filter has recently been shown to be ideal for representing dynamic data sets that can grow or shrink indefinitely [35]. Therefore, the study of the security of quotient filters is of interest. In this paper, we pursue the study of their security, showing that they are vulnerable to attacks that can degrade their speed by orders of magnitude or cause insertion failures. Based on the results of our security analysis, we also discuss and evaluate potential schemes to protect the filter against the proposed attacks, showing their relation to variants of the quotient filter.
The most significant contributions of this paper are as follows:
1) To show that quotient filter performance can be significantly degraded by an attacker.
2) To show that insertion failures can be induced in quotient filters by an attacker.
3) To propose algorithms to degrade the speed of quotient filters for both white and black box attackers.
4) To propose algorithms to create insertion failures on quotient filters for both white and black box attackers.
5) To simulate the proposed attack algorithms showing their feasibility.
6) To discuss potential protection techniques against the attacks and perform an initial evaluation of their effectiveness.
The rest of the paper is organized as follows. Section II briefly presents the quotient filter, its variants and derivatives, adaptive filters and previous works on attacks on filters. The motivation and problem statement are presented in Section III, which also discusses the types of attacks considered. Insertion failure attacks are presented in Section IV and speed degradation attacks in Section V. The attacks are evaluated by simulation and the results are summarized in Section VI. Then, potential protection schemes are discussed and evaluated in Section VII. The paper ends with the conclusion and ideas for future work in Section VIII.

A. Quotient Filters
A quotient filter [11], [12] is a hash table with m = 2^x slots. Each key is mapped to a canonical slot using x bits of its hash. The remaining bits of a key's hash are employed as a fingerprint. Each slot stores at most one fingerprint. Fig. 1 Part (A) shows three insertions of different keys, X, Y and Z, to an initially empty quotient filter with eight slots. These keys are hashed and mapped to different canonical slots (000, 101, and 110, all expressed in binary). The rest of these keys' hashes, marked in red, are stored as fingerprints in these slots.
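The split of a hash into canonical slot and fingerprint can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [11], [12]: SHA-256 stands in for the filter's hash function, the fingerprint width r and the function name are introduced here, and the canonical slot is taken from the high-order bits.

```python
import hashlib

def quotient_and_remainder(key: bytes, x: int, r: int) -> tuple[int, int]:
    """Split an (x + r)-bit hash into a canonical slot and a fingerprint.

    Assumes x + r <= 64; SHA-256 is only a stand-in hash function.
    """
    digest = hashlib.sha256(key).digest()
    h = int.from_bytes(digest[:8], "big") & ((1 << (x + r)) - 1)  # keep x + r bits
    quotient = h >> r                 # high x bits: canonical slot in [0, 2**x)
    remainder = h & ((1 << r) - 1)    # low r bits: fingerprint stored in the slot
    return quotient, remainder

# Eight slots (x = 3) as in Fig. 1, with an assumed 8-bit fingerprint.
q, f = quotient_and_remainder(b"key-X", x=3, r=8)
print(f"canonical slot {q:03b}, fingerprint {f:08b}")
```

Concatenating the two values again, (q << r) | f, recovers the x + r hash bits that the filter effectively stores for the key.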
Hash collisions are handled via Robin Hood hashing [36], an open addressing scheme that tries to keep elements close to their canonical position. This means that if the canonical slot for a given key is non-empty, the key's fingerprint will be pushed and stored at some slot to the right. Fingerprints belonging to the same canonical slot are stored contiguously along a so-called run. A cluster is a group of contiguous runs of which the first run begins at its canonical slot and the subsequent runs have been shifted to the right. Part (B) of Fig. 1 shows the outcome of two more insertions of keys V and W, which map to occupied slots 101 and 110, leading to hash collisions. The result is a cluster consisting of two runs, each with two slots.
As shown in Part (B) of Fig. 1, the cluster exceeds the bounds of the quotient filter's eight slots. To accommodate such overflows, the quotient filter has a few buffer slots at its end. It is considered sufficient to allocate Ω(log m) buffer slots, as this accommodates all overflows with high probability. The reason is that the maximum run length is, in expectation, ≈ log m / log log m, as per the famous balls into bins problem [37]. In the case that a cluster exceeds the bounds of the buffer slots, an insertion failure is returned to the user.

Fig. 1. A quotient filter stores one fingerprint for each key within a hash table, and it handles collisions by organizing their fingerprints into runs and clusters. Fingerprints are illustrated in red.
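The placement of fingerprints into runs and clusters, and the overflow condition, can be modeled compactly. The sketch below is a simplified model rather than a real quotient filter: it only tracks which slot each fingerprint ends up in, using the fact that fingerprints are kept sorted by canonical slot and each one sits at its canonical slot or immediately right of its predecessor. The function names are introduced here for illustration.

```python
def layout(canonical_slots):
    """Slot index taken by each stored fingerprint, given the canonical
    slots of all stored keys (simplified model, no metadata bits)."""
    positions, next_free = [], 0
    for q in sorted(canonical_slots):
        pos = max(q, next_free)   # canonical slot, or pushed right if taken
        positions.append(pos)
        next_free = pos + 1
    return positions

def overflows(canonical_slots, m, buffer_slots):
    """True if some fingerprint would land beyond the buffer slots."""
    return any(p >= m + buffer_slots for p in layout(canonical_slots))

# Fig. 1 Part (B): X -> 000, Y and V -> 101, Z and W -> 110, with m = 8.
print(layout([0b000, 0b101, 0b110, 0b101, 0b110]))
```

In this model, the fifth fingerprint of the Fig. 1 example lands at index 8, which is the first buffer slot, in line with the description above.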
To demarcate the start and end of runs and clusters, there are three metadata flags in each slot. The is_occupied flag is set to one if a given slot is a canonical slot for at least one key. The is_shifted flag is set to one for a slot that contains a fingerprint that had been shifted to the right from its canonical slot. The is_continuation flag is set to one if a slot contains a continuation of a run that started to the left.
For example, in Fig. 1 Part (B), the is_occupied flag of Slot 101 is set to one and the other two flags are set to zero because this slot stores a fingerprint for which Slot 101 is the canonical slot. For Slot 110 and the first buffer slot, the is_continuation and is_shifted flags are set to one because each of these slots is a part of a run that started to the left. For Slot 110, the is_occupied flag is also set to one because it is the canonical slot for a key that had been moved to the right. Slot 111 has the is_shifted flag set to true because it belongs to a cluster starting to the left, yet its is_continuation flag is set to false to mark the start of a new run within this cluster.
A query commences at a given key's canonical slot and checks if the is_occupied flag is set to one. If not, a matching key does not exist and the query terminates. Otherwise, the query moves to the left until it finds the start of a cluster (i.e., a slot with only the is_occupied flag set to true). It traverses this cluster to the right, keeping a running count of the number of subsequent runs it must skip. Each slot to the left of the canonical slot with the is_occupied flag set to true indicates one additional run to be skipped. This increments the running counter. On the other hand, each slot with the is_continuation flag set to false indicates the start of a new run. This decrements the running counter. When the running counter's value is zero, the query has reached the target run. The query then scans the run's fingerprints and returns a positive if there is at least one exact match. The time to complete a query depends on the length of the runs and clusters, which are small as long as the filter occupancy stays below a threshold of approximately 80%.
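The query walk can be sketched as below. This is an illustrative model only: slots are Python dictionaries rather than bit-packed words, the table is built directly from a list of (canonical slot, fingerprint) pairs instead of through incremental insertions, and the pairing of occupied canonical slots with run starts plays the role of the running counter. The names build and query, and the fingerprint values, are assumptions made for the example.

```python
def build(entries, size):
    """Lay out (canonical_slot, fingerprint) pairs and set the three flags."""
    slots = [dict(fp=None, occupied=False, shifted=False, continuation=False)
             for _ in range(size)]
    next_free, prev_q = 0, None
    for q, fp in sorted(entries):
        pos = max(q, next_free)                 # pushed right if slot is taken
        slots[q]["occupied"] = True             # q is canonical for some key
        slots[pos].update(fp=fp, shifted=(pos != q), continuation=(q == prev_q))
        next_free, prev_q = pos + 1, q
    return slots

def query(slots, q, fp):
    """Return True if fingerprint fp is stored in the run of canonical slot q."""
    if not slots[q]["occupied"]:
        return False                            # no key maps to this slot
    b = q
    while slots[b]["shifted"]:                  # walk left to the cluster start
        b -= 1
    s = b
    while b != q:                               # skip one run per occupied slot
        s += 1
        while slots[s]["continuation"]:
            s += 1                              # s now at the next run start
        b += 1
        while not slots[b]["occupied"]:
            b += 1                              # b now at the next occupied slot
    while True:                                 # scan the target run
        if slots[s]["fp"] == fp:
            return True
        s += 1
        if s >= len(slots) or not slots[s]["continuation"]:
            return False

# Fig. 1 Part (B) with made-up fingerprint values; 8 slots plus 3 buffer slots.
table = build([(0, 0x1A), (5, 0x2B), (6, 0x3C), (5, 0x4D), (6, 0x5E)], size=11)
print(query(table, 6, 0x5E), query(table, 6, 0x2B), query(table, 1, 0x1A))
```

A query for canonical slot 110 in this example first steps left to Slot 101, then skips the run of Slot 101 before scanning its own run, which is the source of the dependence on run and cluster length noted above.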
An insertion commences similarly to a query by first finding the run to which the fingerprint should be inserted. The fingerprint is added to this run by shifting all subsequent keys in the cluster one slot to the right, and potentially adding new runs to the cluster in this process by pushing them to the right from their canonical slots.
A quotient filter supports deletes to keys we know had previously been inserted. It executes a delete by identifying a key's run and removing from it a matching fingerprint. It then shifts any subsequent key in the cluster one slot to the left, potentially also splitting clusters by shifting some runs back to their canonical slots. Fig. 1 Part (C) illustrates a delete operation of key Y, which causes the cluster to shrink by one slot.

B. Variants and Derivatives of Quotient Filters (QFs)
The original quotient filter has been improved and extended over the years; for example, the quotient filter has been adapted to perform duplicate detection in data streams [29]. The Counting Quotient Filter (CQF) presented in [33] reduces the number of metadata bits from 3 to 2.125. It can also represent multi-sets by efficiently encoding a counter alongside each fingerprint. The counting feature is useful for protecting against one potential attack, as discussed later in this manuscript. The Vector Quotient Filter (VQF) [34] uses the power of two choices to map each key to one of two smaller quotient filters; it performs faster insertions while operating at high load factors.
The classical Quotient Filter and its variants are also better suited for representing dynamic data sets than other filters. They support deletes (unlike xor or ribbon filters). Moreover, it is easy to expand a Quotient Filter by scanning it, deriving the original hash for each key by concatenating its canonical slot address with its fingerprint, and inserting this hash into a larger quotient filter with double the capacity [12]. This cannot be immediately done with cuckoo or xor filters. Also, in the context of dynamic data, a hash table typically expands when it is 80-90% full, and it returns to 40-45% utilization after the expansion. In this range of utilization (40%-90%), a QF performs better than other filters because it only entails one cache miss per operation due to its use of linear probing. In contrast, a cuckoo filter entails two cache misses while an xor filter entails three. The recent InfiniFilter paper [35] capitalizes on these features to construct an infinitely expandable filter on top of QF that provides a good false positive rate and performance guarantees as it expands.
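The expansion step can be sketched in a few lines. This is a hedged illustration, assuming the canonical slot is taken from the high-order hash bits and that r (a name introduced here) is the fingerprint width; under that assumption the new canonical slot borrows the top fingerprint bit, so each fingerprint becomes one bit shorter after doubling.

```python
def expand(entries, x, r):
    """Re-map (canonical_slot, fingerprint) pairs from a filter with 2**x slots
    and r-bit fingerprints to one with 2**(x + 1) slots and (r - 1)-bit
    fingerprints. Assumes r >= 1 and high-order bits select the slot."""
    expanded = []
    for q, rem in entries:
        h = (q << r) | rem                       # recover the x + r stored hash bits
        new_q = h >> (r - 1)                     # x + 1 bits select the new slot
        new_rem = h & ((1 << (r - 1)) - 1)       # remaining r - 1 bits
        expanded.append((new_q, new_rem))
    return expanded

# One entry at slot 101 with fingerprint 10110011 in an 8-slot filter.
print(expand([(0b101, 0b10110011)], x=3, r=8))
```

No access to the original keys is needed, which is the property that the text contrasts with cuckoo and xor filters.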
Overall, we observe that a QF has distinct advantages compared to other filters. It is therefore interesting and relevant to also study it from a security perspective.

C. Adaptive Filters
In some applications, the workload is skewed such that queries are concentrated on a small fraction of the elements that are queried many times. For those, it may be beneficial upon detecting a false positive to adapt the filter so that subsequent queries to the same elements return a negative [38]. Adaptive filters based on Bloom [39], cuckoo [40] and QFs [38], [41] have been proposed.
Interestingly, the adaptive quotient filters [38], [41] employ techniques that cannot easily be applied in the context of other filters. The Broom filter relies on lengthening fingerprints over which false positives occurred to prevent them from recurring; this requires supporting variable-length fingerprints. A QF supports this capability by storing larger fingerprints across multiple adjacent slots and pushing all other fingerprints to the right through the use of Robin Hood hashing. Such functionality cannot be easily achieved with cuckoo or xor filters due to their more rigid mapping of entries to buckets.
Similarly, the telescoping QF changes the hash function for fingerprints over which false positives occurred and succinctly encodes which hash function was used to generate each fingerprint for chunks of adjacent canonical slots. With a cuckoo filter, this approach would have entailed a prohibitive performance overhead, as the information about which hash function was used would have to be decoded and encoded each time an entry is swapped across its two candidate buckets (potentially many times per insertion). Some adaptive filters [38] change the hash functions periodically. As discussed later, this feature helps to mitigate the effects of one of the potential attacks on QFs. Nevertheless, adaptive filters cannot be used generically in all use-cases because they require storing a map from each fingerprint to its corresponding key to allow retrieving and rehashing a key over which a false positive occurred. Such a map typically requires a significant amount of space if stored in memory, and it results in significant I/O overheads if maintained in storage for database applications.

D. Security of Approximate Membership Check Filters
The increasing use of approximate membership filters puts their security in the spotlight [22]. Bloom filters have been widely studied from a security perspective; they are vulnerable to pollution attacks, which insert elements that set as many bits as possible to ones. This increases the false positive probability [23], [42]. Pollution attacks are not applicable to fingerprint-based filters, such as quotient or cuckoo filters. Instead, a cuckoo filter can be attacked by inducing an insertion failure, whereby an element cannot be inserted due to an infinite loop of circular evictions while there is still ample free space [26]. This does not apply to Bloom filters as insertions cannot result in infinite loops. Another type of attack on fingerprint filters is to overfill the filter so that it has to be expanded, an operation that may be costly if the original elements have to be read. This can be mitigated using an expandable filter [35]. We observe that the types of attacks that are applicable to a given filter depend on its design. This means that security should be carefully studied for each filter type.
Specifically, the use of linear probing in QFs makes them more susceptible to different types of attacks than Bloom or cuckoo filters [43] because the length of runs and clusters is only bounded in probabilistic terms. This makes it possible to attack both the insertion and query performance of a QF by creating large runs and clusters. This contrasts with Bloom filters, for which both query and insertion performance is strictly upper bounded by the number of hash functions used. It is also in contrast to cuckoo filters, in which query performance is strictly upper bounded by only searching two buckets.
In contrast, a type of attack that is generally applicable to all types of filters is the use of false positive queries. In this case, an attacker first identifies a set of non-existing keys that lead to false positives when they are queried for. Since these keys are all false positives, the filter becomes useless. If the filter is used to prevent searching a slower memory medium for non-existing keys, such an attack can lead to many slow memory accesses that may congest and even saturate the system. This type of attack can be mitigated by adaptive filters [38], which can eliminate recurring false positives by adapting fingerprints. As discussed earlier, however, adaptive filters are not universally applicable to all use-cases due to the need for a reverse map, which significantly increases memory requirements or I/O overheads. Another potential mitigation scheme is the use of a cache to store frequently accessed elements; if the set of false positives is smaller than the cache size, it can also protect against such an attack.

A. Motivation
The security of QFs is a timely topic as they are widely used and offer distinct advantages over other types of filters. In Section II-B, we saw that the QF has been used as a base method for constructing more advanced filters such as CQF [33], VQF [34], InfiniFilter [35], Broom filter [38], and the Telescoping filter [41]. Hence, studying the security of a QF also helps to identify vulnerabilities in its many variants and derivatives. The findings on quotient filter security also shed light on the security of Robin Hood hashing and similar open addressing hashing data structures in general [36].
An increasing number of storage and networking systems today are open-source, making their implementation impossible to conceal from attackers. Even for proprietary software in which the implementation is concealed, an attacker may be able to reverse engineer the implementation from its bytecode. When attackers do not have access to the implementation (e.g., it is on a remote secure device), they may be able to induce certain operations such as insertions or deletions on the filter through the application. For example, when a filter is used for network flow monitoring to keep track of flows that exceed a given bandwidth threshold, the attacker can generate flows with sufficiently high traffic to induce a filter insertion. In a database context, an attacker may be able to generate filter insertions by using the application on top (e.g., by creating new users). These examples show that protecting a quotient filter must be considered broadly under different assumptions about the capabilities of possible attackers.

B. Problem Statement
The study of the security of QFs has many dimensions: (1) the quotient filter variant to study, (2) the type of attacks to consider, and (3) the information and access to the filter that the attacker has. Next, each of these dimensions is discussed to define the scope of our study.
1) Filter Considered: As this is the first study on the security of QFs, we focus on the original QF [12]. The reason is twofold. (1) The original QF is widely applied today, making its security important. (2) It contains design elements that are shared among all its newer variants, which makes its exploration a necessary first step in the study of more advanced variants. A discussion of newer variants is included later.
2) Attacker Models: Two types of control over operations are considered: full and partial. In the first case, the attacker is the only user inserting elements into the filter. In the second case, there are multiple users that insert elements and the attacker has control only over a fraction of the elements inserted.
In terms of the knowledge that the attacker has of the filter, two models are used. The white box model assumes an attacker that knows the filter implementation, parameters, and hash functions used. The black box model assumes an attacker that views the filter as a black box with no internal information. We consider both models and their intersection with partial vs. full control, as shown in Table I.
3) Types of Attack: In Section II-D several types of attacks on filters were discussed. Pollution attacks are not applicable to fingerprint-based filters like the quotient filter. In this work, we consider two types of attack:
1) Insertion failure attacks.
2) Performance degradation attacks.
The first type of attack has been studied in the context of cuckoo filters [26]. It involves carefully choosing and inserting a few elements that cause an insertion failure. Such an attack is applicable to the original QF, as it allocates a buffer at the end to handle insertions that overflow the original array. Therefore, an attacker can try to generate insertions that overflow this buffer.
As for the second type of attack, an interesting observation is that in QFs the number of positions that have to be checked per operation (query or insertion) is not strictly upper bounded by a small constant. This is in contrast to Bloom filters, for which the operation cost is bounded by the number of hash functions. It is also in contrast to cuckoo filters, which check at most two buckets per query. By limiting the QF occupancy to around 80%, clusters and runs are small on average because the hash values computed on the inserted elements are uniformly distributed. However, an attacker that knows the hash values of elements could potentially insert many elements into a specific part of the filter to generate large runs or clusters that force queries to check hundreds to thousands of positions. This degrades the filter performance and can thus be exploited by an attacker. Therefore, differently from Bloom or cuckoo filters, QFs are exposed to attacks that degrade the speed of queries.
Before presenting the proposed attack algorithms, the main notation used in the rest of the paper is summarized in Table II.

IV. INSERTION FAILURE ATTACKS
In this section, the proposed insertion failure attacks are presented for both white and black box attackers. Note that insertion failures have a qualitative effect on the filter, as they can enable false negatives and thus change the semantics of the filter. This can have important effects on an application that uses the filter and assumes that false negatives are not possible.

A. White Box Attacker
For an attacker that knows the filter implementation details, it is computationally easy to generate insertion failures. The attacker can select elements whose hash value maps to the last position in the filter. Then, by inserting those elements, eventually the buffer at the end of the filter would overflow, causing an insertion failure. This section describes several such attacks.
As we will show, it is harder to protect against attacks caused by inserting multiple unique entries rather than reinserting the same entry multiple times. We therefore carefully distinguish between these cases throughout our analysis.
A plausible attack algorithm is described in Algorithm 1. Assuming that the attacker generates random elements and checks the position they map to, finding an element that maps to the last slot requires on average m tries. Since the buffer, as discussed before, consists of log(m) slots, the complexity of finding log(m) unique attack elements is O(m · log(m)). This is computationally feasible for practical values of m.
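The element search behind this attack can be sketched as follows. This is not the paper's Algorithm 1 but a minimal illustration under stated assumptions: SHA-256 stands in for a hash function known to the attacker, candidate keys are enumerated instead of drawn at random, the buffer is taken to have log2(m) slots, and the tail of the filter is assumed empty. In that setting the last slot plus the buffer hold buffer_slots + 1 fingerprints, so the sketch collects one key more than that; the exact count in practice depends on the buffer size and on what is already stored.

```python
import hashlib
import itertools

def canonical_slot(key: bytes, x: int) -> int:
    """Stand-in for the filter's hash (assumption): top x bits of SHA-256."""
    h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return h >> (64 - x)

def find_keys_for_slot(target: int, x: int, count: int):
    """Try candidate keys until `count` distinct ones map to slot `target`."""
    found, tries = [], 0
    for i in itertools.count():
        key = f"attack-{i}".encode()   # enumerated candidates, not random ones
        tries += 1
        if canonical_slot(key, x) == target:
            found.append(key)
            if len(found) == count:
                return found, tries

x = 10                      # small filter so the search finishes quickly
m = 1 << x
buffer_slots = x            # assumed buffer of log2(m) slots
keys, tries = find_keys_for_slot(m - 1, x, buffer_slots + 2)
# In an empty tail, the i-th key mapping to slot m - 1 lands at slot m - 1 + i.
landing = [m - 1 + i for i in range(len(keys))]
print(f"{tries} hash evaluations; last key would land at slot {landing[-1]}, "
      f"but only slots 0..{m + buffer_slots - 1} exist")
```

The number of hash evaluations grows roughly as m times the number of keys needed, in line with the O(m · log(m)) estimate above.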
In fact, an attacker can do better by looking for unique elements that map to the last log(m) slots and then inserting them. Finding each element takes m/log(m) tries on average. Hence, finding 2 · log(m) elements to fill up the last log(m) slots and the log(m) buffer slots takes 2 · log(m) · m/log(m) = 2m tries on average, that is, O(m).
For the above attacks, since the attacker needs only a small (O(log(m))) number of insertions, they are feasible both when the attacker has full control of the insertions and when the attacker controls only a fraction of the insertions.
It is also interesting to discuss the cost of the insertions. Each newly inserted element of the attack sequence has to traverse the run and thus has a cost O(i), with i being the current length of the run. Therefore, inserting an attack sequence of l elements has a cost that is O(l^2). Since the length of the run is approximately 2 · log(m), the cost of the insertions will be O(log^2(m)).
The complexity can be further reduced by reinserting the same element multiple times. We can find an element that maps to the last x slots in m/x tries. We can then insert it x + log(m) times to generate an insertion failure. The total cost is m/x + x + log(m) operations. This cost is minimized when x = √m, yielding a complexity of O(√m). While such an attack is cheaper, we will show that it is easier to protect against as it employs the same key.
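The choice x = √m follows from minimizing the total cost over x. A short worked derivation, treating x as a continuous variable (a simplifying assumption) and writing T(x) for the total cost (a name introduced here), is:

```latex
T(x) = \frac{m}{x} + x + \log(m), \qquad
\frac{dT}{dx} = -\frac{m}{x^{2}} + 1 = 0 \;\Rightarrow\; x = \sqrt{m}, \qquad
T(\sqrt{m}) = 2\sqrt{m} + \log(m) = O(\sqrt{m}).
```

Since the second derivative 2m/x^3 is positive for x > 0, this stationary point is a minimum.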

B. Black Box Attacker
It seems that a black box attacker cannot determine if an element maps to the last position in the filter (assuming the attacker is not able to learn the hash function). However, the attacker can generate a random element x and insert it many times so that the inserted copies occupy positions with increasing values and eventually overflow the filter. The attack algorithm is described in Algorithm 2. The number of insertions needed to cause a failure depends on the position that x maps to. If that position is close to the end, the number is small, whereas if it is close to the beginning, the number approaches m. This attack is feasible when the attacker has full control of the insertions, but it can be costly: the insertions require O(l^2) operations, with l being the length of the attack sequence, which in this case is O(m), leading to an overall cost of O(m^2). Instead, when the attacker controls only a small fraction of the insertions, the attack can fail depending on the position that the chosen element maps to. The attacker can set a maximum number of insertions so that, when it is reached with no failure, the process is repeated with another element to try to maximize the probability of success.
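The repeated-insertion strategy, including the retry with a fresh element after a cap, can be simulated with a small model. The sketch below is not the paper's Algorithm 2; it is an illustration that only tracks which physical slots are filled, using the fact that an insertion ultimately consumes the first free slot at or to the right of the canonical slot. The function names, the cap parameter and the use of a random draw to stand in for the hidden hash are assumptions of the example.

```python
import random

def insert(used: set, q: int, capacity: int) -> bool:
    """Fill the first free physical slot at or right of canonical slot q.
    Returns False on an insertion failure (no free slot before the end)."""
    p = q
    while p in used:
        p += 1
    if p >= capacity:
        return False
    used.add(p)
    return True

def black_box_attack(m, buffer_slots, max_insertions, rng):
    """Reinsert one element until an insertion fails; after max_insertions
    copies without failure, continue with a fresh element.
    Returns (elements tried, total insertions issued)."""
    used, elements, total = set(), 0, 0
    while True:
        elements += 1
        q = rng.randrange(m)   # hidden from the attacker; drawn to simulate hashing
        for _ in range(max_insertions):
            total += 1
            if not insert(used, q, m + buffer_slots):
                return elements, total

elements, total = black_box_attack(m=256, buffer_slots=8,
                                   max_insertions=10**6, rng=random.Random(1))
print(f"failure after {total} insertions using {elements} element(s)")
```

With an effectively unlimited cap a single element always suffices in an otherwise empty filter, and the number of insertions is the distance from its canonical slot to the end of the buffer plus one.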

V. SPEED DEGRADATION ATTACKS
In this section, speed degradation attacks are presented. First, the idea is described in general terms and then the specific algorithms for both white and black box attackers are presented.

A. Overall Approach
The proposed attack is based on the observation that, differently from other filters such as Bloom or cuckoo, the number of positions checked during a query is not bounded by a small constant. For example, for cuckoo filters, two buckets are checked in the worst case, and for Bloom filters, the worst case is given by the number of hash functions used, which is small (the optimal value is given by (m/n) · ln(2), where m is the size of the filter and n the number of elements stored). Instead, in a quotient filter, the number of accesses is bounded by the length of the cluster or run that includes the position that an element maps to. The length of this run or cluster is only bounded in probabilistic terms (and is the main reason to put a limit on the filter occupancy). This means that an attacker can potentially generate long runs or clusters that lead to a large number of positions being checked by queries.
The query process is illustrated in Fig. 2 Part B, which shows a query for element Z after inserting four entries in Part A. The query starts by accessing the position that Z maps to and checking the is_occupied bit. Since it is one, at least one element that maps to this position has been inserted in the filter. Therefore, we need to find those elements and compare their fingerprints with that of Z. To do so, we must first move left until the start of the cluster is found (at Slot 010). Once that is done, a search is done to the right to find the run that corresponds to that position, and then the fingerprints in that run are compared with the fingerprint of Z (at Slot 100). As this slot does not have a matching fingerprint, the search continues to the right until reaching a slot that does not belong to the run (i.e., Slot 101) and returns with a negative result to the user. Consider now a query for a different element that maps to the position to the right of that of Z. Since on that position the is_occupied bit is zero, the query ends with a negative result after checking this first position.
The previous examples illustrate what is needed for a query to check many positions on the filter: 1) that the query maps to a position with the is_occupied bit set to one, and 2) that the position is part of a long cluster. To create those conditions, an attacker can do the following:
1) Find an element that maps to the first position in the filter and insert it.
2) Find a second element that maps to the first position in the filter and insert it.
3) Find an element that maps to the second position and insert it in the filter, then do the same for the third and subsequent positions until c elements have been inserted.
This would create a cluster that starts on the first position and ends on position c + 1 and has the is_occupied bit set on the first c positions. A query to an element that maps to position i smaller than c + 1 would have to first check i positions to the left and then i + 1 positions to the right, so 2i + 1 positions in total. The average for elements that map to the first c positions would be to check approximately c positions. Therefore, when c is large, the cost of those queries can also be very large. For example, when the attacker controls all the insertions made in the filter, c = n − 1.

Fig. 2. Example of a query for element Z in a quotient filter. The search starts on the position that Z maps to (marked with the arrow labeled 1), moving left until the start of the cluster is found and then right until the run corresponding to that position is found.
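The claim that the average is approximately c can be checked directly by averaging the per-query cost 2i + 1 over the positions i = 1, ..., c, assuming queries are spread uniformly over those positions (an assumption made for this calculation):

```latex
\frac{1}{c}\sum_{i=1}^{c} (2i + 1)
  = \frac{1}{c}\bigl(c(c+1) + c\bigr)
  = c + 2 \approx c .
```

The average query cost therefore grows linearly with the number of elements the attacker manages to place in the cluster.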

B. White Box Attacker
For a white box attacker, finding the elements to construct the large cluster as described in the previous section is trivial. Elements are generated randomly and then their hash value is checked to find the ones that have the desired value at each step. The attack algorithm is described in Algorithm 3.
The computational cost of finding each element is on average m, so the cost of finding the c attack elements would be $O(c \cdot m)$, which is feasible for values of m used in real applications. The cost of the insertions would be similar to that of the insertion failure black-box attack, $O(c^2)$, which again is feasible in real scenarios.
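The element search of the white box attacker can be sketched as below. This is an illustration under stated assumptions rather than the code used in the experiments: the quotient is taken from the top bits of a SHA-256 digest (the hash used by the actual filter is not specified here), candidates are enumerated with a counter instead of being drawn at random so that the example is reproducible, and the names `quotient` and `find_attack_elements` are hypothetical. The target sequence follows the structure of Algorithm 3: one element for the first position, then one element for each of the positions 0 to c − 1.

```python
import hashlib
import itertools


def quotient(element: bytes, q_bits: int) -> int:
    """Assumed mapping: the q_bits most significant bits of a 64-bit hash prefix."""
    h = int.from_bytes(hashlib.sha256(element).digest()[:8], "big")
    return h >> (64 - q_bits)


def find_attack_elements(c: int, q_bits: int):
    """Search for elements whose quotients are 0, 0, 1, ..., c - 1.

    Returns the elements in insertion order and the number of candidates tried.
    """
    targets = [0] + list(range(c))
    candidates = itertools.count()
    found, trials = [], 0
    for target in targets:
        while True:
            candidate = str(next(candidates)).encode()
            trials += 1
            if quotient(candidate, q_bits) == target:
                found.append(candidate)
                break
    return found, trials


if __name__ == "__main__":
    elements, trials = find_attack_elements(c=8, q_bits=10)   # m = 1024 slots
    print([quotient(e, 10) for e in elements], trials)
```

With m = 2^q_bits slots, each candidate hits the desired position with probability 1/m, so the number of trials per attack element is on average m, in line with the cost estimate above.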
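A more restrictive assumption is that the attacker can only insert a fraction of the elements, so c is much smaller than n. In this setting, some of the other insertions would also map to the first c + 1 positions. This will tend to enlarge the cluster that the attacker is creating. Assuming $n \gg c$, the length of the run would increase to approximately $c/(1 - n/m)$, making the attack more effective. However, only a fraction of those positions will have the is_occupied bit set, and thus only queries that map to those positions would need to access many positions.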
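Reading the approximation as c/(1 − n/m), the enlargement can be understood as a geometric series: the c attack slots attract on average c·(n/m) further elements, which in turn attract c·(n/m)² more, and so on. The sketch below only evaluates this arithmetic; it is not a simulation of the filter, and the function names and the example values are illustrative assumptions.

```python
def expected_cluster_length(c, n, m):
    """Closed form c / (1 - n/m) for the enlarged cluster."""
    occupancy = n / m
    if not 0 <= occupancy < 1:
        raise ValueError("occupancy n/m must be in [0, 1)")
    return c / (1 - occupancy)


def truncated_series(c, n, m, terms=200):
    """Partial sum c * (1 + o + o^2 + ...) with o = n/m, for comparison."""
    occupancy = n / m
    return c * sum(occupancy ** k for k in range(terms))


if __name__ == "__main__":
    c, n, m = 100, 800, 1000          # hypothetical values, occupancy 0.8
    print(expected_cluster_length(c, n, m))
    print(truncated_series(c, n, m))
```

At an occupancy of 0.8 the closed form gives a cluster five times longer than the c slots inserted by the attacker, which illustrates why the attack is more effective in heavily loaded filters.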

C. Black Box Attacker
For a black box attacker, it does not seem to be possible to find the elements required to construct the long cluster described at the beginning of this section. However, an attacker can insert the same element many times, thus creating a long run. The issue is that the positions in the run would have the is_occupied bit set to one only on the first position. Therefore, only queries that map to that position would require checking the elements in the run, while queries that map to the rest of the positions would return a negative immediately.
To overcome that limitation, the attacker can insert elements randomly such that some of them will fall on the run and set the is_occupied bit of some positions, thus increasing the query time for elements that map to them. This is what would happen anyway when the attacker controls only a fraction of the insertions, as described in the previous subsection. Therefore, the attacker can still degrade the query speed significantly, but the effectiveness will depend more on the occupancy of the filter. The algorithm for an attacker that controls only a fraction of the insertions and performs c insertions in total is described in Algorithm 4.
The cost of this attack is similar to that of the black-box insertion attack, $O(c^2)$, which is feasible in real scenarios.

VI. EVALUATION
In this section, the proposed attacks are evaluated by simulation. A quotient filter implementation in Java is used in all the experiments, which are run on an Intel Xeon processor running at 2.6 GHz. The filter implementation and the code to replicate the experiments are available in a public repository.¹

A. Insertion Failures
1) White Box Attack: In these experiments, we set the filter size to $m = 2^{16}, 2^{17}, 2^{18}, 2^{19}, 2^{20}$ buckets and the fingerprint size to f = 13 bits so that each bucket has 16 bits. We run the white box attack described in Algorithm 1. As expected, an insertion failure occurs exactly after $2\log_2(m) + 2$ insertions, as the buffer at the end of the filter has a size of $2\log_2(m)$. The results are shown in Fig. 3; note that the x-axis is on a logarithmic scale (the same applies to the rest of the figures in this section).
2) Black Box Attack: In a second experiment, we run the black box attack described in Algorithm 2 for filters of the same sizes. In this case, 1000 trials were run and the average number of insertions needed to cause a failure was computed. The results are shown in Fig. 4 and are again in line with the expected value of $m/2 + 2\log_2(m)$. The black box attacker can also create insertion failures but needs a much larger number of insertions than the white box attacker. In any case, these results confirm the ability of an attacker to create insertion failures on quotient filters.
Finally, we study the dependency of the execution time on the filter size. The results are summarized in Fig. 5, which confirms the quadratic dependency predicted by the theoretical analysis in Section IV.
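To give a sense of the gap between the two attackers, the expected numbers of insertions quoted above can be evaluated for the simulated filter sizes. The snippet only evaluates the two expressions $2\log_2(m) + 2$ and $m/2 + 2\log_2(m)$ from the text; it does not reproduce the measurements, and the function names are illustrative.

```python
def white_box_insertions(q_bits):
    """2*log2(m) + 2 insertions for a filter with m = 2**q_bits buckets."""
    return 2 * q_bits + 2


def black_box_insertions(q_bits):
    """Expected m/2 + 2*log2(m) insertions for a filter with m = 2**q_bits buckets."""
    m = 1 << q_bits
    return m // 2 + 2 * q_bits


if __name__ == "__main__":
    for q_bits in range(16, 21):
        print(f"m = 2^{q_bits}: white box {white_box_insertions(q_bits)}, "
              f"black box {black_box_insertions(q_bits)}")
```

For m = 2^16 the white box attacker needs 34 insertions whereas the black box attacker needs on average about 32800, and the gap widens as the filter grows because the first expression is logarithmic in m and the second is linear.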

B. Speed Degradation
To evaluate the impact on the query speed, we first insert elements on the filter to reach a given occupancy level o and then perform queries for random elements and measure the average query time. The number of random queries is ten times the size of the filter.
1) White Box Attacks: As in previous experiments, we use filters of sizes $m = 2^{16}, 2^{17}, 2^{18}, 2^{19}, 2^{20}$ and set the fingerprint size to f = 13 bits. Additionally, we consider an attacker that controls 1%, 5%, and 10% of the insertions and a target occupancy of o = 0.8. For those settings, we run the white box attack described in Algorithm 3. The results averaged over 1000 runs are shown in Fig. 6, which also shows the average query time for a filter with no attack. It can be observed that the attack is able to dramatically increase the query time. The degradation of the query time is larger with the filter size m and also with the fraction of insertions controlled by the attacker. This is expected as both the filter size and the fraction of insertions controlled by the attacker will increase the length of the cluster created by the attack. The dependency with the filter size is approximately linear, so doubling the size doubles the query time. The dependency with the percentage of elements controlled is more complex, as both the cluster length and the probability of a query hitting the cluster increase, creating a stronger dependency. Fig. 7 shows the ratio of the query time versus the attack-free filter. It can be seen that even when the attacker controls only 1% of the insertions, a degradation of more than one order of magnitude is induced on the query time by the attacker.
In a second experiment, we run the same filters with occupancy of o = 0.2, 0.4, 0.6, 0.8 for an attacker that controls 1% of the insertions. The results are shown in Fig. 8 for the ratio of the average query time with and without attack and show how the attack is less effective at low occupancy. This is because the elements present in the filter enlarge the size of the cluster created by the attacker. This is in fact a recursive increase, as enlarging the cluster means that more elements are added to it, so adding in total $o + o^2 + o^3 + \dots = o/(1 - o)$ elements to the cluster per element inserted by the attacker. This effect amplifies the impact of the attack at high occupancy, but the attack is effective regardless of the occupancy.
2) Black Box Attacks: For the black box attacker, we use the same settings as in the white box attack and we run the black box attack described in Algorithm 4. The results are summarized in Fig. 9. It can be observed that the attack is also able to dramatically increase the average query time. The degradation is better seen in Fig. 10, which shows the ratio of the average query time under attack and the time with no attack. This degradation increases both when the attacker controls a larger fraction of the insertions and when the filter is larger. As in the case of the white-box attack, the dependency with the filter size is approximately linear, so that doubling the size also doubles the query time when the filter is attacked. In contrast, the relationship between the percentage of insertions controlled by the attacker and the query time is more complex. This is because, on the one hand, increasing the percentage increases the length of the cluster created by the attacker while, on the other, the elements inserted by the attacker do not set the is_occupied bit to one, thus reducing the number of positions for which the cluster needs to be checked. Finally, the effectiveness of the attack is lower than for a white box attacker, as expected.
As for the white-box attack, in a second experiment, we run the same filters with occupancy of o = 0.2, 0.4, 0.6, 0.8 for an attacker that controls 1% of the insertions. The results are shown in Fig. 11 for the ratio of the average query time with and without attack and show how the attack is less effective at low occupancy. However, even in that scenario, a significant speed degradation is introduced by the attacker.
In summary, the evaluation results confirm the effectiveness of the speed degradation attacks even for moderate filter sizes (the largest size simulated is $2^{20}$). For larger filter sizes, such as those commonly used in storage applications [27], the degradation would be significantly larger.

VII. POTENTIAL PROTECTION SCHEMES
This section briefly discusses and evaluates² schemes that can be used to detect attacks on quotient filters and to protect them. A detailed study of such schemes is left for future work.

A. Detection
The first thing to do is to detect the attack, for example by monitoring the length of the largest cluster in the filter. This can be done, for example, during element insertion. When that length is much larger than expected, it is highly likely that there is a problem, possibly an attack. In addition to the maximum length, the average length or other relevant metrics can also be monitored. For insertion failure attacks, the space left in the buffer at the end of the filter can be monitored to detect attacks, as under normal operation it should not overflow.
In some applications, the number of multiple insertions of the same element is limited by construction, for example when an element can be stored in l different levels but only once per level, so it can be inserted at most l times. In those cases, a number of fingerprints with the same value that map to a given canonical position larger than l is a sign that something unexpected is happening.
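The two detection checks discussed above can be sketched as follows. This is a simplified illustration and not part of the evaluated implementation: the filter is abstracted as a list of booleans marking used slots, the cluster length is obtained with a full scan rather than being tracked during insertion, and the alarm threshold is a hypothetical parameter that would have to be chosen from the expected cluster length at the target occupancy.

```python
from collections import Counter


def longest_cluster(slot_in_use):
    """Length of the longest sequence of consecutive used slots."""
    longest = current = 0
    for used in slot_in_use:
        current = current + 1 if used else 0
        longest = max(longest, current)
    return longest


def cluster_alarm(slot_in_use, threshold):
    """Flag the filter when the longest cluster exceeds a chosen threshold."""
    return longest_cluster(slot_in_use) > threshold


def duplicate_alarm(run_fingerprints, max_copies):
    """Flag a run that stores some fingerprint more than max_copies times."""
    counts = Counter(run_fingerprints)
    return any(copies > max_copies for copies in counts.values())


if __name__ == "__main__":
    usage = [True, True, False, True, True, True, False, False]
    print(longest_cluster(usage), cluster_alarm(usage, threshold=2))
    print(duplicate_alarm([0x1A, 0x1A, 0x1A, 0x2B], max_copies=2))
```

The duplicate check corresponds to the case in which an element can legitimately be inserted at most l times, with `max_copies` playing the role of l.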

B. Protection
Several techniques can be used to protect the filter against the proposed attacks. Interestingly, some of them are already implemented in variants or derivatives of the quotient filter for other reasons.
1) Black Box Attacks: A black box attacker relies on inserting the same element multiple times to create long runs that cause an insertion failure or speed degradation. To protect against a black box attacker, insertions of elements that map to the same position and have the same fingerprint (so they are indistinguishable to the filter) can be stored using a counter. This is done in the counting quotient filter (CQF) [33] to support multiple insertions of the same element. Therefore, a side benefit of CQF is that it protects against a black box attacker.
To validate the effectiveness of the counters against the proposed black box attacks, a counter has been implemented in our QF. An invalid combination of the last two metadata bits³ is used to signal that the next slot to the right contains a counter. The counter may extend over several slots, and the fingerprints of these slots are used to represent the count as an unsigned integer. Therefore, values up to $2^{c \cdot f}$ can be coded when using c slots for the counter. Then, the insertion failure and speed degradation black box attacks have been run using the same settings as in the evaluation section: filter sizes $m = 2^{16}, 2^{17}, 2^{18}, 2^{19}, 2^{20}$ with f = 13. The insertion attack inserts the same element a number of times equal to ten times the number of slots in the filter; in all cases, no insertion failure occurred. The speed degradation attack results are summarized in Fig. 12, which shows the ratio of the query times with and without the attack; it can be observed that the query speed is not degraded. In fact, queries are faster when the filter is under attack, and more so as the percentage of insertions controlled by the attacker increases. This occurs because, with the counter, the insertions of the attacker only increase a counter and do not use additional slots; thus, the overall effect of the attack is a reduction of the number of slots occupied and, with it, of the length of runs and clusters. This confirms the effectiveness of the counter in protecting against black box attackers that insert the same element many times.

² The code for the protection schemes is also available in the github repository.
³ is_continuation = 1 and is_shifted = 0.
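The way a count can be spread over the fingerprint fields of several slots can be sketched as follows. The chunk order (least significant chunk first) and the function names are assumptions made for illustration; the text only states that the fingerprints of the counter slots represent the count as an unsigned integer.

```python
def encode_count(count, f):
    """Split a non-negative count into f-bit chunks, least significant first."""
    if count < 0:
        raise ValueError("count must be non-negative")
    mask = (1 << f) - 1
    chunks = []
    while True:
        chunks.append(count & mask)
        count >>= f
        if count == 0:
            break
    return chunks


def decode_count(chunks, f):
    """Rebuild the count from its f-bit chunks."""
    value = 0
    for position, chunk in enumerate(chunks):
        value |= chunk << (position * f)
    return value


if __name__ == "__main__":
    f = 13
    insertions = 10 * (1 << 16)      # ten times the slots of the smallest filter
    slots_used = encode_count(insertions, f)
    print(slots_used, decode_count(slots_used, f))
```

With f = 13, the ten-times-the-filter-size insertions of the attack on the smallest filter fit in two counter slots instead of occupying one slot each, which is why the run does not grow.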

2) White Box Attacks: To protect against a white box attacker, a possible scheme is to monitor the length of the clusters and, when a large cluster is detected, reconstruct the filter. This is in fact done in some adaptive filters such as the Broom filter [38]. In the Broom filter, each time an adaptation is done, additional bits are added; to avoid those bits growing with adaptations, the filter is also continuously rebuilt using a different hash function. A side benefit of this mechanism is that it would also protect against a white box attacker. However, the filter reconstruction requires access to the original elements, something that may not be possible in some quotient filter applications, as discussed in Section II.
Assume we do not have access to the original elements; then the reconstruction of the filter can be implemented by swapping some bits between the quotient and the fingerprint in an attempt to break these long runs. This approach, in addition to being more generally applicable, is also faster as no hash recomputations are needed. Assume that each element is mapped to a hash with q + f bits, of which initially the upper q are used for the quotient and the lower f for the fingerprint; then, we can cyclically shift a few bits. That would change the positions of the elements and spread them, breaking the runs. This is illustrated in Fig. 13 for two elements with q = 8 quotient bits and f = 4 fingerprint bits. Originally the two elements map to adjacent slots; after the shift, they map to independent slots. To enable a larger number of reconstructions, we can use a Linear Feedback Shift Register (LFSR) [44] on the q + f bits to shift bits to the right a few positions to obtain a new hash value. As the LFSR output bits look pseudorandom, the values tend to be different for elements that mapped to the same position in the previous construction, and thus the scheme tends to spread elements that were in the same run into different slots. This breaks the long runs and clusters generated by a white box attacker.

The LFSR bit shifting protection scheme has been implemented and tested for white box insertion and speed degradation attacks; each reconstruction introduces a shift of q bits. For insertion, it was assumed that the attacker also knows the LFSR implementation and thus can cause another insertion failure after reconstruction. The simulation was stopped after 10 reconstructions, showing that the protection scheme is able to handle multiple insertion failures. For speed degradation attacks, the results are shown in Fig. 14. It shows the ratio of the query times with and without the attack; the ratio is close to one in all cases, so the attack has no noticeable impact on query speed. The absolute values of the query times are larger than prior to the reconstruction due to the overhead introduced by the LFSR bit shifting, which is in any case much smaller than the degradation observed in the unprotected implementation. These results confirm the effectiveness of this strategy in protecting against a white box attacker.

3) Insertion Failure Attacks: Another alternative to protect against insertion failures for both white and black box attackers could be to wrap around insertions that overflow the buffer. In this case, an insertion fails only when all slots in the filter are used and thus its occupancy is 100%; this is detected as an abnormal situation because the filter is designed to work at most at 80% occupancy, as discussed previously.
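Returning to the bit shifting reconstruction used against white box attackers, the plain cyclic shift of the q + f hash bits can be sketched as follows. The example uses q = 8 and f = 4 and a shift of four positions to the right; the two hash values are hypothetical and are not taken from Fig. 13, and the sketch shows only the simple rotation, not the LFSR-based variant that was evaluated.

```python
def rotate_right(h, k, width):
    """Cyclically shift the width-bit value h by k positions to the right."""
    k %= width
    mask = (1 << width) - 1
    return ((h >> k) | (h << (width - k))) & mask


def split(h, q, f):
    """Upper q bits are the quotient (slot), lower f bits the fingerprint."""
    return h >> f, h & ((1 << f) - 1)


if __name__ == "__main__":
    q, f = 8, 4
    first, second = 0x213, 0x229     # hypothetical hashes: quotients 0x21 and 0x22
    for h in (first, second):
        before = split(h, q, f)
        after = split(rotate_right(h, 4, q + f), q, f)
        print(f"hash {h:#05x}: slot {before[0]:#04x} -> slot {after[0]:#04x}")
```

In this example, the two elements start in adjacent slots 0x21 and 0x22 and end in slots 0x32 and 0x92 after the shift. Since the rotation is a bijection on the q + f bits, it can be applied to the stored quotient and fingerprint without access to the original elements.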
The protection techniques discussed are summarized in Table III, which shows the applicable protection schemes for each attacker model and attack type. Note that a white box attacker can also typically perform black box attacks, and thus techniques that protect against both types of attacks will be needed.

VIII. CONCLUSION AND FUTURE WORK
This paper has for the first time studied the security of quotient filters, showing that an attacker can create insertion failures and also degrade the filter speed dramatically. This can be done by both white and black box attackers. Therefore, quotient filters are vulnerable to attacks and their security should be carefully considered.
The paper also discusses potential schemes to detect and protect against the attacks. The detailed study and evaluation of these attack mitigation techniques is left for future work. Another interesting area for future work is the study of the security of variants of quotient filters.

Algorithm 3 Speed degradation white box attack
    l = 0
    Finish = false
    while Finish == false do
        Generate a random element x
        if q(x) == 0 then
            Insert element x in the filter
            Finish = true
        end if
    end while
    Finish = false
    while Finish == false do
        Generate a random element x
        if q(x) == l then
            Insert element x in the filter
            l = l + 1
            if l == c then
                Finish = true
            end if
        end if
    end while

Algorithm 4 Speed degradation black box attack
    Generate a random element x
    for i = 1 to c do
        Insert element x in the filter
    end for

Fig. 3. Number of insertions to cause an insertion failure for the white box insertion failure attack.

Fig. 4. Average number of insertions to cause an insertion failure for the black box insertion failure attack.

Fig. 5. Average execution time to complete the insertion failure black box attack for different filter sizes.

Fig. 6. Average query time for the white box speed degradation attack when the attacker controls 10%, 5%, or 1% of the insertions.

Fig. 7. Ratio of average query time versus no attack for the white box attacker when the attacker controls 10%, 5%, or 1% of the insertions.

Fig. 8. Ratio of average query time versus no attack for the white-box speed degradation attack with different filter occupancy when the attacker controls only 1% of the insertions.

Fig. 10. Ratio of average query time versus no attack for the black-box attacker when the attacker controls 10%, 5%, or 1% of the insertions.

Fig. 11. Ratio of average query time versus no attack for the black box speed degradation attack with different filter occupancy when the attacker controls only 1% of the insertions.

Fig. 12. Ratio of average query time versus no attack for the black-box attacker when counter protection is used and the attacker controls 10%, 5%, or 1% of the insertions.

Fig. 13. Illustration of the bit shifting scheme to protect against white box attackers. The hash bits are cyclically shifted four positions to the right.

Fig. 14. Ratio of query time versus no attack for the white-box attacker when LFSR reconstruction is used and the attacker controls 10%, 5%, or 1% of the insertions.

Pedro Reviriego (Senior Member, IEEE) received the M.Sc. and Ph.D. degrees in telecommunications engineering from the Technical University of Madrid, Madrid, Spain, in 1994 and 1997, respectively. From 1997 to 2000, he was an Engineer with Teldat, Madrid, working on router implementation. In 2000, he joined Massana to work on the development of Ethernet transceivers. From 2004 to 2007, he was a Distinguished Member of Technical Staff with the LSI Corporation, working on the development of Ethernet transceivers. From 2007 to 2018, he was with Nebrija University and from 2018 to 2022 with Universidad Carlos III de Madrid. He is currently with the Technical University of Madrid.

Miguel González received the B.Sc. degree in telematics engineering, the M.Sc. degree in cybersecurity, and the M.Sc. degree in telecommunications engineering, all from Universidad Carlos III de Madrid, Spain, in 2020, 2022, and 2023, respectively. He is currently a Researcher with Universidad Carlos III de Madrid, where he works mainly on cybersecurity related issues. His achievements include honorable mentions for his B.Sc. and M.Sc. theses, one of which was a reference implementation of a future protocol for IEEE 802.

Niv Dayan received the M.Sc. and Ph.D. degrees from the IT University of Copenhagen. He is an Assistant Professor with the University of Toronto. Prior to that, he was a Postdoctoral Researcher with Harvard and with Copenhagen University. His research interests include the design and analysis of storage engines and their core data structures.

Gabriel Huecas received the doctorate in telecommunication engineering from Universidad Politécnica de Madrid in 1995 and is an Associate Professor with the Technical University of Madrid. Since 2021, he has been a Deputy Director with the International Doctorate School of UPM. Since 1988, he has been involved in specification languages, compilation tools, and software engineering; since 1995, in research and development tasks on collaborative environments, multimedia applications, and protocols for multimedia distribution and streaming; and since 2010, in Big Data and virtualization technologies,
In sum, this paper "studies the security of the original quotient filter against attackers that want to create insertion failures or performance slowdowns.The attackers can have black-box or white-box knowledge of the filter's implementation and control all or just a fraction of the operations."

TABLE III PROTECTION TECHNIQUES FOR THE DIFFERENT ATTACKER MODELS AND ATTACK TYPES