Towards In-Network Compact Representation: Mergeable Counting Bloom Filter via Cuckoo Scheduling

With the breakthrough of edge intelligence, we are witnessing a booming increase in distributed applications on edge nodes. These applications need a novel data representation algorithm to support data-information exchange and data-information-based decisions among different edge nodes. As one of the most efficient compact data representation algorithms, the Counting Bloom Filter (CBF) extends the Bloom filter to support updating a data representation as well as inserting data into it. To facilitate distributed applications, edge nodes need to exchange and summarize the information of the data collected at different nodes. However, because a CBF cannot be merged with other CBFs, the existing CBF and its variants cannot be used for representing and exchanging data information among edge nodes. To handle this problem, we design a novel mergeable CBF, mergeCBF. Based on an insight into the counting process of a CBF, we unfold the counter array of the conventional CBF into a group of bit arrays and, to support merging multiple filters, map each inserted item to cells in this group of cuckoo-scheduled bit arrays instead of the counters of a CBF. Experiments on real-world datasets demonstrate that mergeCBF supports both conventional operations and merging operations efficiently without degrading the quality of the representation results.

on the representation results. To accurately summarize data, stale data should be removed from the data representation results periodically. Therefore, CBF has been the primary tool used on the Internet to achieve efficient data representation.
With the breakthrough of edge computing, we are witnessing a booming increase in distributed in-network applications. These applications leverage distributed edge nodes instead of data center servers or terminal devices to handle massive data. To facilitate analyzing data in a distributed way, novel general data representation schemes are highly desired for identifying and summarizing data on different edge nodes, which poses a new challenge: whether the data representation results can be efficiently merged with each other. Meeting this challenge is difficult for almost all existing compact data representation approaches, including CBF and its variants. Among the other types of compact data representation algorithms, only the quotient filter [18] and its variant [19] support the merging operation; however, they are better suited to hash table-based data representation on solid-state drives (SSD) than to a general BF-like hyper-compact representation on high-speed random access memory (RAM). A detailed comparison of the most popular compact representation algorithms is presented in Table 1.
To provide a mergeable counting Bloom filter for common high-speed random access memory (RAM), we study the counting process of a CBF and obtain an insight: the reason why a CBF cannot support merging operations is that it records no information about an inserted data item other than its hit count on some specific counters, so it cannot identify repeated data items inserted into the different candidate filters to be merged. Based on this insight, we propose a novel design that unfolds the counter array of a CBF into a group of bit arrays and cuckoo-schedules these bit arrays in a fixed sequence. In this way, we can find replicated items that have been inserted into different candidates and keep only one of them in the merging result. The operations on basic bit arrays are simple enough to make our design more efficient than the other existing approaches.
Contributions: 1) We study the counting process of a CBF and identify the reason why a CBF cannot support merging operations: a CBF records no information about an inserted data item other than its hit count on some specific counters. Thus, a CBF cannot identify repeated data items inserted into the different candidate filters to be merged, and these replicated items cause the counters of the CBF to overflow.
2) A novel mergeable counting Bloom filter, mergeCBF, is proposed by unfolding the counter array of a CBF into a group of bit arrays and cuckoo-scheduling these bit arrays in a fixed sequence. Ultimately, the information of the inserted items is recorded in different bit arrays, and the information of replicated items is aggregated into the existing record, which enables merging operations on multiple filters. 3) We conduct extensive experiments on real-world data. The experimental results demonstrate that mergeCBF achieves efficient merging operations among multiple filters as well as accurate query and updating operations.
We provide the preliminaries of compact representation and approximate membership query, and define the problem, in Section II. In Section III, a mergeable counting Bloom filter is proposed according to an insight about the counting process of CBF: based on this insight, we use multiple bit arrays to record the repeated counting times when items are inserted. The performance of our design is evaluated in Section V. We review the related work and conclude in Sections VI and VII.

II. PRELIMINARY AND PROBLEM STATEMENT

A. COMPACT REPRESENTATION AND APPROXIMATE MEMBERSHIP QUERY
The huge amount of data on the Internet makes data representation and exact-matching data queries costly. Brute-force searching to respond to an exact-matching query can incur extremely high computational and storage costs. In contrast, compact data representation and approximate membership queries relax the constraints on user requests so that users get satisfactory results in much less time. In fact, data representation and membership queries are not new problems, but they have become crucial to existing and future in-network data analysis at the edge. Considering the limited resources of edge nodes, data representation and membership query should be lightweight and efficient enough to avoid overwhelming them.
As an extension of the most popular compact representation, the Bloom filter (BF) [9], the Counting Bloom filter (CBF) [12] consists of an array of counters and multiple hash functions. It extends each one-bit cell of the Bloom filter to a multiple-bit counter to support deleting data as well as inserting and querying data in the representation result. Except for that, when the expected number of inserted data items equals n, a CBF is completely equivalent to a BF and can be configured according to the configuration method of the basic Bloom filter, including the number of cells (m), the number of hash functions (k), etc. [9].
In this way, CBF can achieve a compact representation of the inserted data items without involving a large false positive rate (ε).
The detailed inserting and deleting operations of a CBF are depicted in Figure 1. To insert a data item into a CBF, the item is hashed by functions h_1 to h_k to k different counters, and these counters are increased by 1 to record the item. To delete a data item from a CBF, the item is again hashed by h_1 to h_k to k different counters, and these counters are decreased by 1 to remove the information of the item. There is an upper bound on the probability that the maximum hit count of a counter in a CBF exceeds a specific integer [12], and the size of each counter cell of the CBF (bit string length l) can be configured according to this bound. A 4-bit counter meets the common needs of a CBF and guarantees, with high probability, that no counter will exceed its maximum counting range. Note that this bound is obtained under the assumption that repeated data items are not inserted into a CBF.
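The insert/delete/query behavior described above can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's implementation: the class name and the trick of slicing one SHA-256 digest into k positions (instead of k independent hash functions h_1..h_k) are our assumptions for brevity.

```python
import hashlib

class CountingBloomFilter:
    """Minimal CBF sketch: an array of m counters and k derived positions."""

    def __init__(self, m=64, k=3):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, item):
        # Slice one digest into k positions (stand-in for h_1..h_k).
        d = hashlib.sha256(item.encode()).digest()
        return [int.from_bytes(d[4 * i:4 * i + 4], "big") % self.m
                for i in range(self.k)]

    def insert(self, item):
        for p in self._positions(item):
            self.counters[p] += 1      # increase the k hit counters

    def delete(self, item):
        for p in self._positions(item):
            if self.counters[p] > 0:
                self.counters[p] -= 1  # decrease the k hit counters

    def query(self, item):
        return all(self.counters[p] > 0 for p in self._positions(item))
```

For example, inserting an item makes `query` return `True`, and deleting it (with no other items inserted) makes `query` return `False` again.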

B. PROBLEM STATEMENT AND ANALYSIS
To support in-network data-information exchange and data-information-based decisions among different edge computing nodes, a promising compact data representation structure should support both merging with other instances and being updated. CBF satisfies the requirement of edge nodes for updating their data representation results when these nodes rely on CBF to exchange the information of in-network data. However, a CBF cannot be merged with other CBFs; it needs to be extended with functions for merging multiple CBFs to support in-network data-information exchange. We conclude that we need to design a novel CBF according to the following guidelines: 1) Support inserting, deleting, and query as in CBF.
2) Extend CBF to facilitate merging multiple filters.
A straightforward way for a CBF with a counter array to support merging multiple filters would be to add up the values of the counters at the same position in different CBFs. This does not work, due to the problem illustrated in Figure 2. When a data item is inserted into a CBF, the CBF increases the counters to which the item is mapped, but records no other information about the item. Therefore, there is no means to identify whether an item has been inserted into both CBF1 and CBF2. Under the assumption that no replicated item is inserted into a CBF, the common size of the counters is 4 bits [12], so the maximum value of a counter is 15. When CBF1 and CBF2 are merged, for every i the value of the i-th counter of CBF1 is added to that of the i-th counter of CBF2. The replicated data items inserted into both CBF1 and CBF2 cause some of the sums to exceed the maximum counter value, leading to errors. Therefore, to achieve a mergeable counting Bloom filter, we need to solve the problem of distinguishing the records of the same data items inserted into different filters. We should eliminate the repeated records of a data item in the merging result and thus avoid overflowing the counting range of the filter.
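The overflow problem can be demonstrated with a small sketch (the 4-element counter arrays and their values are hypothetical, chosen only to trigger the failure): an item replicated in both filters hits the same counters in each, so counter-wise addition double-counts it and exceeds the 4-bit range.

```python
COUNTER_MAX = 15  # maximum value of a 4-bit counter [12]

def naive_merge(cbf1, cbf2):
    """The straightforward counter-wise merge that fails on replicated items.
    Returns the summed counters and the indices that overflow 4 bits."""
    merged = [a + b for a, b in zip(cbf1, cbf2)]
    overflowed = [i for i, v in enumerate(merged) if v > COUNTER_MAX]
    return merged, overflowed

# The same items were inserted into both filters, so their hit counts
# appear in both counter arrays and are doubled by the addition.
cbf1 = [9, 0, 8, 0]
cbf2 = [9, 0, 8, 0]
merged, overflowed = naive_merge(cbf1, cbf2)
print(merged, overflowed)  # counters 0 and 2 exceed the 4-bit maximum
```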

III. IN-NETWORK COMPACT REPRESENTATION
Towards efficient and lightweight data pre-processing and analysis at resource-limited edge nodes, we propose a general-purpose in-network compact representation scheme, the mergeable Counting Bloom Filter (mergeCBF). In detail, we first study the counting process of the conventional CBF and obtain an insight into how to eliminate replicated items inserted into different filters. Ultimately, we find a way to design a mergeable counting Bloom filter.

A. DESIGN OVERVIEW
As discussed in Section II-B, a conventional CBF cannot merge with other CBFs because it lacks the information needed to distinguish and eliminate the repeated records of the same data items inserted into different filters. As a result, some counters in the merging result overflow the counting range.
On the other hand, we notice that a basic Bloom filter (BF) can be merged with another by performing a bitwise OR on the bit arrays of the two Bloom filters (see Figure 3). The information of the inserted items is represented by the ''1'' bits in the bit array of a BF. After the merging operation is carried out on multiple BFs, the ''1'' bits of each BF are aggregated into the result without any interference from the other aggregated BFs. Inspired by this, we unfold the counter array of a CBF into a group of bit arrays and schedule these bit arrays in a fixed sequence to record the inserted items. Specifically, when a hash result of an item is obtained, we select an array and set the cell corresponding to the hash result to ''1''. We repeat this procedure k times, using k hash functions, to reduce the false positive ratio of membership queries (see the configuration of BF [9]). After k cells are set according to the k hash results, the ''1'' bits across the different bit arrays of our filter serve as the record of the inserted item. The bit array used should differ across the distinct positions indicated by the hash values, which keeps the records of the inserted items collision-free with high probability.
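The BF merging behavior that inspires the design can be shown in two lines (bit arrays written as Python lists; the sample arrays are hypothetical): shared ''1'' bits from a replicated item simply collapse under OR instead of overflowing anything.

```python
def bf_merge(bf_a, bf_b):
    """Merging two basic Bloom filters is a bitwise OR of their bit arrays."""
    assert len(bf_a) == len(bf_b)
    return [a | b for a, b in zip(bf_a, bf_b)]

bf1 = [1, 0, 0, 1, 0, 0]   # records of the items inserted in filter 1
bf2 = [1, 0, 1, 0, 0, 0]   # a shared item also set index 0 here
print(bf_merge(bf1, bf2))  # the shared '1' bit appears once in the result
```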
To achieve this, we use the bit arrays following the idea of Cuckoo hashing [20]. More specifically, we propose a virtual-Cuckoo schedule algorithm for scheduling bit array usage in the group. Meanwhile, we include an additional bit array to record the bitwise OR result of the bit array group of the proposed filter. Ultimately, we speed up the query operation on our filter at the cost of slightly longer inserting, deleting, and merging latency.
The structure of the proposed mergeable counting Bloom filter (mergeCBF) is illustrated in Figure 4:
1) Hash Function Set (i.e., Hash_1 to Hash_k): a set of hash functions used to map each inserted item to array cells at different positions. The number of hash functions (i.e., k) is set according to the optimized configuration of BF [9], where k depends on the size of the bit array (i.e., m) and the expected number of inserted data items (i.e., n): k = (m/n) · ln 2.
2) Bit Array Group of Size g (i.e., barr_1 to barr_g): a group of bit arrays that replaces the counter array used in CBF. To record each hash result of an inserted item, we set one of the cells at the position indicated by the hash result to ''1''. In this way, we update the counting result corresponding to the indicated position. Therefore, g is set according to the maximum value of the counting result, which can be determined according to the analysis in [12]. The cell counts of barr_1, barr_2, . . ., and barr_g are all equal to m.

3) Bitwise OR Result of the Array Group (i.e., orBarr):
A bitwise OR of the bit array group, used to improve the query efficiency of the proposed mergeable Bloom filter. This is important for edge computing applications that look up specific data among edge nodes and make decisions according to the distribution of these data. Ultimately, the proposed mergeCBF supports not only inserting, deleting, and querying data items, but also the merging operation on multiple filters. These operations are described one by one below.
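The three components above can be captured in a small structural sketch (class and attribute names are ours, mirroring Figure 4's barr_1..barr_g and orBarr; the helper that recomputes orBarr is an assumption, since in the paper orBarr is maintained incrementally during inserts and deletes):

```python
class MergeCBF:
    """Structural sketch of mergeCBF: k hash functions, a group of g
    bit arrays of m cells each (barr_1..barr_g), plus orBarr."""

    def __init__(self, m, k, g):
        self.m, self.k, self.g = m, k, g
        self.barr = [[0] * m for _ in range(g)]  # the bit array group
        self.orBarr = [0] * m                    # bitwise OR of the group

    def refresh_or(self):
        # Recompute orBarr from scratch so it stays consistent with barr.
        self.orBarr = [int(any(row[j] for row in self.barr))
                       for j in range(self.m)]

f = MergeCBF(m=8, k=3, g=4)
f.barr[1][5] = 1   # a '1' recorded in the second array at position 5
f.refresh_or()
print(f.orBarr)    # position 5 is visible in the aggregated array
```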

B. INSERTING
To insert a new item into mergeCBF, we not only generate hash values of the item with all k hash functions, but also schedule the bit arrays used for recording these hash results.
To keep the records of the inserted items collision-free with high probability, we introduce a novel scheduling algorithm, namely the virtual-Cuckoo schedule algorithm, which uses the different bit arrays in a fixed sequence.
Cuckoo hashing [20] selects a cell of the hash table to store a newly inserted item as follows: it selects a cell for the item, kicks out the item already in that cell, inserts the new item, then finds an alternative cell for the victim item, and repeats this procedure until no collision occurs. Cuckoo hashing achieves high space occupancy because it relocates previously inserted items to make room for new ones. Targeting more efficient use of the bit arrays of our filter, we extend the scheduling principle of Cuckoo hashing to a virtual version for scheduling bit array usage. In our filter, an item is recorded with ''1'' bits in k cells. If a hash result indicates an occupied cell, there is no need to push the existing ''1'' out of the cell and re-insert it into an alternative cell: since everything stored in these cells is a ''1'' bit, we can simply skip the occupied cell and find the next cell according to the cuckoo hashing result on the index of the occupied cell. Consequently, the cells used are the same as those used by conventional Cuckoo hashing. The detailed scheduling procedure is listed in Algorithm 1.
The above recursion converges, guaranteeing that every inserting operation succeeds, according to the Cuckoo graph theory of Cuckoo hashing.

Algorithm 1 Virtual-Cuckoo
Input: i; // the initial index of a cell to store a new item
{c_1, . . . , c_g}; // the group of g candidate cells
Output: î; // the index of the next available cell among the g candidate cells

A Cuckoo graph is an undirected graph whose vertices are the indices of the hash table cells and whose edges connect the two candidate cells of an inserting procedure. When Cuckoo hashing inserts items into the hash table, all inserting operations succeed if and only if each connected component of the corresponding Cuckoo graph contains at most one cycle [20]. The Cuckoo hash used in Algorithm 1 generates random hash results with only a trivial collision probability, which means different inputs are mapped to distinct results. Therefore, the vertices in the Cuckoo graph of Algorithm 1 are connected one after another, and the last vertex finally connects back to the first: the Cuckoo graph of Algorithm 1 is a single cycle. Hence all insert operations of Algorithm 1 succeed, and Algorithm 1 converges.
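Algorithm 1's skip-instead-of-evict idea can be sketched as follows. This is a minimal sketch under our assumptions: `cells` holds the g candidate cells (one per bit array) at one position, and `hop` stands in for the cuckoo-hash step on an occupied index; the linear `(i + 1) % g` hop in the example is only a placeholder, not the paper's hash.

```python
def virtual_cuckoo(i, cells, hop):
    """Return the index of the next available cell among the g candidates,
    starting from initial index i. Occupied cells are skipped (no eviction,
    since every stored value is a '1' bit) by following the cuckoo sequence."""
    start = i
    while cells[i] == 1:          # cell occupied: skip, do not evict
        i = hop(i)                # cuckoo step on the occupied cell's index
        if i == start:            # walked the whole cycle: all g occupied
            raise OverflowError("no free cell among the g candidates")
    return i

g = 4
hop = lambda i: (i + 1) % g       # placeholder cuckoo step (assumption)
print(virtual_cuckoo(0, [1, 1, 0, 1], hop))  # -> 2, the next free cell
```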
As depicted in Figure 4, we apply Algorithm 1 to schedule bit array usage while inserting and deleting items in our filter, whenever the hash result of an item indicates a specific position in the bit arrays. Note that an inserting operation should be preceded by a query operation, in order to avoid inserting a data item repeatedly. The detailed inserting process is described in Algorithm 2.
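The inserting flow, including the query-first step, can be sketched as below. Assumptions: the filter is a plain dict with `barr` and `orBarr`, k positions are sliced from one SHA-256 digest instead of Hash_1..Hash_k, and a linear scan over the arrays stands in for the virtual-Cuckoo sequence of Algorithm 1.

```python
import hashlib

def positions(item, k, m):
    # k positions from one digest (stand-in for Hash_1..Hash_k).
    d = hashlib.sha256(item.encode()).digest()
    return [int.from_bytes(d[4 * i:4 * i + 4], "big") % m for i in range(k)]

def insert(filt, item, k):
    """Query first to avoid re-inserting a replicated item, then record
    each hash result in the next free bit array (fixed per-position
    sequence) and update orBarr. Returns True iff the item was inserted."""
    m, g = len(filt["orBarr"]), len(filt["barr"])
    pos = positions(item, k, m)
    if all(filt["orBarr"][p] for p in pos):  # query step: already present?
        return False
    for p in pos:
        a = 0
        while filt["barr"][a][p] == 1:       # skip occupied arrays in a
            a = (a + 1) % g                  # fixed sequence (assumption)
        filt["barr"][a][p] = 1
        filt["orBarr"][p] = 1                # keep orBarr up to date
    return True

f = {"barr": [[0] * 16 for _ in range(4)], "orBarr": [0] * 16}
print(insert(f, "pkt-42", 3), insert(f, "pkt-42", 3))  # -> True False
```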

C. DELETING
The deleting operation is similar to the inserting operation. A function analogous to virtual-Cuckoo, virtual-Cuckoo-, is applied to return the last used candidate cell. For simplicity, we do not provide the details of the deleting operation here, but its key steps are as follows: 1) Query the item.
2) If the item exists, find the k cells with the function virtual-Cuckoo- and set these cells to ''0''. 3) Finally, update orBarr.

D. QUERY
We include the bit array orBarr in our filter to achieve more efficient membership queries. The query operation is implemented on orBarr by checking whether the k cells indicated by Hash_1, Hash_2, . . . , Hash_k are all equal to ''1''. The detailed procedure is listed in Algorithm 3.
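Because the query runs on orBarr alone, it touches one bit array instead of g, which is why it stays as fast as a basic Bloom filter's lookup. A minimal sketch (same digest-slicing stand-in for Hash_1..Hash_k as before; `flow-7` is a hypothetical item):

```python
import hashlib

def positions(item, k, m):
    # k positions from one digest (stand-in for Hash_1..Hash_k).
    d = hashlib.sha256(item.encode()).digest()
    return [int.from_bytes(d[4 * i:4 * i + 4], "big") % m for i in range(k)]

def query(orBarr, item, k):
    """The item is (probably) present iff all k indicated cells are '1'."""
    return all(orBarr[p] == 1 for p in positions(item, k, len(orBarr)))

orBarr = [0] * 16
for p in positions("flow-7", 3, 16):  # simulate a prior insert of "flow-7"
    orBarr[p] = 1
print(query(orBarr, "flow-7", 3))     # -> True
```

As with any Bloom filter, a positive answer may be a false positive, but a negative answer is always correct.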

E. MERGING
As discussed in Section II-B, mergeCBF must be able to merge with other instances to support in-network data-information summarizing and exchange. Compared with QF and its variants, the only existing counting filters that support merging multiple filters, our design supports the merging procedure more efficiently, by performing a bitwise OR on bit arrays (see Figure 5). To avoid the merging result exceeding the optimized capacity of mergeCBF (i.e., n), we need to estimate the total number of items inserted into the filters to be merged. According to the balls-and-bins theory [22], a bit cell is set to ''1'' with a probability following a Poisson distribution P(n). Hence, supposing η cells contain ''1'' bits and x is the number of times a cell has been set to ''1'', the estimate n̂ of the number of items inserted into a filter is derived as follows. To select η cells from m cells, there are (m choose η) combinations.
The merging procedure consists of the following steps:
1) Evaluate whether the total number of items inserted into these filters exceeds the capacity of mergeCBF. If so, stop the merging operation and report an error; otherwise, continue to the next step.
2) For every j, 1 ≤ j ≤ g, perform a bitwise OR on the j-th bit arrays of the different filters to obtain the merged j-th bit array.
3) Perform a bitwise OR on the orBarr arrays of the different filters to obtain the merged orBarr.
The detailed procedure is presented in Algorithm 4.
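Steps 2) and 3) can be sketched directly (filters as dicts with `barr` and `orBarr`, as in the earlier sketches; the capacity check of step 1 is omitted, and the tiny sample filters are hypothetical):

```python
def merge(f1, f2):
    """Bitwise OR the j-th bit arrays of the two filters for every j,
    then OR the orBarr arrays. The capacity check (step 1) is omitted."""
    assert len(f1["barr"]) == len(f2["barr"])
    return {
        "barr": [[a | b for a, b in zip(r1, r2)]
                 for r1, r2 in zip(f1["barr"], f2["barr"])],
        "orBarr": [a | b for a, b in zip(f1["orBarr"], f2["orBarr"])],
    }

# The same item was recorded in both filters; its '1' bits sit in the
# cells at the same positions, so the OR keeps exactly one record of it.
f1 = {"barr": [[1, 0, 0, 0], [0, 0, 0, 0]], "orBarr": [1, 0, 0, 0]}
f2 = {"barr": [[1, 0, 1, 0], [0, 0, 0, 0]], "orBarr": [1, 0, 1, 0]}
out = merge(f1, f2)
print(out["orBarr"])  # -> [1, 0, 1, 0]
```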

IV. ANALYSIS
In this section, we provide some theoretical analysis about mergeCBF.
Theorem 1: The probability that the number of cells set to ''1'' at the same position across the bit arrays is greater than or equal to g has an upper bound
$$m\sqrt{\frac{2\pi nk}{4\pi^{2}(nkg-g^{2})}}\cdot\frac{(nk)^{nk}}{(nk-g)^{nk-g}\,g^{g}}\left(\frac{1}{m}\right)^{g}.$$
Proof: Suppose the maximum number of items mergeCBF can hold is n, the size of the bit arrays is m, the number of hash functions is k, and the number of bit arrays is g. The probability that the number of cells set to ''1'' at the same position across the bit arrays is greater than or equal to g is given by Formula 4. According to the Stirling formula, Formula 5 can be obtained. According to m = nk/ln 2, as discussed in [9], in the optimal configuration of a Bloom filter n × k has a linear relationship with m. Therefore, n × k is much larger than g, and the value under the square root in Formula 5 is less than 1. Meanwhile, the second term is enlarged to obtain a relaxed upper bound of Formula 4, Formula 6, which bounds the probability that the number of cells set to ''1'' at the same position across the bit arrays is greater than or equal to g.
According to Formula (6), when g is 16, the probability that all g bits at a position of mergeCBF are set to ''1'' is 1.37 × 10⁻¹⁵ × m. In general, this case will hardly ever come up in an edge computing application.

Theorem 2: The inserting sequence of items has no impact on their compact representation result in the bit arrays of mergeCBF.
Proof: Without loss of generality, among a group of data items item_1, . . . , item_i, . . . , item_n inserted into mergeCBF in a specific sequence, suppose two arbitrary items, item_x and item_y, are mapped to the same position of mergeCBF (i.e., the j-th cells), and that α cells at position j have already been set to ''1''. If we insert item_x into mergeCBF before item_y, we use virtual-Cuckoo to find an array, array_a, and set the j-th cell of array_a to ''1''. Then, when we insert item_y into mergeCBF, we use virtual-Cuckoo to find the next available array, array_b, and set the j-th cell of array_b to ''1''. Alternatively, we first insert item_y into mergeCBF and then perform the inserting operation on item_x. In both cases the j-th cells of array_a and array_b are used in the same sequence, and both are finally set to ''1''. Therefore, the inserting sequence of different items has no impact on their compact representation result in the bit arrays of mergeCBF.
Since the inserting sequence of items has no impact on their compact representation results in the bit arrays of mergeCBF, the ''1'' bits representing the same items in different mergeCBFs land in the cells at the same positions. When these mergeCBFs merge with each other, the replicated ''1'' bits in the cells at the same positions are eliminated by the bitwise OR. This makes mergeCBF support the merging operation effectively.
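The order-independence argument of Theorem 2 can be checked on a toy model (our simplification: `record` fills the next free array at a position in a fixed scan order, standing in for the deterministic virtual-Cuckoo sequence): two insertion orders that hit the same positions produce identical bit arrays.

```python
def record(barr, p):
    """Set the next free cell at position p, scanning the arrays in a
    fixed sequence (a stand-in for the virtual-Cuckoo order)."""
    for row in barr:
        if row[p] == 0:
            row[p] = 1
            return

def fill(order):
    # Each element of `order` is a position hit by some item's hash.
    barr = [[0] * 4 for _ in range(3)]
    for p in order:
        record(barr, p)
    return barr

# Two items both hitting position 2: whichever comes first, the same two
# cells at position 2 end up set, so the final representation is identical.
print(fill([2, 1, 2]) == fill([2, 2, 1]))  # -> True
```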

V. EXPERIMENTAL EVALUATION
In this section, we conduct experimental evaluations to validate the effectiveness and efficiency of the proposed mergeable Counting Bloom filter (mergeCBF). We first explore the false positives of mergeCBF and three baseline schemes for comparison. We then demonstrate the high performance of the proposed mergeCBF under different considerations.

A. IMPLEMENTATION AND EXPERIMENT SETUP
All experiments were run on an Intel(R) Core(TM) i5-5200U CPU @ 2.20 GHz, dual-core and four-threaded, with 8 GB of RAM. To evaluate the performance of the proposed mergeCBF, we implement mergeCBF and three of the most popular counting filters: CBF, dlCBF, and the Counting Quotient Filter (CQF). The hash function used in the filters is MurmurHash3 with a 32-bit hash result, an efficient hash function with a low collision rate. Other parameters are listed in Table 2. The detailed implementations of these filters are as follows:
1) mergeCBF: we use 16 bit arrays in mergeCBF to replace the 4-bit counters used in CBF. According to the preset false positive ratio (0.05) and the relations among the number of hash functions (k), the size of the bit arrays (m), and the estimated capacity (n_s), k is set to 3 and m = (capacity × |log(error rate)|)/(k · (log 2)²). A simple one_at_a_time function [21] with a small hashing range g is used as the Cuckoo hash function.
2) CBF [12]: to make the counting range of each counter in CBF also 16, the size of each counter is set to 4 bits. The preset false positive ratio, the number of hash functions (k), and the size of the counter array (m) are set in accordance with those configured in mergeCBF.
3) dlCBF [14]: dlCBF divides the storage into d partitions, and correspondingly 4 hash functions are used to map items into the buckets of these partitions. To make the estimated counting range 6, the number of buckets in each partition is set to capacity/(4 × 6), and the size of each bucket is set to 8 to avoid the inserted items exceeding the preset counting range.
4) CQF [19]: CQF uses a hash function to generate hash values for different data items. The first q bits of a hash value indicate the original position of the hashed item (i.e., the quotient value). Ultimately, the capacity of CQF is equal to 2^q.
The last 6 bits of the hash value are stored in the cells as the remainder of the item to eliminate query false positives; two bits of each cell are used as Occupieds and Runends, respectively. A popular real-world open dataset, the gas sensor array temperature modulation dataset, is used to evaluate the performance of mergeCBF. The data items in this dataset were collected from a chemical detection platform composed of 14 temperature-modulated metal oxide (MOX) gas sensors monitoring mixtures of carbon monoxide and humid synthetic air in a gas chamber over 3 weeks. The dataset provides 4,095,000 data items of the sensors and the measured values of CO concentration, humidity, and temperature inside the gas chamber. We sample from this dataset to build a series of experimental datasets with different scales (10^4, 10^5, and 10^6).
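For reference, the standard Bloom-filter sizing relations from [9] can be computed as below. This sketch uses the textbook forms m = -n·ln ε/(ln 2)² and k = (m/n)·ln 2, which yield k = 4 for ε = 0.05; note the experiments above instead fix k = 3 and include k in the denominator of the m formula, so this is a comparison point, not a reproduction of the paper's configuration.

```python
import math

def bf_config(n, eps):
    """Textbook Bloom-filter sizing [9] for capacity n and target false
    positive ratio eps: returns bit-array size m and hash count k."""
    m = math.ceil(-n * math.log(eps) / (math.log(2) ** 2))
    k = max(1, round((m / n) * math.log(2)))
    return m, k

m, k = bf_config(10**5, 0.05)
print(m, k)   # m is roughly 6.2e5 cells, and the optimal k is 4
```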

B. FALSE POSITIVE EVALUATION
Like other Bloom filters, mergeCBF achieves compact data representation at the cost of a few false positives in the query process: queries on items that have not been inserted into mergeCBF may return positive results. Since the other type of error, false negatives, does not occur during membership queries on Bloom filters and their variants, we evaluate only the false positives of mergeCBF and the other three baselines. As depicted in Figure 6, we configure the filters for capacity scales of 10^4, 10^5, and 10^6, insert 10^4, 10^5, and 10^6 items into these filters, respectively, and then query 10^4, 10^5, and 10^6 nonexistent items on them. The number of false positives grows linearly as we query data items at larger scales on the different filters. At the query scale of 10^6, 886 false positives come up when we perform query operations on mergeCBF, while those for CBF, dlCBF, and CQF are 5037, 1379, and 691, respectively. The proposed mergeCBF can thus accurately represent a huge number of inserted data items in a compact way.
For a comprehensive evaluation of false positives, we present the trend of the false positive increment as more data items are inserted into a mergeCBF with a configured capacity of 10^4, as well as into two others with configured capacities of 10^5 and 10^6 (see Figure 7). R_fill indicates how full the filters are: R_fill = N_s/n, where N_s is the number of items that have been inserted into the mergeCBF and n is its configured capacity. When R_fill is less than 0.85, the number of false positives remains 0. As R_fill increases beyond 0.85, the false positives grow exponentially until R_fill = 1. Similar to other Bloom filters, mergeCBF provides a perfect compact data representation before the inserted data items fill the filter up. We need to enlarge the capacity of the filter and make the configuration of the filter capacity more robust for all cases.

C. EFFICIENCY EVALUATION
To evaluate the efficiency of mergeCBF, we study the latency of inserting, deleting, and querying items on mergeCBF and the three other baselines. Since, among all existing approaches, only QF and its variant CQF support merging operations, we compare the latency of merging multiple mergeCBFs with that of CQF.

1) QUERY LATENCY
As shown in Figure 8, the proposed mergeCBF always takes the least time to complete all query tasks, whichever dataset we use. When querying 10^4 data items, all filters achieve similar performance. For a larger dataset with a scale of 10^5, mergeCBF takes 0.09 s less than CQF, 0.16 s less than dlCBF, and 0.44 s less than CBF. mergeCBF outperforms the three baselines even more on the largest dataset. The orBarr array improves the query efficiency of mergeCBF, so mergeCBF achieves the minimum in-memory query latency, like a basic Bloom filter. CQF leverages both memory and SSD storage resources to store more data information in a hash table and achieves similar query performance.

2) INSERTING AND DELETING LATENCY
As depicted in Figure 9 and Figure 10, mergeCBF takes at most 5 times the latency of CBF, dlCBF, and CQF to complete inserting and deleting operations. For instance, mergeCBF takes 1.42 s, CBF 0.76 s, dlCBF 0.45 s, and CQF 0.37 s to complete 10^5 inserting operations; each inserting operation on mergeCBF takes an extra 10.5 µs. Similarly, mergeCBF takes 1.63 s, CBF 0.3 s, QF 0.44 s, and dlCBF 0.43 s to complete 10^5 deleting operations; each deleting operation on mergeCBF takes an extra 10.6 µs.
We include g bit arrays in mergeCBF to replace the counter array used in CBF. Consequently, although we extend CBF to support merging operations, the updating process on these bit arrays causes longer latency. The inserting and deleting operations are called only when new data arrives or existing data expires; for each data item, these two operations are called only once. Conversely, the query and merging operations are called frequently by edge nodes to achieve efficient in-network data-information collection and analysis. Considering these facts, we include an additional bit array in mergeCBF that aggregates all g bit arrays and speeds up query operations, at the cost of extra computational overhead on the inserting and deleting operations.

3) MERGING LATENCY
The merging operation is the signature function of mergeCBF and one of the indispensable functions of an in-network compact representation. Figure 11 depicts the time cost for CQF and mergeCBF to merge multiple filters, where the x-axis represents the number of filters. One group of mergeCBFs and CQFs has 10^5 inserted data items each; another group of filters has 50,000 inserted data items sampled with replacement. Whatever data items are inserted into the filters and however many filters we merge, mergeCBF achieves better merging performance than CQF. The minimum difference between the merging latency of mergeCBF and that of CQF is 2.91 s.

4) IMPACT OF COUNTER SIZE
As described in Section III-A, mergeCBF uses g bit arrays to replace the counter array used in the conventional CBF, where g is set in accordance with the estimated maximum counting range of the counters used in CBF. Although this maximum counting range is usually set to 16, a larger counter enhances the scalability of CBF and its variants, while a smaller one reduces the storage overhead. Correspondingly, the latency of different operations on the filters may change. To evaluate the impact of the counting range on the performance of mergeCBF, we set the counting range g to 8, 16, and 32, and the capacity to 10^5. We query, insert, and delete 10^5 data items on the different filters. As shown in Figure 12, different counting ranges have no impact on the latency of query operations and only a small impact on the latency of inserting and deleting operations. Although the latency of virtual-Cuckoo scheduling over g bit arrays is quite small, scheduling over more bit arrays costs a little more time.

VI. RELATED WORK
The Internet has massively increased the amount of data available to novel intelligent applications. To represent massive Internet data efficiently, the Bloom filter (BF) [9] has become the most widely used compact data representation mechanism, and many types of Bloom filters have been derived to meet the diverse requirements of novel data-intensive applications. Among them, the Counting Bloom filter (CBF) [12] is designed to support both inserting and deleting operations. CBF expands the single-bit cell of BF into a fixed-length multi-bit counting cell and increments the cells hit by the hash operations on the inputted items. The hit count recorded in a counter supports deleting the corresponding items by decreasing the counter value.
To optimize the performance of CBF by reducing the storage overhead and the false positive ratio, a variety of new counting Bloom filters have been proposed; the most popular approaches are presented below. Cohen et al. proposed the Spectral Bloom Filter (SBF) [13], which ensures that no cell counter ever exceeds its maximum counting range by extending the counter array of the CBF, but its storage overhead is larger than that of CBF. dlCBF [14] is based on d-left hashing and optimizes the storage cost under the same false positive rate. The MLCCBF data structure [15] improves the standard CBF in terms of fast access and limited memory consumption, but can only be implemented in small and fast local memory, i.e., on-chip SRAM. Rottenstreich et al. proposed a general method based on variable increments [16], called the Variable Increment Counting Bloom Filter (VI-CBF), which always achieves a lower false positive rate and a lower overflow probability bound than CBF in practical systems; VI-CBF is a popular data structure for the representation of dynamic sets, achieving a good trade-off between memory efficiency and query accuracy. The Tandem Counting Bloom Filter (T-CBF) [17] is a new data structure that relies on the interaction among counters to describe sets with higher accuracy than VI-CBF. The Quotient Filter [18] is another type of data representation structure that achieves efficient insert, delete, and membership query on a hash table on SSD. As a variant of the Quotient Filter, the Counting Quotient Filter (CQF) [19], also based on memory and SSD, supports approximate membership testing and data item counting, as well as the merge operation on multiple filters.

VII. CONCLUSION
In this paper, we propose a novel mergeable Counting Bloom filter (mergeCBF). mergeCBF can efficiently insert, delete, and query data items, and its computational complexity is on the same scale as that of the conventional counting Bloom filter. The performance of the query operation, the most frequent operation in edge computing, is enhanced compared with other counting filters. In addition, mergeCBF supports efficient multi-filter merging, which lays the foundation for a variety of scenarios that require an in-network compact representation mechanism. In future work, we will adapt mergeCBF to more specific in-network representation scenarios, e.g., distributed deep learning for augmented reality and intelligent transportation.