Zuckerli: A New Compressed Representation for Graphs

Zuckerli is a scalable compression system meant for large real-world graphs. Graphs are notoriously challenging structures to store efficiently due to their linked nature, which makes it hard to separate them into smaller, compact components. Therefore, effective compression is crucial when dealing with large graphs, which can have billions of nodes and edges. Furthermore, a good compression system should give the user fast and reasonably flexible access to parts of the compressed data without requiring full decompression, which may be unfeasible on their system. Zuckerli improves multiple aspects of WebGraph, the current state-of-the-art in compressing real-world graphs, by using advanced compression techniques and novel heuristic graph algorithms. It can produce both a compressed representation for storage and one which allows fast direct access to the adjacency lists of the compressed graph without decompressing the entire graph. We validate the effectiveness of Zuckerli on real-world graphs with up to a billion nodes and 90 billion edges, conducting an extensive experimental evaluation of both compression density and decompression performance. We show that Zuckerli-compressed graphs are 10% to 29% smaller, and more than 20% in most cases, with a resource usage for decompression comparable to that of WebGraph.


Introduction
Graph compression essentially boils down to compressing the adjacency lists of a graph G = (V, E), where its nodes are suitably numbered from 1 to n = |V|, and the adjacency list storing the neighbors of each node is seen as the sorted sequence of the corresponding integers from [1, 2, ..., n]. It is straightforward to use a 64-bit word of memory for each integer (i.e. edge), plus O(n) words for the degrees and the pointers to the n adjacency lists, thus requiring O(n + m) words of memory for the standard representation of G, where m = |E|.
The challenge is to use very few bits per edge and node, so as to squeeze G into as little space as possible. This can make a dramatic difference for massive graphs, particularly if the compressed graph fits into main memory, while its standard representation does not. The over 450 bibliographic entries in a recent survey on lossless graph compression [5] give a measure of the increasing interest for this line of research. Among the numerous proposals, the WebGraph framework [10,11] is widely recognized as the touchstone for its outstanding compression ratio.
In this paper, we consider the lossless graph compression scenario, showing how to compress G and supporting two kinds of operations on the resulting compressed representation of G:
• Full decompression: Decompress the representation entirely, obtaining the standard representation of G.
• List decompression: For any given node u ∈ [n], decompress incrementally the adjacency list of u, while keeping the rest compressed.
List decompression can allow us to run some graph algorithms directly on the compressed representation of the graph: several fundamental algorithms, such as a graph traversal, are based on partially scanning adjacency lists that are decompressed during the scan.
On the other hand, we do not want to support decompressing a single edge (i.e. directly checking adjacency between two nodes) for two reasons: it degrades the performance of scanning an adjacency list, and most well-known graph algorithms rarely need to access a few random items of an adjacency list without scanning the list from the beginning. Moreover, scanning a list is so fast in our implementation that any attempt to skip parts of it would just degrade the performance due to the extra machinery required.
In this paper, we present a new graph compressor called Zuckerli. By incorporating advanced compression techniques and novel heuristic algorithms, Zuckerli is able to replace WebGraph-compressed graphs with a compressed structure that represents the same data but uses 20% to 30% less space for web graphs, and 10% to 15% less space for social networks, saving significant space on storage media. These savings also hold when compressing a graph for list decompression, compared to the corresponding list decompression mode of WebGraph. Decompression is highly tuned and very fast, providing millions of edges per second on a commodity computer.
To the best of our knowledge, Zuckerli significantly improves the state-of-the-art in graph compression when full or list decompression is supported.
Related work. Compressing graphs is a well-studied problem. The WebGraph framework [10,11] exploits two well known properties shared by web graphs (and, in a smaller measure, by social networks), locality and similarity, originally exploited by the LINKS database [20]. WebGraph is the graph compression technique most directly related to Zuckerli, as it uses the above properties.
More recently, an approach called Log(Graph) and based on graph logarithmization [6] has been explored. The analysis conducted shows that, while Log(Graph) achieves better performance while performing various operations, the WebGraph framework is still the most competitive approach in terms of compression ratio, especially for web graphs.
Another well-known approach to graph compression is $k^2$-trees [13], which use a succinct representation of a bidimensional k-tree on the adjacency matrix of the graph. Unlike WebGraph, this scheme allows for accessing single edges, without requiring the decoding of full adjacency lists at a time. As a consequence, it achieves somewhat worse compression ratios, but is more suited for applications where single edges are queried. The $k^2$-trees have been subsequently improved by 2D block trees [12], an LZ77-like approach that can compress bidimensional data. As with $k^2$-trees, it allows for querying single edges; however, it achieves significantly improved compression ratios, at the cost of a hit in query time. A brief experimental comparison between Zuckerli, $k^2$-trees and 2D block trees can be found in Section 4.
Some other approaches follow a different philosophy, that is, providing access to the compressed graph with a wide range of complex operations, or even a query language, at the cost of sub-optimal compression ratios. This is the case, for example, of ZipG [18], a distributed graph storage system that aims at compactly storing a graph, including semantic information on its nodes and edges, while allowing access to this information via a minimal but rich API. We refer the reader to the survey in [5] for a panoramic view of the research on graph compression.
The paper is organized as follows. Section 2 discusses some methods to encode integers, which are at the heart of our compression algorithms and are used to encode all the data that results from the higher-level compression scheme. Section 3 describes the Zuckerli high-level encoding scheme, which, in brief, consists in block-copying, that is re-using parts of the adjacency lists of previous nodes to encode the adjacency list of current nodes, delta-coding of values that are not copied and context-modeling of all the values to improve compression. This section also describes heuristics to improve the encoding choices made by the encoder. We then report the experimental study in Section 4, and draw conclusions in Section 5.

Encoding Integers
Our graph compression method modifies the adjacency lists, which are sequences of integers, to produce other sequences of integers that can be encoded more succinctly. Thus, encoding methods for the integers are at the heart of Zuckerli, and we discuss the ones that we employ from existing literature, or that we design for this purpose.

Multi-context entropy encoding
Zuckerli uses Huffman coding [17] when list decompression is supported, and Asymmetric Numeral Systems (ANS) [16] when only full decompression is required.
Conceptually, ANS encodes a sequence of input symbols in a single number that can be represented with a number of bits that is close to the entropy of the data stream. Thus, it is a form of arithmetic coding (whose idea goes back to Shannon [22]), but compared to traditional methods of arithmetic coding it can achieve faster compression and decompression speeds. The encoding process adds a symbol s to the stream represented by x by producing a new integer $C(s, x) = M \lfloor x / F_s \rfloor + B_s + (x \bmod F_s)$, where M is the sum of the frequencies of all the symbols, $F_s$ is the frequency of the symbol s and $B_s$ is the cumulative frequency of all the symbols before s. This function is invertible, hence the decoder can reverse this process and produce the stream of symbols starting from x.
Like all variants of arithmetic coding, practical implementations of ANS do not use arbitrary precision arithmetic, but rather they keep an internal state in a fixed range $[S, 2^b S)$ that is manipulated for each symbol in the stream: when the state overflows, it yields b bits during encoding; when the state underflows, it consumes b bits when decoding. For correct decoding, it is required that S is a multiple of M. In our case, we set $S = 2^{16}$, $M = 2^{12}$, and b = 16. Since the decoding procedure is just the reverse of the encoding procedure, ANS makes it easy to interleave non-compressed bits.
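The update rule C(s, x) and its inverse can be checked with a minimal sketch that uses Python's arbitrary-precision integers instead of the fixed-range state described above. The function names, the initial state 0 and the explicit symbol count passed to the decoder are assumptions made for illustration; this is not the streaming variant used by Zuckerli.

```python
def cumulative(freq):
    """Return (B, M): B[s] is the total frequency of symbols before s, M the grand total."""
    cum, total = {}, 0
    for s in sorted(freq):
        cum[s] = total
        total += freq[s]
    return cum, total


def ans_encode(symbols, freq):
    """Fold all symbols into one integer with x <- M * (x // F_s) + B_s + (x % F_s)."""
    cum, M = cumulative(freq)
    x = 0  # assumed initial state
    for s in symbols:
        x = M * (x // freq[s]) + cum[s] + (x % freq[s])
    return x


def ans_decode(x, count, freq):
    """Invert ans_encode; symbols come out last-to-first, so the list is reversed."""
    cum, M = cumulative(freq)
    out = []
    for _ in range(count):
        slot = x % M  # equals B_s + (previous x % F_s), which identifies s
        s = next(c for c in freq if cum[c] <= slot < cum[c] + freq[c])
        x = freq[s] * (x // M) + slot - cum[s]
        out.append(s)
    out.reverse()
    return out


if __name__ == "__main__":
    freq = {"a": 5, "b": 2, "r": 2, "c": 1, "d": 1}
    code = ans_encode(list("abracadabra"), freq)
    print("".join(ans_decode(code, 11, freq)))
```

Because the decoder pops symbols in the reverse order of encoding, practical coders usually encode the input backwards so that decoding proceeds forwards.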
The variant of ANS used by Zuckerli is inspired by the one employed in the new standard JPEG XL [2] for lossy image compression.
When list decompression is supported, one disadvantage of ANS (as well as of other encoding schemes that can use a non-integer number of bits per encoded symbol) is that it requires keeping track of its internal state. For decoding to successfully be able to resume from a given position in the stream, it is also necessary to be able to recover the state of the entropy coder at that point of the stream, which would cause significant per-node overhead if using ANS. Thus, in this case, Zuckerli switches to using Huffman coding.
Huffman coding represents every input symbol with a variable number of bits, without having an internal state. The bits of the representation are chosen in such a way that no codeword is a prefix of another (so that decoding is unambiguous). As a consequence, Huffman coding easily allows seeking, but cannot use less than one bit per symbol.
Both Huffman and ANS use a context or model, which is a prediction of the probability distribution for the symbols in the stream that are obtained from the adjacency lists. The more accurate the prediction is, the closer to optimal the compression gain will be. As both the encoder and the decoder must share the same context, Zuckerli has to store the probability distributions corresponding to a context when encoding the graph. Symbols to be encoded are spread among multiple contexts, allowing more precise encoding when symbols are assumed to belong to different probability distributions. Hence, multi-context entropy coding is one significant source of improvements of Zuckerli in comparison to other approaches.

Hybrid integer encoding
When compressing streams, both Huffman and ANS encode the symbols belonging to a given alphabet and thus benefit from having a reduced alphabet size. However, the alphabet may grow too large in our case, as Zuckerli needs to encode integers of arbitrary length and cannot use a distinct symbol for each integer. Zuckerli thus introduces a new hybrid integer encoding scheme, described below. This generalizes a scheme that was initially developed for image compression in JPEG XL [2].
Zuckerli's hybrid encoding scheme is defined by three parameters: i, j and k, with k ≥ i + j and i, j ≥ 0. Every integer in the range $[0, 2^k)$ is encoded directly as a symbol in the alphabet.
Any other integer $x \ge 2^k$ is encoded as follows. First, consider the binary representation of x: $b_p b_{p-1} \cdots b_1$, where $b_p = 1$ is the highest non-zero bit. Equivalently, identify x with its corresponding triple (m, t, l), where m is the integer formed by the i bits $b_{p-1} \cdots b_{p-i}$ following $b_p$, l is the integer formed by the rightmost j bits $b_j \cdots b_1$, and t is the integer encoded by the bits between those of m and l. In other words, the bits of x read, from most to least significant: the leading 1, the i bits of m, the p − 1 − i − j bits of t, and the j bits of l. Clearly, given the triple (m, t, l), we can reconstruct x. We conveniently encode that triple by a pair (s, t), where $s = 2^k + (p - k - 1) \cdot 2^{i+j} + m \cdot 2^j + l$ encodes, respectively, the value of k by $2^k$, the value of p ≥ k + 1 by $(p - k - 1) \cdot 2^{i+j}$, and the value of m as $m \cdot 2^j$ followed by l.
For example, for k = 4, i = 1, and j = 2, the integer x = 211 has binary representation 1 1 0100 11 (so p = 8) and its corresponding triple is (1, 4, 3); it is thus encoded as the pair (16 + 3 · 8 + 1 · 4 + 3, 4) = (47, 4). As another example, when k = 4, i = 1 and j = 1, the integers from 0 to 15 are encoded with their corresponding symbol s in the alphabet, and t is empty; 23 has binary representation 10111 and thus is encoded as symbol 17 (the highest set bit is in position 5, the following bit is 0, and the last bit is 1), followed by the two remaining bits 11; 33 is encoded as symbol 21 (highest set bit is in position 6, following bit is 0 and last bit is 1) followed by the three remaining bits 000.
The advantage of this scheme is that s has a smaller range than x, and can thus be entropy-encoded by either Huffman or ANS: using this representation, r-bit integers require at most $2^k + (r - k - 1) \cdot 2^{i+j}$ symbols in the alphabet instead of $2^r$.
As for t, it is stored as-is in the encoded file, just after entropy coding s. Note that it is possible to compute the number of bits of t from s, without knowing x: this allows the decoder to know how many bits to read. The procedure to decode an integer from the (s, t) pair consists of recovering the corresponding triple (m, t, l) and then reconstructing x. The procedure is detailed in Algorithm 1.
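The mapping between x and the pair (s, t) can be sketched as follows. This is a minimal illustration written from the definitions above, not the authors' Algorithm 1; the function names are hypothetical, and t is returned together with its bit count so that a caller could write it as raw bits.

```python
def hybrid_encode(x, i, j, k):
    """Return (s, t, nbits): the symbol s, the raw value t, and the number of bits of t."""
    assert x >= 0 and i >= 0 and j >= 0 and k >= i + j
    if x < (1 << k):
        return x, 0, 0                       # small integers are their own symbol
    p = x.bit_length()                       # 1-based position of the highest set bit
    nbits = p - 1 - i - j                    # bits left between m and l
    l = x & ((1 << j) - 1)                   # rightmost j bits
    t = (x >> j) & ((1 << nbits) - 1)        # middle bits, stored as-is
    m = (x >> (j + nbits)) & ((1 << i) - 1)  # i bits right after the leading 1
    s = (1 << k) + ((p - k - 1) << (i + j)) + (m << j) + l
    return s, t, nbits


def hybrid_extra_bits(s, i, j, k):
    """Number of raw bits that follow symbol s, computed without knowing x."""
    if s < (1 << k):
        return 0
    p = ((s - (1 << k)) >> (i + j)) + k + 1
    return p - 1 - i - j


def hybrid_decode(s, t, i, j, k):
    """Rebuild x from the symbol s and the raw value t."""
    if s < (1 << k):
        return s
    rest = s - (1 << k)
    p = (rest >> (i + j)) + k + 1
    m = (rest >> j) & ((1 << i) - 1)
    l = rest & ((1 << j) - 1)
    nbits = p - 1 - i - j
    return (1 << (p - 1)) | (m << (j + nbits)) | (t << j) | l


if __name__ == "__main__":
    print(hybrid_encode(23, 1, 1, 4))  # symbol 17 followed by the two bits 11
    print(hybrid_encode(33, 1, 1, 4))  # symbol 21 followed by the three bits 000
```

With k = 4, i = 1 and j = 1, the sketch reproduces the symbols 17 and 21 given in the text for the integers 23 and 33.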

Negative integers
We encode a negative integer s as follows, as it is easy to reverse this bijection between integers and natural numbers [10].
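A minimal sketch of such a bijection, assuming the usual interleaving that sends non-negative integers to even numbers and negative integers to odd numbers (2s for s ≥ 0, −2s − 1 for s < 0). The function names are hypothetical, and the exact formula used by Zuckerli may differ in detail.

```python
def to_natural(s):
    """Map a signed integer to a natural number: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ..."""
    return 2 * s if s >= 0 else -2 * s - 1


def from_natural(v):
    """Inverse of to_natural: even values are non-negative, odd values are negative."""
    return v // 2 if v % 2 == 0 else -((v + 1) // 2)


if __name__ == "__main__":
    print([to_natural(s) for s in (0, -1, 1, -2, 2)])
```

Small magnitudes map to small natural numbers regardless of sign, which keeps the subsequent integer encoding short.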

Graph compression in Zuckerli
This section details the graph compression scheme used by Zuckerli.

Brief summary of WebGraph
As Zuckerli reuses and improves on multiple aspects of WebGraph, here we provide a brief summary of the WebGraph scheme. Let W and L be global parameters representing the "window size", which is limited to speed up compression time, and the "minimum interval length". For each node u ∈ V, WebGraph encodes its degree deg(u) and, if deg(u) > 0, the following information for the adjacency list of u:
1. A reference number r, which can be either a number in [1, W), meaning that the list is represented by referencing the adjacency list of node u − r (called reference list), or 0, meaning that the list is represented without referencing any other list.
2. If r > 0, it is followed by a list of integers indicating the indices where the reference list should be split to obtain contiguous blocks. Blocks in even positions represent edges that should be copied to the current list. The format contains, in this order, the number of blocks, the length of the first block, and the length minus 1 of all the following blocks (since no block except the first may be empty). The last block is never stored, as its length can be deduced from the length of the reference list.
3. A list of intervals follows; each interval has at least L consecutive nodes that are not copied from the blocks in point 2.
4. Whatever nodes are left from points 2-3 are called residuals, and they are delta-coded. Their number can be deduced by the degree, the number of copied edges and the number of edges represented by intervals. The first residual is encoded by difference with respect to u (and thus it can be a negative number), and each of the remaining residuals is represented by difference with respect to the previous residual, minus 1.
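The block format of point 2 can be illustrated with a short sketch. It assumes that block positions are counted from 0, so that the first block is a copied one, and that the stored fields arrive as a plain list; the function name and this input layout are hypothetical.

```python
def copy_blocks(reference_list, stored):
    """Return the edges copied from reference_list.

    stored = [number of blocks, length of first block,
              (length - 1) of each following block]; the last block is implicit.
    """
    num_blocks = stored[0]
    lengths = []
    if num_blocks > 0:
        # Only the first block may be empty, so later lengths are stored minus 1.
        lengths = [stored[1]] + [v + 1 for v in stored[2:1 + num_blocks]]
    # The final block is never stored: it covers whatever remains of the reference list.
    lengths.append(len(reference_list) - sum(lengths))
    copied, pos = [], 0
    for index, length in enumerate(lengths):
        if index % 2 == 0:                   # even positions are copied
            copied.extend(reference_list[pos:pos + length])
        pos += length                        # odd positions are skipped
    return copied


if __name__ == "__main__":
    # Copy 2 edges, skip 1, copy the implicit remainder.
    print(copy_blocks([1, 3, 4, 7, 9, 12], [2, 2, 0]))
```

In this reading, a block count of 0 leaves a single implicit block that covers the whole reference list, so every edge is copied.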
WebGraph represents the resulting sequence of non-negative integers by using ζ codes [11], a set of universal codes particularly suited to represent integers following a power-law distribution.
Moreover, to guarantee fast access to individual adjacency lists, WebGraph limits the length of the reference chain of each node. In particular, a reference chain is a sequence of nodes $u_1, \dots, u_\ell$ such that node $u_{i+1}$ uses node $u_i$ as its reference. Every chain has length at most R, where R is a global parameter.

Zuckerli scheme
In this section, we summarize the novel aspects introduced by Zuckerli in relation to WebGraph.
First, Zuckerli entropy-encodes the integers, as described in Section 2. This is in contrast with WebGraph's ζ coding [11].
Secondly, Zuckerli splits the nodes of G into chunks of size C, where the first chunk contains the first C nodes in G, the second chunk contains the following C nodes in G, and so on. When list decompression is not required, we set C = ∞. Inside each chunk, degrees of the nodes are stored. Notably, the representation of node degrees requires a significant amount of bits. To improve compression, Zuckerli represents degrees via delta encoding, i.e. as the difference between the current degree and the previous one. As this procedure may produce negative numbers, deltas are represented using the transformation described in Equation 1. Delta encoding across multiple adjacency lists is of course hostile to allowing access to any adjacency list without decoding the rest of the graph first. For this reason, Zuckerli adopts chunks.
Thirdly, while Zuckerli uses reference lists and blocks in the same way as WebGraph (points 1 and 2), the choice of the reference list and reference chain is more sophisticated. We defer its description to Section 3.4.
Fourthly, Zuckerli does not use intervals, in contrast with WebGraph (point 3). As a form of simplification, the special representation for intervals is replaced with run-length encoding [21] of zero gaps. When reading residuals, as soon as a sequence of exactly L zero gaps is read, for a global parameter L , another integer is read to represent the subsequent number of zero gaps, which are not otherwise represented in the compressed representation. Since ANS does not require an integer number of bits per symbol, and allows for very efficient representations of sequences of zeros, we set L = ∞ if list decompression is not supported.
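The run-length rule for zero gaps can be sketched as follows, as one reading of the description above: after exactly L consecutive zero gaps have been emitted, the next integer counts the zero gaps that follow and are otherwise omitted. The function names, the parameter name min_run (standing for L) and the explicit gap count passed to the decoder are assumptions made for illustration.

```python
def rle_encode(gaps, min_run):
    """Replace the zeros that follow a run of min_run zeros with a single count."""
    out, pos, run = [], 0, 0
    while pos < len(gaps):
        out.append(gaps[pos])
        run = run + 1 if gaps[pos] == 0 else 0
        pos += 1
        if run == min_run:
            extra = 0
            while pos < len(gaps) and gaps[pos] == 0:
                extra += 1
                pos += 1
            out.append(extra)                # may be 0 when the run ends here
            run = 0
    return out


def rle_decode(stream, count, min_run):
    """Recover count gaps from a stream produced by rle_encode."""
    out, run = [], 0
    values = iter(stream)
    while len(out) < count:
        gap = next(values)
        out.append(gap)
        run = run + 1 if gap == 0 else 0
        if run == min_run:
            out.extend([0] * next(values))   # zeros that were not stored explicitly
            run = 0
    return out


if __name__ == "__main__":
    gaps = [5, 0, 0, 0, 0, 0, 0, 2]
    print(rle_encode(gaps, 3))               # three zeros, then a count of 3 more
```

Passing `float("inf")` as min_run never triggers the rule, which mirrors the setting L = ∞ used when list decompression is not supported.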
Finally, Zuckerli modifies the representation of the residuals, which are stored via delta encoding. The representation chosen by WebGraph (point 4) does not exploit the fact that an edge might already be represented by block copies (or intervals).
Figure: We are encoding the adjacency list of node 7 using the adjacency list of node 6 as a reference. Highlighted in blue are the edges that the two nodes have in common, i.e. the blocks to be copied from the reference node adjacency list. The block encoding is performed as described in Section 3.1 (point 2). Highlighted in red are the residual values, which are stored as follows: the first residual is encoded as its delta with respect to the current node, while the next values are encoded as d − 1, where d is the value to add to the previous residual, implicitly skipping any possible edges that have already been added through blocks. The boxes in the final list representation show, in order, the data that gets encoded: the delta of the degree of the current node with respect to the previous node, the delta (in absolute value) of the reference node with respect to the current node, the number of blocks, the block encoding, the residual deltas.
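A minimal sketch of this residual encoding, under one reading of the description: the first residual is stored as its signed difference from the current node u, and each later residual is stored as the number of candidate nodes strictly between it and the previous residual, not counting nodes already obtained through block copies. The function names and the node values in the example are hypothetical.

```python
def encode_residuals(u, copied, residuals):
    """Delta-code sorted residuals, skipping nodes already present via block copies."""
    copied = set(copied)
    out = [residuals[0] - u] if residuals else []
    for prev, cur in zip(residuals, residuals[1:]):
        # Count only the nodes between prev and cur that could still be residuals.
        out.append(sum(1 for v in range(prev + 1, cur) if v not in copied))
    return out


def decode_residuals(u, copied, deltas):
    """Inverse of encode_residuals."""
    copied = set(copied)
    out = []
    for index, d in enumerate(deltas):
        if index == 0:
            cur = u + d
        else:
            cur = out[-1] + 1
            while True:
                if cur not in copied:
                    if d == 0:
                        break                # cur is the next residual
                    d -= 1                   # a free node that was skipped over
                cur += 1
        out.append(cur)
    return out


if __name__ == "__main__":
    # Node 7 with copied edges 4, 8, 9 and residuals 3, 5, 10, 12.
    print(encode_residuals(7, [4, 8, 9], [3, 5, 10, 12]))
```

In the example, plain gap coding would store 1 and 4 for the residuals 5 and 10, while skipping the copied nodes 4, 8 and 9 stores 0 and 2.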

Context management
As mentioned in Section 2, Zuckerli uses Huffman coding and ANS with multiple contexts, i.e. distinct probability distributions. To the best of our knowledge, while this is a well-known encoding technique, its application to graph compression is new. Here we detail how symbols are split among the different contexts. Inside each chunk, the symbol that represents the delta-coded degree with respect to the previous node is used to choose the distribution for the current node. Similarly, inside a chunk, the reference number used for the last list is used to choose a distribution for the current one.
When compressing blocks, a separate distribution is used for the first block, all the even blocks, and all the odd blocks. This is because the first block is the only one for which its length does not get reduced by 1, and we expect the number of edges to be copied (odd blocks) to have a different distribution from the number of edges to be skipped (even blocks), depending on the graph.
For delta-encoding the first residual with respect to the current node, the symbol that would be used to represent the number of residuals defines which distribution to use. This is because a list with a high number of residuals will likely be harder to predict.
Finally, for all other residual deltas, the symbol that was used to encode the previous one is used to choose the corresponding probability distribution for the current delta.
We remark that each probability distribution used by Zuckerli is stored in the compressed file, and is not changed as edges are decoded.

Choice of reference list and chain
We explain how Zuckerli selects reference lists to be used during compression. As previously discussed, we may either represent a node's list explicitly or, if we use a reference, we represent the difference from the list of its reference.
To make an effective choice, we need to estimate the amount of bits that the algorithm will use to compress an adjacency list using a given reference. Since we use an adaptive entropy model, this is not a simple task, as choices for one list might affect probabilities for all other ones.
We choose to use an iterative approach previously used by Zopfli [3], a general compression algorithm. We initialize symbol probabilities with a simple fixed model (all symbols have equal probability), and then choose reference lists assuming these will be the final costs. We then update the symbol probabilities given by the chosen reference lists and repeat the procedure with the new probability distribution. This process is then repeated a constant number of times.
We now consider the two types of compression separately:
Full decompression. In this case, there is no limitation on the length of the reference chain used by a single node, i.e., a reference node may itself have a reference node, and so on; we obtain an optimal solution with the greedy strategy, choosing the reference node that gives the best compression out of all the ones available in the window of the current node, i.e., the W preceding nodes.
List decompression. To decompress a single list, we must also decompress its reference chain: when access to single lists is requested, more care is required to select good references while avoiding reference chains longer than a given threshold R. We may want to represent 2's list using 1's as a reference: this way we do not need to represent 3, 4, and 7, but just the node 9 in the difference; similarly, if we represent 3's list using 2's as reference, we just need to omit node 3. However, in order to decompress 3's list we will need to read (hence decompress) the list of its reference 2, which in turn requires decompressing 1's list. The longer the chain, the longer the decompression time: the parameter R allows us to keep this overhead under control.
We can formally state the problem of choosing the references as follows. We are given a directed acyclic graph D, where the nodes represent the adjacency lists. There is an arc between two nodes if one adjacency list can refer to the other. The weight of the arc corresponds to the number of bits saved by choosing that reference. The larger the weights, the better the compression gain. Thus, we aim at finding a maximum-weight directed forest O for D, where each node has out-degree at most one (its reference), and there are no directed paths longer than R (i.e. a reference chain longer than R). Finding an optimal solution does not seem trivial, and it is unclear whether it can be done in polynomial time. Zuckerli uses an efficient heuristic with approximation guarantees. Given D, it first builds the optimal directed forest F, ignoring the constraint that directed paths cannot be longer than R (this corresponds to the solution of the full decompression case).
Instead of solving our problem on D as we formulated above, Zuckerli computes an optimal sub-forest H on F, as the latter can be found by the following dynamic programming algorithm, answering the question "what is the sub-forest H of maximum weight that is contained in the current sub-forest of F and does not have paths of length R + 1?".
Clearly, H is not necessarily the optimal solution for D, as it is computed for its subgraph F . However, there may still be arcs of D that were not in F , but can now be added to H without creating long chains. Zuckerli tries to extend H with such arcs in a greedy way, obtaining the final heuristic solution.
Approximation guarantee. Interestingly, our heuristic not only works quite well in practice, but it also provides a guaranteed $(1 - \frac{1}{R+1})$-approximation of the optimal solution on D, i.e. of the maximum number of bits to be saved.
To see why, let O be the optimal solution, and let $w_O$, $w_F$ and $w_H$ be the total weights of O, F, and H, respectively.
Next, let H' be a sub-forest of F obtained by splitting the arcs of F in R + 1 groups, depending on their distance from the root of their tree in F modulo R + 1, then removing the group of smallest weight; it is evident that H' has no paths longer than R, and that its weight $w_{H'}$ is at least $(1 - \frac{1}{R+1}) w_F$, as the weight of the smallest of the R + 1 groups could not be more than $\frac{1}{R+1} w_F$. Now observe the following:
• $w_F \ge w_O$, as F is the optimal solution for R = ∞.
• $w_H \ge w_{H'} \ge (1 - \frac{1}{R+1}) w_F$, as H' is a sub-forest of F, and H contains the optimal sub-forest of F (both with path length bounded by R).
Combining the two inequalities gives $w_H \ge (1 - \frac{1}{R+1}) w_O$, which proves the approximation bound.
Details on computing the optimal sub-forest of F. Given a sub-forest F' of F rooted in the node x, let $M_i(x)$ be the maximum-weight sub-forest of F' that has no paths longer than R, and in which the root x is in no path longer than i. If $r_j$ are the roots of F, $\bigcup_j M_R(r_j)$ is the optimal sub-forest of F we are looking for. We implement a dynamic programming procedure based on the following invariant: if, for the sub-forest rooted in a child y of x, we choose the arc (x, y), then y in its sub-forest may only partake in paths of length at most i − 1; on the other hand, if we do not choose (x, y), y may partake in paths of any length up to R. Finally, for the base case, observe that for any leaf l of F, $M_i(l) = \emptyset$. We thus obtain each $M_i(x)$, for 1 ≤ i ≤ R, by the following formula:
$$M_i(x) = \bigcup_{y \in \mathrm{children}(x)} \text{max-w}\bigl(\{(x, y)\} \cup M_{i-1}(y),\; M_R(y)\bigr),$$
where children(x) are the children of x in F, and max-w(A, B) returns the set of arcs having greater weight between A and B (breaking ties arbitrarily). For i = 0 no arc leaving x can be chosen, so $M_0(x) = \bigcup_{y \in \mathrm{children}(x)} M_R(y)$.
Finally, we give a brief remark on the complexity. This is important since a trivial implementation would take quadratic time and space to represent each set $M_i(\cdot)$, making this approach unfeasible on graphs with millions of nodes. However, we can implement it in O(nR) time and space, where n is the number of nodes in F, as follows. We can first run the above dynamic programming algorithm, but associate with each $M_i(y)$ just its weight. Furthermore, we keep track for each $M_i(x)$ of which choice was made on each child y of x (i.e., whether we used (x, y) or not). Computing the weights of $M_i(x)$ this way takes just O(1) time for each child, costing us in total O(nR), as F has O(n) arcs. With this information, we can reconstruct exactly which arcs are used in the optimal solution $M_R(r)$ in a top-down manner by looking at the information about its children we previously computed.
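The weight-only dynamic program and the top-down reconstruction can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the forest is given as a hypothetical dictionary of child lists plus a dictionary of arc weights, path length is measured in arcs, and instead of storing the choice made for each child the sketch recomputes it from the weight table during reconstruction.

```python
def best_subforest(children, weight, roots, R):
    """Return (total weight, chosen arcs) of a max-weight sub-forest with paths of at most R arcs.

    children: dict node -> list of child nodes in F
    weight:   dict (parent, child) -> bits saved by keeping that arc
    """
    # Visit parents before children, then process in reverse to go bottom-up.
    order, stack = [], list(roots)
    while stack:
        x = stack.pop()
        order.append(x)
        stack.extend(children.get(x, []))

    # W[x][i] is the weight of M_i(x); only weights are stored, giving O(nR) space.
    W = {}
    for x in reversed(order):
        kids = children.get(x, [])
        row = [sum(W[y][R] for y in kids)]   # i = 0: no arc below x may be kept
        for i in range(1, R + 1):
            row.append(sum(max(W[y][R], weight[(x, y)] + W[y][i - 1]) for y in kids))
        W[x] = row

    # Top-down reconstruction: each node carries the longest path still allowed below it.
    chosen, stack = [], [(r, R) for r in roots]
    while stack:
        x, i = stack.pop()
        for y in children.get(x, []):
            if i >= 1 and weight[(x, y)] + W[y][i - 1] > W[y][R]:
                chosen.append((x, y))
                stack.append((y, i - 1))
            else:
                stack.append((y, R))
    return sum(W[r][R] for r in roots), chosen


if __name__ == "__main__":
    children = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
    weight = {("a", "b"): 5, ("b", "c"): 1, ("c", "d"): 5, ("d", "e"): 5}
    print(best_subforest(children, weight, ["a"], 2))
```

On the chain in the usage example with R = 2, the cheapest arc (b, c) is dropped, leaving two paths of one and two arcs.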

Experiments
In order to evaluate the efficiency of Zuckerli, we first study the effects of various choices of parameters on compressed size. We also evaluate the effectiveness of the approximation algorithm for reference selection.
We then compare the compression ratio of Zuckerli with respect to existing state-of-the-art compression systems for large graphs, either with novel experiments (WebGraph [10], Graph Compression by BFS [4]) or by referring to the experiments in the relevant papers (Log(Graph) [6], $k^2$-trees [13] and 2D-Block Trees [12]). We remark that the proposed scheme does not change the order of nodes before compression, and as such a comparison with works that propose algorithms to find a better node permutation (such as [15]) is out of scope of this experimental comparison, although it is an interesting direction for future work.
To evaluate the CPU and memory usage of Zuckerli, we compare its decompression time and memory usage with the corresponding metrics for WebGraph. Moreover, we compare the running time of a depth-first search and a breadth-first search on Zuckerli-compressed graphs, on WebGraph-compressed graphs and on uncompressed graphs.
Finally, to evaluate the parallelism of the code, we compute the speedup achieved by Zuckerli on an edge-summing problem when running on 2, 4, 8, 16, 32 and 64 cores.
For all experiments where list decompression is required, R is set to 3 (similarly to the compressed WebGraph files that we used for comparisons), the chunk size C is set to 32, and the minimum run of zeros L required to trigger RLE is set to 3.
The code to run the experiments was written in C++ and compiled with clang++-10; it is available at https://github.com/google/zuckerli. The experiments were run on a 32-core AMD 3970X CPU (with hyperthreading) with 256GB of RAM.

Datasets
To run the comparisons, we use graphs from the WebGraph corpus [10,9,8], which are available at http://law.di.unimi.it/datasets.php. The datasets we use include both social networks and web graphs, with a number of edges varying from a few million to 91 billion and a number of nodes varying from a few hundred thousand to 1 billion. More details about the graphs can be found in Table 1. When reporting results, graphs with a -hc suffix represent the full decompression versions, while other graphs represent the compressed versions also supporting list decompression.

Parameter Choice
We first investigate the effect of the parameters controlling the integer encoding scheme, trying different combinations of the number of bits that are included in the entropy-coded part and the number of integers that are entropy coded as-is. The results are shown in Table 2. They show that using more fine-grained integer representations, i.e. entropy-coding more bits or having more direct-coded integers, does not give significant improvements or even worsens the compression ratio.
We then vary the window size W. The results are shown in Table 3. They show that increasing window size gives significant, although diminishing, savings on compressed size.
Finally, we compare the effect of changing the number of iterations through which reference lists are chosen (see Section 3.4), varying between 1 (corresponding to only using the simple fixed model) to 3. The results are shown in Table 4. They show that using a non-fixed model provides significant savings compared to the fixed one. On the other hand, further refinement of this model does not improve the compressed size, and is thus not worth the extra encoding effort.
As a consequence of these results, we perform further experiments using k = 4, i = 1, j = 0, W = 32, and 2 rounds of reference selection. We remark that W = 64 would have achieved better compression, but the WebGraph dataset was compressed using W = 32. We therefore pick this value for ease of comparison.

Effect of Approximation Algorithm and Context Modeling
We evaluate the gain from using the improved algorithm for reference selection (in Section 3.4), as opposed to the simple greedy algorithm used by WebGraph. The results are shown in Table 5. We remark that, as the reference selection is employed only when list decompression is supported, the table does not report results for the -hc version of the graphs.
We also report the effects of disabling Zuckerli's context model, by using the same probability distribution for all the entropy coded symbols. The results are shown in Table 6.
The results show that the gains from the approximation algorithm are significant, reaching up to 12% for web graphs, and also providing some benefits for social networks like tw-2010. The gains from the context model are similar.
Table 5: Comparison of the compressed size achieved by using the greedy algorithm used by WebGraph for reference selection and the size achieved by our approximation algorithm described in Section 3.4.
We remark that this improvement is significant in a lossless compression context. In comparison, one of the most well-known advances in general purpose compression, the Burrows-Wheeler Transform [14], achieved roughly a 16% size reduction compared to previous approaches.

Compression Results and Resource Usage
For the chosen set of parameters, we report the compression speed and the resulting compression ratio on various graphs. We also compare the resulting compressed size with the ones achieved by WebGraph and by Graph Compression By BFS (GCBFS). To perform this comparison, we use the files available from the WebGraph corpus itself, without any recompression, and the publicly available implementation of GCBFS, with parameters l = 10000 for full decompression and l = 8 for list decompression. The results are shown in Table 7. They show that Zuckerli typically achieves 20% to 30% size savings when compared to WebGraph on web graphs, and 10% to 15% size savings on social networks. In comparison, GCBFS achieves worse compression ratios than WebGraph on the larger datasets (hw-2009, tw-2010, uk-2007).
We also compare Zuckerli's compression ratios to those achieved by k²-trees [13] and 2D-Block Trees [12]. While those data structures allow for single edge queries, Zuckerli only allows, in its least dense configurations, for individual adjacency list queries. Thus, the methods are not directly comparable. However, according to the results reported in [12], both representations are significantly less dense than Zuckerli, with the better of the two producing compressed representations that are bigger by 30% or more. Further, according to the reported speed, the faster of the two methods processes roughly 200 thousand edges per second, which is orders of magnitude slower than Zuckerli. This is due to the intense use of sophisticated succinct data structures, which causes many cache misses.
Finally, although we did not perform a direct comparison with LogGraph [6], we remark that it offers improved list access performance compared to WebGraph but does not achieve better compression ratios, as reported in [6] (see also Appendix A).
We also explore how the bit budget of Zuckerli is spent across the various parts of the graph that get encoded: degrees, references, blocks, and residuals, with the first residual being considered separately. The results are shown in Table 8. They show a remarkable difference between web graphs and social networks. Indeed, in social networks, almost all the bits are spent encoding residuals, while in web graphs the fraction of bits used for residuals is not as significant. This can be explained by the greater effectiveness of the block copying mechanism on web graphs, due to greater similarity in outgoing adjacency lists.
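The split between blocks and residuals can be illustrated with a small sketch of WebGraph-style reference coding. Given a reference adjacency list, the current list is described by alternating run lengths over the reference (copy, skip, copy, and so on) plus the residuals, i.e. the neighbours that do not appear in the reference. The function names and the exact block convention below are assumptions made for illustration. Zuckerli's actual format additionally encodes degrees and references and entropy codes every field.

```python
def encode_with_reference(current, reference):
    """Describe `current` as copy/skip runs over `reference` plus residuals.

    Both lists are sorted and duplicate-free. Runs alternate, starting with
    a (possibly empty) copy run.
    """
    cur = set(current)
    blocks, copying, run = [], True, 0
    for v in reference:
        if (v in cur) == copying:
            run += 1                 # the current run continues
        else:
            blocks.append(run)       # close the run and flip copy/skip
            copying = not copying
            run = 1
    blocks.append(run)
    ref = set(reference)
    residuals = [v for v in current if v not in ref]
    return blocks, residuals


def decode_with_reference(blocks, residuals, reference):
    """Rebuild the adjacency list from the runs and the residuals."""
    copied, pos, copying = [], 0, True
    for run in blocks:
        if copying:
            copied.extend(reference[pos:pos + run])
        pos += run
        copying = not copying
    return sorted(copied + residuals)
```

For example, with reference [1, 2, 3, 5, 8, 13] and current list [2, 3, 5, 7, 13, 21], this sketch yields blocks [0, 1, 3, 1, 1] and residuals [7, 21]. When consecutive lists are very similar, as is typical for web graphs, most neighbours are covered by a few runs. When they are dissimilar, as in social networks, most neighbours end up as residuals.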

Performance Evaluation
We evaluate the performance characteristics of Zuckerli by comparing its running time and memory usage for depth-first and breadth-first traversals with those of WebGraph (only for the variants that allow access to single lists), as well as with those of uncompressed graphs as a baseline. The running time and the memory usage are reported in Table 9. We also compare the time and memory usage for running a full sequential decompression of the graphs, with results reported in Table 10. From these comparisons, it emerges that the memory usage for decompression and random access differs considerably between WebGraph and Zuckerli, with each method using less memory in some situations. This can be explained by the different implementation languages (C++ for Zuckerli and Java for WebGraph), as well as by the fact that WebGraph uses lazy iteration on adjacency lists to avoid decompressing them fully to memory. While this can in principle be supported by Zuckerli, it was not implemented in this version of the code.
Regarding running time, Zuckerli is often faster than WebGraph. This is because Zuckerli requires less memory bandwidth than WebGraph (as it uses fewer bits for compression), and because it is written in highly optimized C++ code.
Finally, to evaluate the scalability of Zuckerli on multiple cores, we wrote a simple program that computes the sum of all endpoints of all edges of a graph, and we ran it on uk-2007-02 using 1, 2, 4, 8, 16, 32 and 64 cores. The results are shown in Figure 2. They show the good scalability of Zuckerli; the speedup is likely limited by memory bandwidth.
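The structure of such a benchmark can be sketched as a partitioned reduction: node ranges are handed to independent workers, each worker reads the adjacency lists in its range, and the partial sums are combined at the end. The sketch below is an assumption-laden illustration, not the program used in the experiment. It takes "the sum of all endpoints" to mean the sum of u + v over every edge (u, v), uses an uncompressed list of lists in place of a compressed graph, and uses Python threads, which show the partitioning but do not give a real multi-core speedup in CPython.

```python
from concurrent.futures import ThreadPoolExecutor


def endpoint_sum_range(adjacency, start, end):
    """Sum u + v over all edges (u, v) with start <= u < end."""
    total = 0
    for u in range(start, end):
        neighbours = adjacency[u]      # stands in for decoding one list
        total += u * len(neighbours) + sum(neighbours)
    return total


def parallel_endpoint_sum(adjacency, workers=4):
    """Split the node range into chunks and reduce the partial sums."""
    n = len(adjacency)
    chunk = max(1, -(-n // workers))   # ceiling division, at least 1
    ranges = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(lambda r: endpoint_sum_range(adjacency, *r), ranges)
        return sum(partial)
```

Because each worker only reads its own range of adjacency lists, the workers need no synchronization until the final reduction. This is consistent with the observation that the speedup is bounded by memory bandwidth rather than by contention.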

Conclusions
In this paper, we described Zuckerli, a novel compression algorithm and compressed data structure designed for very large graphs. By exploiting recent entropy coding techniques, context modeling and improved encoder heuristics based on approximation algorithms, Zuckerli can achieve significant space savings for compressing web graphs and social networks over state-of-the-art systems, such as the WebGraph framework. By conducting experiments on a large corpus of web graphs and social networks, we quantified these savings as roughly 25% on web graphs and roughly 12% on social networks, both for the full and list decompression use cases. In data compression, this is considered a significant improvement. For example, bzip2 is preferred to gzip for file compression when space saving is crucial, because it has 10% to 30% better compression ratios [14]; on the other hand, bzip2 is slower and has a larger memory footprint than gzip. Zuckerli achieves similar improvements, but is also faster than WebGraph, with a smaller memory footprint in many cases. Decompression with Zuckerli is fast, resource-efficient, and scalable.
Table 9: Running time (in milliseconds) and memory usage (in MB) for running breadth-first and depth-first search on both the uncompressed and the compressed representations (both with Zuckerli and WebGraph) of various graphs. We also report the average time (in µs) to access each adjacency list.