Motif and Hypergraph Correlation Clustering

Motivated by applications in social and biological network analysis, we introduce a new form of agnostic clustering termed motif correlation clustering, which aims to minimize the cost of clustering errors associated with both edges and higher-order network structures. The problem may be succinctly described as follows: Given a complete graph $G$, partition the vertices of the graph so that certain predetermined “important” subgraphs mostly lie within the same cluster, while “less relevant” subgraphs are allowed to lie across clusters. Our contributions are as follows: We first introduce several variants of motif correlation clustering and then show that these clustering problems are NP-hard. We then proceed to describe polynomial-time clustering algorithms that provide constant approximation guarantees for the problems at hand. Despite following the frequently used LP relaxation and rounding procedure, the algorithms involve a sophisticated and carefully designed neighborhood growing step that combines information about both edges and motifs. We conclude with several examples illustrating the performance of the developed algorithms on synthetic and real networks.


Introduction
Correlation clustering is a clustering model first introduced by Bansal, Blum, and Chawla in [2], and it may be succinctly described as follows: One is given a collection of objects and, for some pairs of objects, one is also given quantitative assessments of whether the objects are similar or dissimilar. This information is represented using a labeled graph with edges marked by + or − symbols according to whether the endpoints are similar or dissimilar. The goal is to partition the vertices of the graph so that edges labeled by + tend to aggregate within clusters and edges labeled by − tend to go across clusters. Unlike most other known clustering methods, correlation clustering does not require the number of clusters to be specified in advance.
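As a small illustration of the model just described, the following Python sketch counts the number of erroneously placed edges of a clustering on a signed complete graph. The function name `disagreements` and the toy instance `toy_labels` are hypothetical and serve only as an example; they are not taken from [2].

```python
from itertools import combinations

def disagreements(labels, clusters):
    """Count erroneously placed edges of a clustering on a signed complete graph.

    labels: dict mapping frozenset({u, v}) to '+' or '-' (hypothetical encoding).
    clusters: list of disjoint vertex sets that together cover all vertices.
    """
    cluster_of = {v: i for i, c in enumerate(clusters) for v in c}
    errors = 0
    for u, v in combinations(sorted(cluster_of), 2):
        same = cluster_of[u] == cluster_of[v]
        sign = labels[frozenset((u, v))]
        # A '+' edge across clusters or a '-' edge inside a cluster is an error.
        if (sign == '+') != same:
            errors += 1
    return errors

# Hypothetical toy instance: a positive triangle {0, 1, 2} and a vertex 3
# that is similar to vertex 2 only.
toy_labels = {frozenset(p): '-' for p in combinations(range(4), 2)}
for p in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    toy_labels[frozenset(p)] = '+'
```

For this toy instance, the clustering {0, 1, 2}, {3} misplaces only the positive edge between 2 and 3, while placing all four vertices in one cluster misplaces the two negative edges incident to vertex 3.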
There are two formulations of the correlation clustering optimization problem: MinDisagree and MaxAgree. In the MinDisagree version of the problem one aims to minimize the number of erroneously placed edges, while in the MaxAgree version one seeks to maximize the total number of correctly placed edges. Finding an optimal solution to either problem is NP-hard. The MinDisagree problem remains hard even when the input graph is complete [2]. For complete graphs, several constant approximation randomized [1] and deterministic [5] algorithms are known. When the graph is allowed to be arbitrary, the best known approximation guarantee for MinDisagree is $O(\log n)$.
In the second version of the problem, termed mixed motif correlation clustering (MMCC), we are allowed to fix multiple motif graphs of possibly different sizes $2 \le k_1 < k_2 < \ldots < k_p$, and we seek a vertex partition $C = (C_1, \ldots, C_s)$, $s \ge 1$, that minimizes the objective function (MMCC). Here, $\lambda_t \ge 0$ are relevance factors of the motifs of size $k_t$. Note that by choosing $\lambda = 1$ for edges and setting all other relevance factors to zero, we arrive at the classical correlation clustering formulation. Furthermore, in both problems, we impose the probability constraint on the weights, $w^+_K + w^-_K = 1$. Clearly, both the MCC and MMCC problems are NP-hard, as the correlation clustering problem is NP-hard. Furthermore, the following theorem, proved in Appendix A, shows that the problems remain hard even for restricted choices of motifs, such as the case when $k = 3$. Hence, we focus on developing (constant) approximation algorithms for the problems.
Theorem 2.1. For $k = 3$, the MCC problem is NP-hard.
We point out that one may also consider the MaxAgree version of the motif clustering problem, in which the objective functions in (MCC) and (MMCC) that summarize disagreement are replaced by objective functions that summarize agreement, and the min function is correspondingly replaced by the max function. As for the case of correlation clustering, it is straightforward to show that taking the better of two clusterings, the all-singleton clustering and the single-component clustering, provides a 2-approximation for the problem.
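The 2-approximation argument for MaxAgree can be sketched in a few lines of Python. The sketch assumes that a $k$-tuple placed within one cluster contributes $w^+_K$ to the agreement and a split $k$-tuple contributes $w^-_K = 1 - w^+_K$; the function name `trivial_maxagree` and the dictionary encoding of the weights are hypothetical.

```python
def trivial_maxagree(w_plus):
    """Return the better of the two trivial clusterings and its agreement.

    w_plus: dict mapping each k-tuple (as a frozenset, k >= 2) to w^+_K in
    [0, 1]; the probability constraint gives w^-_K = 1 - w^+_K.
    """
    # One cluster with all vertices: every tuple lies inside a cluster.
    agree_single = sum(w_plus.values())
    # All-singleton clustering: every tuple is split.
    agree_singletons = sum(1 - w for w in w_plus.values())
    if agree_single >= agree_singletons:
        return "single cluster", agree_single
    return "all singletons", agree_singletons
```

The two agreement values add up to the number of $k$-tuples, so the larger one is at least half of that number, and no clustering can agree on more than all $k$-tuples.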
The MinDisagree version of correlation clustering is usually approximately solved using two approaches: Pivoting methods [1] and relaxed Integer Programming (IP) methods that reduce to solving a Linear Program (LP) followed by rounding [5]. The pivoting algorithm is a straightforward randomized approach that provides constant approximation guarantees for the expected value of the objective, and has straightforward, yet efficient, parallel implementations [14]. For the unweighted clustering problem, it may be succinctly described as follows: One selects a pivot vertex uniformly at random, incorporates all its "similar" neighbors (i.e., those with edge label '+') into one cluster, removes all vertices in the newly formed cluster from the graph and then proceeds to iteratively repeat the same steps. Unfortunately, using this approach for motif clustering cannot lead to constant approximation results, as illustrated by the example below.
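The pivoting step described above for the unweighted problem may be sketched as follows. This is a minimal illustration rather than the implementation of [1] or [14]; the function name `pivot_clustering` and the representation of positive edges as a set of frozensets are assumptions made for the example.

```python
import random

def pivot_clustering(vertices, positive, rng=None):
    """Randomized pivoting for unweighted correlation clustering.

    vertices: iterable of hashable, sortable vertex labels.
    positive: set of frozenset({u, v}) pairs labeled '+'; all other pairs
              are treated as '-'.
    rng: optional random.Random instance, used to make runs reproducible.
    """
    rng = rng or random.Random()
    remaining = set(vertices)
    clusters = []
    while remaining:
        # Select a pivot uniformly at random among the remaining vertices.
        pivot = rng.choice(sorted(remaining))
        # The cluster consists of the pivot and its remaining '+' neighbors.
        cluster = {u for u in remaining
                   if u == pivot or frozenset((pivot, u)) in positive}
        clusters.append(cluster)
        remaining -= cluster
    return clusters
```

When the positive edges form disjoint cliques, every choice of pivots recovers exactly those cliques; the randomness only matters for inconsistent labelings.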
Consider the MCC problem for complete graphs and triple-motifs, i.e., for $k = 3$. Suppose that each edge is labeled, with labels in the set $\{+, -\}$, and that each triple $K$ is associated with a pair of weights $(w^+_K, w^-_K) \in \{(1, 0), (0, 1)\}$. Triples that correspond to triangles with positively labeled edges only have weights $(w^+_K, w^-_K) = (1, 0)$, and are termed "positive" triples. All other triples have weights $(w^+_K, w^-_K) = (0, 1)$, and are termed "negative" triples. For this setting, neither pivoting on a pair of vertices (e.g., an edge) nor pivoting on a single vertex may provide constant approximation guarantees, as demonstrated by the examples in Figure 1. Both graphs are complete graphs but, for ease of interpretation, only positively labeled edges are depicted. In the first case, one chooses a (positive) edge uniformly at random and includes in the cluster all positive edges connected to the pivoting edge. For Figure 1 a), the optimal clustering comprises two clusters and has an MCC objective function value equal to zero. If one pivots on the edge $(v_1, v_4)$, the resulting clustering contains one cluster only, $C_1 = \{v_1, v_2, \ldots, v_6\}$, and leads to a positive value of the objective function, and hence to an unbounded ratio between the approximate and the optimal objective values. Pivoting on vertices may fail as well, which may be seen from example b): The graph in b) has a unique optimal clustering with two clusters, $C_1 = \{v_1, v_2, v_3\}$ and $C_2 = \{v_4, v_5, \ldots, v_n\}$. Choosing the vertex $v_3$ as pivot and including all vertices connected to $v_3$ through positive edges leads to $v_1, v_2, v_4$ being clustered together with $v_3$, thereby resulting in $O(n^2)$ more errors than those incurred by the optimal clustering. As there are $n$ vertices in the graph, $v_3$ is chosen as the pivot with probability $1/n$, and the expected value of the objective may hence have an error term of order $O(n)$.
Figure 1: Pivoting on edges and vertices of graphs. Note that $K_{n-3}$ stands for a complete graph on $n - 3$ vertices.
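To make the triple-weight convention concrete, the following Python sketch builds the 0/1 triple weights from a set of positive edges and evaluates the resulting disagreement cost of a clustering. The graph used in the usage note is a hypothetical six-vertex example (two positive triangles joined by one positive edge) and is not claimed to coincide with the graphs of Figure 1; the function names are likewise illustrative.

```python
from itertools import combinations

def triple_weights(vertices, positive_edges):
    """w^+_K = 1 if all three edges of the triple K are positive, else 0."""
    w_plus = {}
    for t in combinations(sorted(vertices), 3):
        all_positive = all(frozenset(p) in positive_edges
                           for p in combinations(t, 2))
        w_plus[frozenset(t)] = 1 if all_positive else 0
    return w_plus

def mcc_cost(w_plus, clusters):
    """Disagreement cost: w^+_K for split triples, 1 - w^+_K for triples
    that lie within a single cluster."""
    cluster_of = {v: i for i, c in enumerate(clusters) for v in c}
    cost = 0
    for K, w in w_plus.items():
        split = len({cluster_of[v] for v in K}) > 1
        cost += w if split else 1 - w
    return cost

# Hypothetical example: positive triangles {0, 1, 2} and {3, 4, 5},
# joined by the single positive edge (2, 3).
toy_positive = {frozenset(p) for p in
                [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]}
toy_w = triple_weights(range(6), toy_positive)
```

In this example, clustering the two triangles separately has cost zero, whereas merging all six vertices into one cluster places all 18 negative triples inside a cluster.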

Main results
We describe next polynomial-time, constant approximation algorithms for the MCC and MMCC problems. For the former case, we propose two methods that offer different trade-offs between optimization performance and complexity, as measured in terms of the number of constraints used in the underlying LP program. The approach followed is to relax the IPs of (1) and (2) to LPs and then perform rounding of the fractional solutions. The main analytical difficulties encountered in this approach are that the LPs involve both edge and higher order motif variables, and that trying to round all these variables simultaneously may cause inconsistencies and large rounding errors. More precisely, in the LP formulation one has to incorporate variables associated with k-tuples, while rounding only works with variables associated with pairs of vertices. To overcome this issue for the MCC problem, our first solution introduces motif variables in the LP and then performs rounding on edges by assigning to them a cost that reflects the value of the best-scoring motif that the edge is part of. The second solution is based on an LP which involves both motif and edge variables and allows downstream rounding to be directly performed on the edge variables.
The second method has fewer constraints in the underlying LP than the first method, and is hence more computationally efficient. The drawback is that it provides worse approximation guarantees than the first method. For the MMCC problem, one may use the second method developed for the MCC problem with the inclusion of additional constraints for $k$-tuple and edge variables. The approximation factor is determined by the size of the largest motif.
As in the formulation of the MCC problem, let $K$ correspond to a $k$-tuple and let $x_K$ denote the indicator variable for the event that the vertices in $K$ are split among clusters (i.e., $x_K = 0$ if the vertices of $K$ lie in the same cluster, and $x_K = 1$ otherwise). Relaxing the above integral constraint to $x_K \in [0, 1]$ and rewriting the probability weight constraints leads to the relaxed MCC optimization problem LP1. Note that the constraints imposed on triples in $\Upsilon$ ensure that if two motifs share vertices and belong to the same cluster, the additional motifs formed by the vertices also belong to the same cluster.
The LP solutions are rounded according to Algorithm 1, described below. The intuition behind the rounding algorithm is to use the fractional solutions of the LP k-tuple variables to perform rounding on pairs of variables. The reason for using different variables in the LP and in the rounding procedure is that the LP constraints are harder to state and analyze via pairwise variables, while rounding is harder to perform via k-tuple variables as they incur complex codependencies. The key is to transition from k-tuples to pairs of variables by recording the "best motif" to which an edge belongs, and then using the corresponding fractional value of the motif variable to perform neighborhood growing via edge incorporation.
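The transition from $k$-tuple variables to pairs may be sketched as follows. The sketch assumes that the "best motif" value of a pair $(v, u)$ is the smallest fractional value $x_K$ over all $k$-tuples $K \subseteq S$ containing both $v$ and $u$, and that the neighborhood of the pivot collects the vertices whose value does not exceed $\alpha$. Both function names and the dictionary encoding of the LP solution are hypothetical, and the sketch does not reproduce the remaining steps of Algorithm 1.

```python
def best_motif_values(S, v, x):
    """For a pivot v, return y[u] = min x[K] over tuples K within S that
    contain both v and u.

    S: set of currently unclustered vertices.
    x: dict mapping frozenset k-tuples to fractional LP values in [0, 1]
       (assumed encoding of the LP solution).
    """
    y = {}
    for K, value in x.items():
        if v in K and K <= S:
            for u in K - {v}:
                y[u] = min(y.get(u, 1.0), value)
    return y

def neighborhood(y, alpha):
    """Vertices whose best-motif value is at most alpha (assumed rule)."""
    return {u for u, value in y.items() if value <= alpha}
```

For instance, with $k = 3$, $S = \{0, 1, 2, 3\}$ and fractional values 0.25, 0.5, 0.75 and 1.0 for the triples {0,1,2}, {0,1,3}, {0,2,3} and {1,2,3}, the pivot 0 obtains the values 0.25, 0.25 and 0.5 for the vertices 1, 2 and 3, so that a threshold of 0.3 yields the neighborhood {1, 2}.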
Theorem 3.1. Let $k$ be the (constant) size of a motif. For any $\alpha \le 1/k$, and provided that the probability constraint $w^+_K + w^-_K = 1$ is satisfied by every motif $K$ of size $k$, the LP coupled with the rounding procedure of Algorithm 1 provides a $2/\alpha$-approximate solution to the MCC problem.
Proof. The proof is given in Appendix B.
Algorithm 1: Choose an arbitrary pivot vertex $v$ in $S$. For all $u \in S/\{v\}$, compute $y_{vu} = \min_{K \subseteq S:\, v,u \in K} x_K$.
For constants $k$ such that $k \ll n$, $|\Upsilon| = \Theta(n^{2k-1})$. This indicates that the number of constraints in the LP grows exponentially with the size of the motif, which may lead to computational issues when the motifs are large. The next LP has a significantly smaller number of triangle constraints, reduced from $\Omega(n^{2k-1})$ to $\Omega(n^3)$. In particular, this LP excludes a number of triangle inequalities as constraints. One cannot reduce the number of constraints below $\Theta(n^k)$, as $\Theta(n^k)$ variables are needed to represent all possible $k$-tuples.
To describe the LP, we introduce some auxiliary variables. Let $z_{vu}$, $v, u \in V$, denote the indicator of the event that a pair of vertices $v, u$ belong to different clusters (i.e., $z_{vu} = 0$ if $v$ and $u$ belong to the same cluster, and $z_{vu} = 1$ otherwise). By relaxing the indicator variables to $z_{vu} \in [0, 1]$ and letting $x_K \in [0, 1]$ as before, we arrive at the following LP problem formulation.
$x_K \ge z_{vu}$ (for all $K \in \mathcal{K}(V)$ and $v, u \in K$), (4)
A simple counting argument reveals that the number of constraints in the LP equals $\Theta\left(\binom{n}{k}\binom{k}{2} + n^3\right)$.
Algorithm 2: Rounding procedure with parameters $\alpha, \beta \le 1$
Theorem 3.2. Let $k$ be the (constant) size of a motif. For any $\alpha, \beta \le 1/k$, and provided that the probability constraint $w^+_K + w^-_K = 1$ is satisfied by every motif $K$ of size $k$, the LP coupled with the rounding procedure of Algorithm 2 provides a $\frac{1}{\alpha\beta}$-approximate solution to the MCC problem.
Proof. The proof of the theorem is presented in Appendix C.
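The role of constraint (4) and the associated counting argument can be checked on small instances. The Python sketch below derives the integral indicator variables $z_{vu}$ and $x_K$ from a given clustering, verifies that they satisfy $x_K \ge z_{vu}$ for every pair inside every $k$-tuple, and counts the inequalities of this type. The function names are hypothetical, and the sketch covers constraint (4) only, not the remaining constraints of the LP.

```python
from itertools import combinations
from math import comb

def indicators(clusters, k):
    """Integral z (pairs) and x (k-tuples) induced by a clustering:
    value 1 if the vertices are split among clusters, 0 otherwise."""
    cluster_of = {v: i for i, c in enumerate(clusters) for v in c}
    V = sorted(cluster_of)
    z = {frozenset(p): int(cluster_of[p[0]] != cluster_of[p[1]])
         for p in combinations(V, 2)}
    x = {frozenset(K): int(len({cluster_of[u] for u in K}) > 1)
         for K in combinations(V, k)}
    return z, x

def check_constraint_4(z, x):
    """Return (all inequalities x_K >= z_vu hold, number of inequalities)."""
    holds, count = True, 0
    for K, x_K in x.items():
        for pair in combinations(sorted(K), 2):
            count += 1
            holds = holds and x_K >= z[frozenset(pair)]
    return holds, count
```

For $n = 5$ vertices and $k = 3$, the check involves $\binom{5}{3}\binom{3}{2} = 30$ inequalities, in agreement with the first term of the count.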
Observe that the approximation guarantees of Theorem 3.2 are worse than those of Theorem 3.1, which is the price paid for reducing the number of constraints. Furthermore, since the rounding procedure operates on pairs of vertices only and does not involve variables for $k$-tuples, it may be used for solving the MMCC problem as well. We outline the corresponding result in what follows.
Let $S = \{k_1, k_2, \ldots, k_p\}$ be the set of motif sizes of interest, and let $\mathcal{K}_t(V)$ be the set of all $k_t$-tuples of $V$. Using the same notation as in the MCC version of the problem, we may state an LP relaxation for the MMCC problem, referred to as LP3. The rounding method accompanying this LP is also described in Algorithm 2, with the parameters $\alpha, \beta$ bounded from above by $1/k^*$, where $k^* = \max S = \max\{k_1, k_2, \ldots, k_p\}$.
Corollary 3.3. For $\alpha, \beta \le 1/k^*$, and all motif weights satisfying the probability constraint $w^+_K + w^-_K = 1$, the rounded LP algorithm provides a $\frac{1}{\alpha\beta}$-approximate solution to the MMCC problem.
Proof. Note that the simplest way to prove this result is to focus on the largest motif only, and use the previously described MCC result. In particular, the stated result does not depend on the particular choices of the parameters λ used.
Still, one can derive more precise and stronger approximation guarantees by focusing on all motifs simultaneously, in which case the analysis becomes rather tedious and involved. For the special case of two motifs (p = 2) with sizes k 1 = 2 and k 2 = k respectively, we provide tighter approximation results in Theorem 3.4. Here, both the parameters α, β depend on λ. The underlying derivations are relegated to Appendix D.
Theorem 3.4. Consider the MMCC problem with two types of motifs of sizes $k_1 = 2$ and $k_2 = k$. The objective function of LP3 may be rewritten as [...], where $z_{uv}$ and $x_K$ are variables associated with pairs of vertices and $k$-tuples of vertices, and $\lambda$ is a parameter that can be tuned to balance the penalties induced by edges and motifs of size $k$. Let $r_0$ be a constant equal to [...]. Then, for any $\alpha \le 1/k$, $\beta \le 1/(k - r_0)$, and provided that the weights satisfy the probability constraint $w^+_K + w^-_K = 1$ for both $k$-tuples and edges (i.e., $w^+_{uv} + w^-_{uv} = 1$), the LP and rounding procedure of Algorithm 2 produce a $\frac{1}{\alpha\beta}$-approximate solution to the edge-motif MMCC problem.

Numerical results for small social networks
We evaluated our (M)MCC methods on two benchmark networks from [3], which were originally tested using the method described in [3] (henceforth termed TSC), and on the well-known Zachary karate club network [17]. In all the experiments, we considered motifs of size $k = 2$ and $k = 3$ only. Hence, the motifs of size two are edges, while for $k = 3$ the motif may be selected based on the particular application, as subsequently described. When solving MCC, we use the LP2 formulation as it contains fewer constraints than LP1 and thus can be solved more efficiently. We then leverage Algorithm 2 for downstream rounding. When solving MMCC, we use a combination of LP3 and Algorithm 2.

Partitioning layered flow networks.
The first example is what we refer to as a layered flow network (see Figure 2). The information flow between two layers typically follows the same direction, while feedback loops are primarily contained within a layer. The task is to detect the layers in the network. To perform the layer clustering, we assign the value 1 to each weight $w_K$ corresponding to a directed 3-cycle (i.e., a triple $\{j_1, j_2, j_3\}$ with edges directed according to $j_1 \to j_2$, $j_2 \to j_3$, $j_3 \to j_1$, or the reverse order), encouraging the corresponding triples to lie within a layer, while we assign an arbitrary weight in $[0.41, 0.48]$ to all other types of triples. The clustering results are shown in Figure 2. Both MCC and the method of [3] produce similar clustering results, which identify the layers of the network. The only difference is observed for the node with label 3. The MCC method emphasizes the feedback loops inside a layer, and hence node 3 is placed in the same cluster as nodes 4, 5, 6, 7. The other method emphasizes the importance of the direction of information flow, and thus the flow from node 3 to node 1 does not permit clustering nodes 3, 4, 5, 6, 7 together.
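The weight assignment for the layered flow network may be sketched as follows. The function names are hypothetical, the directed graph is assumed to be given as a set of ordered pairs, and the default value 0.45 is merely one admissible choice from the interval $[0.41, 0.48]$.

```python
from itertools import combinations

def is_directed_3_cycle(triple, arcs):
    """True if the three vertices form a directed 3-cycle in either order."""
    a, b, c = triple
    return ({(a, b), (b, c), (c, a)} <= arcs
            or {(a, c), (c, b), (b, a)} <= arcs)

def layer_weights(vertices, arcs, other_weight=0.45):
    """Assign w_K = 1 to directed 3-cycles and other_weight to all other
    triples; other_weight is an assumed value from [0.41, 0.48]."""
    return {frozenset(t): (1.0 if is_directed_3_cycle(t, arcs) else other_weight)
            for t in combinations(sorted(vertices), 3)}
```

For a graph with arcs 0 → 1, 1 → 2, 2 → 0 and 2 → 3, only the triple {0, 1, 2} receives the weight 1.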

Anomaly detection.
Practical networks usually contain bidirectional edges, i.e., edges that allow both directions of traversal. A large number of these edges lie within directed 3-cycles [3]. Hence, if a part of a network contains many directed 3-cycles but very few bidirectional edges, it may be viewed as an anomaly. An illustrative example is shown in Figure 3, in which the nodes labeled 0-5 form an anomalous component which we wish to detect, as it contains 8 directed 3-cycles without any bidirectional edges. The edges between nodes 6-21 are generated according to a standard Erdős-Rényi model with probability 0.25 and, to keep the figure simple, those edges were not plotted. Note that each of the nodes labeled 0-5 has 4 outgoing and 2 incoming edges within the group of vertices containing 6-21. There are 20 directed triangles without bidirectional edges.
To use our MCC method, we set the weights for the triangles without bidirectional edges to 1, and those for other types of triangles to a value smaller than 0.42. As the results shown in Figure 3 demonstrate, our method outperforms the TSC method in terms of detecting the anomaly.
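The triangle test underlying this weight assignment may be sketched as follows. A triple is treated as a triangle without bidirectional edges when each of its three pairs is connected by an arc in exactly one direction. The function names are hypothetical, and the default value 0.4 is only one possible choice of a weight smaller than 0.42.

```python
from itertools import combinations

def is_triangle_without_bidirectional_edges(triple, arcs):
    """True if every pair of the triple is joined by exactly one arc."""
    for u, v in combinations(triple, 2):
        # Equal truth values mean: no arc at all, or arcs in both directions.
        if ((u, v) in arcs) == ((v, u) in arcs):
            return False
    return True

def anomaly_weights(vertices, arcs, other_weight=0.4):
    """Weight 1 for triangles without bidirectional edges; other_weight
    (an assumed value below 0.42) for all other triples."""
    return {frozenset(t): (1.0 if is_triangle_without_bidirectional_edges(t, arcs)
                           else other_weight)
            for t in combinations(sorted(vertices), 3)}
```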

A benchmark social network: Zachary's karate club with two communities [17].
We also tested the performance of the CC, MCC and MMCC methods on Zachary's karate club network. In the CC model, we assign weights to each pair of vertices depending on whether they are connected by an edge or not. For the MCC method, we focus on 3-tuples and assign weights to them according to whether their corresponding vertices form a triangle or a path. We use both triangles ($K_3$) and 3-paths ($P_3$) as motifs to ensure that nodes with very small degree can be clustered more accurately by examining their inclusion into important motifs involving vertices of large degree. The MMCC method uses both 2-tuples and 3-tuples. The weight assignments used in all these methods are listed in Table 1. The result is shown in Figure 4. Although we tested CC for a number of choices for the weights, we inevitably ended up with one clustering error, vertex 10. This vertex is connected to vertex 34 in Cluster 1 and vertex 3 in Cluster 2. On the other hand, the MCC and MMCC methods recovered the ground truth clustering by taking into account the $K_3$ and $P_3$ motifs. The reason for this finding is that in social networks, vertices within a cluster typically connect to some central vertices in the same cluster (like vertex 34 and vertex 1). Hence, they form many triangles and 3-paths containing the central vertices.
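The distinction between the $K_3$ and $P_3$ motifs can be sketched in Python as follows. The sketch assumes that a 3-tuple counts as $K_3$ when all three pairs are adjacent and as $P_3$ when exactly two pairs are adjacent; the function names are hypothetical, and the weights passed in `table` are placeholders chosen by the caller rather than the values of Table 1.

```python
from itertools import combinations

def classify_triple(triple, edges):
    """Return 'K3', 'P3' or 'other' for a 3-tuple of an undirected graph.

    edges: set of frozenset({u, v}) pairs.
    """
    present = sum(1 for p in combinations(triple, 2) if frozenset(p) in edges)
    return {3: "K3", 2: "P3"}.get(present, "other")

def motif_weights(vertices, edges, table):
    """Map every 3-tuple to the weight that table assigns to its type.

    table: dict with keys 'K3', 'P3' and 'other' (placeholder weights).
    """
    return {frozenset(t): table[classify_triple(t, edges)]
            for t in combinations(sorted(vertices), 3)}
```

For a graph with edges {0,1}, {0,2}, {1,2} and {2,3}, the triple {0,1,2} is a triangle, the triple {1,2,3} is a 3-path, and the triple {0,1,3} is neither.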

A Proof of Theorem 2.1
To prove that the problem is NP-hard, we focus our attention on the case $(w^+_K, w^-_K) \in \{(1, 0), (0, 1)\}$. Since $w^+_K \in \{0, 1\}$, as before, we refer to a triple $K$ with $w^+_K = 1$ (respectively, $w^+_K = 0$) as "positive" (respectively, "negative"). We also use the term "positive error" to indicate that a positive triple is placed across clusters and "negative error" to indicate that a negative triple is placed within one cluster.
Following the approach used to prove NP-hardness of CC [2], we use a reduction from the NP-complete Partition into Triangles [7] problem. Given a (not necessarily complete) graph G = (V, E), containing n vertices where n is a multiple of 3, the goal is to decide whether it can be partitioned into triangles.
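For very small graphs, the Partition into Triangles decision problem can be solved by exhaustive search, which may help in checking the reduction on toy instances. The following backtracking sketch is exponential in the worst case, as expected for an NP-complete problem; the function name is hypothetical.

```python
from itertools import combinations

def partition_into_triangles(vertices, edges):
    """Return a list of vertex-disjoint triangles covering all vertices,
    or None if no such partition exists.

    edges: set of frozenset({u, v}) pairs of an undirected graph.
    """
    vertices = sorted(vertices)
    if not vertices:
        return []
    if len(vertices) % 3 != 0:
        return None
    v, rest = vertices[0], vertices[1:]
    # The smallest remaining vertex must be covered by some triangle.
    for a, b in combinations(rest, 2):
        if all(frozenset(p) in edges for p in ((v, a), (v, b), (a, b))):
            remaining = [u for u in rest if u not in (a, b)]
            sub = partition_into_triangles(remaining, edges)
            if sub is not None:
                return [(v, a, b)] + sub
    return None
```

Two disjoint triangles admit the obvious partition, while a cycle on six vertices contains no triangle and hence admits none.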
As the first step in our proof, we construct a graph $G_w$ that has the same vertex set as $G$ and view triangles of $G_w$ as motifs. We set the weights of triples in $G_w$ that correspond to triangles in $G$ to $(1, 0)$, and the weights of all other triples in $G_w$ to $(0, 1)$. We solve the MCC problem over $G_w$ under the additional constraint that the size of each cluster is at most 3. The existence of an efficient algorithm for solving this MCC problem would imply the existence of an efficient algorithm for partitioning $G$ into triangles, which is not possible unless P = NP. As the original MCC algorithm does not necessarily generate clusters with size bounded by 3, in what follows we describe how to construct another graph, $H_w$, such that the triples-MCC algorithm applied on $H_w$ results in a bounded cluster-size run of MCC on $G_w$.
The basic idea behind our approach is to impose the constraint on the size of clusters in $G_w$ by adding vertices in $H_w$ for each triple in $G_w$, and then making the triples formed by the newly added vertices positive and other triples negative. In this way, a cluster in the new graph $H_w$ with more than 3 vertices in $G_w$ causes a large number of negative errors and hence cannot be part of an optimal clustering.
We now describe how to construct a graph $H_w$ based on $G_w$. In addition to the vertices of $G_w$, for every triple $\{u_1, u_2, u_3\}$ in $G_w$, $H_w$ contains $n^5$ additional vertices, denoted by $C_{u_1u_2u_3}$. For simplicity of notation, write $C_{u_1u_2u_3} \cup \{u_1, u_2, u_3\} = C_{u_1,u_2,u_3}$. Clearly, $H_w$ contains $n + n^5\binom{n}{3}$ vertices. We classify the triples in $H_w$ into three types:
1. T-I triples: $\{u_1, u_2, u_3\}$, for all $u_1, u_2, u_3 \in V(G_w)$.
2. T-II triples: triples in $C_{u_1,u_2,u_3}$ that are not T-I triples.
3. T-III triples: triples that are neither T-I triples nor T-II triples.
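The number of T-I triples is $\binom{n}{3}$. As they are inherited from $G_w$, we keep their weights equal to those in $G_w$. The number of T-II triples equals $\binom{n}{3}\left[\binom{n^5+3}{3} - 1\right]$, and we assign the weights $(1, 0)$ to them. All T-III triples are negative, i.e., we assign the weights $(0, 1)$ to them.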
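The sizes involved in the construction of $H_w$ can be tabulated with a few lines of Python. The sketch relies on one reading of the construction, namely that each of the $\binom{n}{3}$ triples of $G_w$ receives its own set of $n^5$ new vertices and that the T-II triples of a given triple are all 3-subsets of the resulting $(n^5 + 3)$-element set except the T-I triple itself. The function name `hw_counts` is hypothetical.

```python
from math import comb

def hw_counts(n):
    """Vertex and triple counts of H_w under the reading stated above."""
    triples = comb(n, 3)       # number of triples of G_w
    extra = n ** 5             # new vertices added per triple
    vertices = n + extra * triples
    t1 = triples
    # Every T-II triple contains at least one new vertex, so the T-II
    # triples of different triples of G_w are disjoint families.
    t2 = triples * (comb(extra + 3, 3) - 1)
    t3 = comb(vertices, 3) - t1 - t2
    return {"vertices": vertices, "T-I": t1, "T-II": t2, "T-III": t3}
```

For $n = 3$ there is a single triple, $H_w$ has 246 vertices, and no T-III triples exist; already for $n = 6$ the graph $H_w$ has 155526 vertices, which illustrates that the construction is polynomial but far from small.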
Consider now a clustering $C^*$ of $H_w$ of the following form: 1. There are $\binom{n}{3}$ nonoverlapping clusters; 2. Each cluster contains one of the sets $C_{u_1u_2u_3}$ or one of the sets $C_{u_1,u_2,u_3}$; 3. Each vertex $u$ inherited from $V(G_w)$ lies in exactly one cluster.
In the above clustering, there are no errors arising due to T-III triples, because all T-III triples are negative and $C^*$ has property 2). The only errors arise from T-I triples and T-II triples. The number of errors induced by T-I triples is at most $\binom{n}{3}$, while T-II triple errors in $C^*$ may be grouped into two categories. First, a triple may have two vertices in $C_{u_1u_2u_3}$ and one vertex in $\{u_1, u_2, u_3\}$ that lies in another cluster. The number of clustering errors of this type is bounded from above by $n\left(\binom{n-1}{2} - 1\right)\binom{n^5}{2}$. Second, a triple may have one vertex in $C_{u_1u_2u_3}$ and two vertices in $\{u_1, u_2, u_3\}$ that lie in another cluster. The number of errors of this type is upper bounded by $\binom{n}{2}(n - 3)\binom{n^5}{1}$. Therefore, the total number of errors in $C^*$ is bounded from above by the sum of these three quantities, $\binom{n}{3} + n\left(\binom{n-1}{2} - 1\right)\binom{n^5}{2} + \binom{n}{2}(n - 3)\,n^5$. We may convert the clustering $C^*$ into a partition of $G_w$ based on the clustering of T-I triples. The clustering $C^*$ essentially partitions the vertices of $G_w$ into clusters containing exactly three vertices. Our subsequent arguments aim to establish that the number of errors in a clustering that contains at least one cluster with at least four vertices from $V(G)$ must be larger than the number of errors induced by $C^*$.
For that purpose, consider another clustering of $H_w$, denoted by $C'$. First, we show that in order for $C'$ to have fewer errors than $C^*$, the size of any cluster in $C'$ must lie in the interval $[n^5 - n^4, n^5 + n^4]$. Suppose that, on the contrary, there exists a cluster containing more than $n^5 + n^4$ vertices. Then, there are at least $\binom{n^5}{2} n^4 = \Omega(n^{14})$ negative errors caused by placing T-III triples into this cluster. Furthermore, each cluster must contain at least $n^5 - n^4$ vertices of a clique, since otherwise there are at least $\binom{n^5}{2} n^4 = \Omega(n^{14})$ positive errors generated by splitting the T-II triples. Second, note that each vertex in $V(G_w)$ belongs to $\binom{n-1}{2}$ different triples of $G_w$. Since the size of each cluster of $C'$ is smaller than $n^5 + n^4$, for each vertex in $V(G)$, the number of positive errors caused by splitting the T-II triples that contain this vertex and two vertices from some $C_{u_1u_2u_3}$ is lower bounded by $\binom{n^5}{2}\binom{n-1}{2} - \binom{n^5}{2} - \binom{n^4}{2}$. Assume now that there exists a cluster of $C'$ that contains four vertices inherited from $V(G_w)$, say $\{u_1, u_2, u_3, u_4\}$. Then, as the size of the cluster is lower bounded by $n^5 - n^4$, from the pigeonhole principle it follows that there exists at least one vertex in $\{u_1, u_2, u_3, u_4\}$, say $u_1$, and at least $\frac{1}{4}(n^5 - n^4)$ other vertices that do not lie in one of the sets $C_{u_1u'u''}$ for some $u', u'' \in V(G_w)$. Hence, the number of negative errors caused by T-III triples within this cluster is at least [...]. The total number of errors induced by such a clustering is therefore at least [...], which is larger than the number of errors in the clustering $C^*$, for $n$ sufficiently large. Therefore, the optimal triangle-clustering has to be of the form of $C^*$, imposing a constraint on the size of clusters in $G_w$.

B Proof of Theorem 3.1
Since we assume that the weights satisfy the probability constraint $w^+_K + w^-_K = 1$, we will use $w_K$ to refer to $w^+_K$ and $1 - w_K$ to refer to $w^-_K$. Let $N_\alpha(v)$ be the set defined in the rounding procedure. If $N_\alpha(v) \neq \emptyset$, then $N_\alpha(v)$ contains at least $k - 1$ elements, because if $x_K \le \alpha$ for some $k$-tuple $K$, then all its elements (except possibly $v$) belong to $N_\alpha(v)$. For convenience, we also define, given a pivot vertex $v$ and a $k$-tuple $K$ that contains $v$, $y_K = \sum_{u \in K/\{v\}} y_{vu}$. Furthermore, we let $\bar{N}_\alpha(v) = N_\alpha(v) \cup \{v\}$. Thus, by using the LP constraint and the definition of $y_{uv}$, we have [...]. Let $\mathcal{K}_v$ be the set of all the $k$-tuples $K$ such that $K \subseteq \bar{N}_\alpha(v)$, $K \ni v$. When $v$ is a pivot vertex and $K \in \mathcal{K}_v$, we know that [...]. The following proof often uses another form of the constraint in the underlying LP, i.e., [...]. Next, we compare the rounding cost and the LP cost for different types of outputs of the algorithm. All possible cases and their corresponding approximation constants are listed in Table 2.
Case 1: The output is the singleton cluster $\{v\}$. The clustering cost when outputting a singleton $\{v\}$ is $\sum_{K \in \mathcal{K}(S):\, v \in K} w_K$, while the LP cost is $\sum_{K \in \mathcal{K}(S):\, v \in K} \left[(1 - w_K)(1 - x_K) + w_K x_K\right]$. If $K \cap [S/N_\alpha(v)] \neq \emptyset$, we have $x_K > \alpha$, so charging each such $k$-tuple $\frac{1}{\alpha}$ times its LP-cost compensates for the cluster-cost. Therefore, it suffices to consider the $k$-tuples $K \in \mathcal{K}_v$. For $K \in \mathcal{K}_v$, the LP cost is bounded by [...], where the first inequality is due to (7), the second inequality is due to (8) and $w_K \le 1$, while the third inequality is due to the condition that the algorithm outputs a singleton cluster $\{v\}$. Therefore, charging $\frac{2}{\alpha}$ for the $k$-tuple is enough to compensate for the clustering cost.
Case 2: The output is the cluster $\bar{N}_\alpha(v)$.
Case 2.1: First, consider the cost of the $k$-tuples inside the cluster. If $v \in K$, then we have $K \in \mathcal{K}_v$ and thus $x_K \le y_K \le (k - 1)\alpha$. So, charging $\frac{1}{1 - (k-1)\alpha}$ for this tuple suffices to compensate the cluster-cost.
If $v \notin K$, order the vertices in $N_\alpha(v)$ in such a way that for any $u_1, u_2 \in N_\alpha(v)$, $u_1 \prec u_2$ iff $y_{vu_1} < y_{vu_2}$, and assign an arbitrary order ($u_1 \prec u_2$) when the equality ($y_{vu_1} = y_{vu_2}$) holds. For each $u \in N_\alpha(v)$, let $R_u$ denote the set of vertices of $N_\alpha(v)$ that do not exceed $u$ in this order, and let $\mathcal{K}^{(u)}_v$ be the set of $k$-tuples $K \subseteq N_\alpha(v)$ such that $u$ is the largest vertex of $K$ according to $\prec$. Thus, if $K \in \mathcal{K}^{(u)}_v$, then $u \in K$ and $K \subseteq R_u$. Note that because of the order, we have $\sum_{u' \in R_u} y_{vu'} \le \frac{\alpha}{2}|R_u|$. Now for all $u \in N_\alpha(v)$, let us consider the total cost of the $k$-tuples in $\mathcal{K}^{(u)}_v$. The corresponding cluster-cost is [...]. So, charging [...] is enough to compensate the cluster-cost.
Let $K \in \mathcal{K}^{(u)}_v$, and let $\{u_1, \ldots, u_{k-1}\}$ be the vertices in $K/\{u\}$. Let $y_K = \sum_{u_j \in K/\{u\}} y_{vu_j}$. For each $j \in \{1, \ldots, k - 1\}$, let $K_j = (K/\{u_j\}) \cup \{v\}$. As each $K_j \in \mathcal{K}_v$, the LP constraints imply: [...]. The LP constraints yield $x_{K_j} \le (k - 1)\alpha$ for each $j \in \{1, \ldots, k - 1\}$, since $v \in K_j$ for each $j$, by the same argument used to establish inequality (7). Since each $K_j \in \mathcal{K}_v$, we have [...]. The inequality (9) is linear in $\sigma$, so we study its behavior when $\sigma$ is an endpoint of this interval. When $\sigma = (k - 1)\frac{\alpha}{2}$, we obtain [...], and when $\sigma = (k - 1)^2\alpha$, we obtain [...], so by linearity, we also have [...] for all $K \in \mathcal{K}^{(u)}_v$. Now, recall that $\sum_{u' \in R_u} y_{vu'} \le \frac{\alpha}{2}|R_u|$; as every vertex in $R_u$ appears in exactly [...]. Thus, summing inequality (10) over all tuples in $\mathcal{K}^{(u)}_v$ yields the following lower bound on the total LP-cost of these tuples: [...]. Therefore, charging $\frac{2}{2 - (2k-1)\alpha}$ for each $k$-tuple in $\mathcal{K}^{(u)}_v$ suffices to compensate the cluster-cost.
Case 2.2: Compensating the cost of splitting a $k$-tuple. Each tuple $K$ split during clustering incurs a cluster-cost of $w_K$ and an LP-cost of $x_K w_K + (1 - x_K)(1 - w_K)$. First, suppose that $K$ is a split $k$-tuple with $v \in K$. Since $K$ was split, $x_K > \alpha$, and charging $\frac{1}{\alpha}$ times the LP cost pays for such a $K$.
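The charging step for split tuples can be verified numerically. Since the LP-cost of a tuple is $x_K w_K + (1 - x_K)(1 - w_K) \ge x_K w_K$, any tuple with $x_K > \alpha$ has cluster-cost $w_K$ at most $\frac{1}{\alpha}$ times its LP-cost. The Python sketch below checks this inequality on a grid of values; the function names, the grid resolution and the numerical tolerance are choices made for the illustration.

```python
def lp_cost(x, w):
    """LP-cost of a k-tuple with fractional value x and weight w = w^+_K."""
    return x * w + (1 - x) * (1 - w)

def split_charge_holds(alpha, steps=50):
    """Check on a grid that w <= lp_cost(x, w) / alpha whenever x > alpha."""
    for i in range(steps + 1):
        for j in range(steps + 1):
            x, w = i / steps, j / steps
            # Small tolerance guards against floating-point rounding.
            if x > alpha and w > lp_cost(x, w) / alpha + 1e-12:
                return False
    return True
```

The check succeeds for every threshold in $(0, 1]$; it is the requirement $x_K > \alpha$, and not the specific choice $\alpha \le 1/k$, that makes this particular step work.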
We still must pay for the split tuples $K$ with $v \notin K$. For a set $S' \subseteq S/N_\alpha(v)$, let $\mathcal{K}^{(S')}_v$ be the set of split tuples $K$ such that $v \notin K$ and $K/N_\alpha(v) = S'$. According to the definition of $S'$, for any split tuple $K$, there is a corresponding $S'$. We show that the total cluster-cost of the tuples in $\mathcal{K}^{(S')}_v$ is at most a constant times their total LP-cost. To establish the claim, let $S_N$ be the collection of all subsets $s$ [...], and take an arbitrary set $\tilde{S} \subseteq N_\alpha(v)/s$ with $|\tilde{S}| = k - 1 - |s|$. We have $x_{\{v\} \cup \tilde{S} \cup s} \le (k - 1)\alpha$ and thus [...]. Note that each tuple $K_j$ is a split tuple. We have: [...]. Let $\sigma_x = \sum_{u_j \in S'} x_{K_j}$ and let $\sigma_y = \sum_{u_j \in S'} y_{K_j}$. The inequalities above yield the following lower bound on the LP-cost of $K$: [...]. We have $x_{K_j} \ge \alpha$ by definition and $x_{K_j} \le 1 - \frac{\alpha}{2}$ due to the assumptions made for this case. Thus, we have $\sigma_x \in \left[\alpha|S'|, \left(1 - \frac{\alpha}{2}\right)|S'|\right]$. As the lower bound in inequality (11) is linear in $\sigma_x$, we study the behavior of the bound at the endpoints. When $\sigma_x = \alpha|S'|$, we have [...]. Here, we used the fact that $\alpha < 2/3$. When $\sigma_x = \left(1 - \frac{\alpha}{2}\right)|S'|$, we obtain [...]. Since $\alpha \le 2/3$, we have $1 - \alpha \ge \frac{\alpha}{2}$, so that the inequality holds for $\sigma_x$ at both endpoints of the interval, and thus holds for all $K \in \mathcal{K}^{(S')}_v$. Summing over these tuples [...] yields the following lower bound on the total LP-cost of the underlying tuples: [...]. Thus, charging each tuple in $\mathcal{K}^{(S')}_v$ a factor of $\frac{2}{\alpha}$ times its LP-cost is enough to pay for the cluster-cost.
In summary, if $\alpha = 1/k$ and we define $c$ as the maximum of the charging factors derived in the cases above, then Algorithm 1 charges each $k$-tuple at most a factor of $2k$ times its LP-cost.

C Proof of Theorem 3.2
We continue to use the notation introduced in Appendix B. In particular, we let $\bar{N}_\alpha(v) = N_\alpha(v) \cup \{v\}$ and let $\mathcal{K}_v$ be the set of all $k$-tuples $K$ such that $K \subseteq \bar{N}_\alpha(v)$, $K \ni v$. The following proof often uses some immediate consequences of the LP constraints; here we adopt the convention that $z_{uu} = 0$ for all $u \in V$: [...]. As before, we prove the approximation guarantees by comparing the rounding cost and the LP cost. An overview of the different cases encountered and the corresponding approximation constants is provided in Table 3.
Case 1: The output is the singleton cluster $\{v\}$. The clustering cost when outputting a singleton $\{v\}$ is $\sum_{K \in \mathcal{K}(S):\, v \in K} w_K$, while the LP cost is $\sum_{K \in \mathcal{K}(S):\, v \in K} \left[(1 - w_K)(1 - x_K) + w_K x_K\right]$.
If $K \cap [S/N_\alpha(v)] \neq \emptyset$, we have $x_K > \alpha$, so charging each such $k$-tuple $\frac{1}{\alpha}$ times its LP-cost compensates for the cluster-cost. Therefore, it suffices to consider the $k$-tuples $K \in \mathcal{K}_v$. For any $K \in \mathcal{K}_v$, we have $\frac{1}{k-1}\sum_{u \in K/\{v\}} z_{vu} \le x_K \le \sum_{u \in K/\{v\}} z_{vu}$, where the inequalities are based on the LP constraints. By observing that $z_{vu} \le \alpha$, we have the following bound on the LP cost of $K$: [...]. Since each $z_{vu}$ for $u \in K$ satisfies $z_{vu} \le \alpha \le 1/k$, the quantity in square brackets is negative, so that $w_K \le 1$ implies [...]. Summing over all $K \in \mathcal{K}_v$, we see that [...]. Letting $\sigma = \sum_{u' \in K/\{u\}} z_{vu'}$ so that $1 - x_K \ge 1 - z_{vu} - \sigma$, we have the following lower bound on the LP-cost of $K$: [...]. Now, summing over all $K \in \mathcal{K}$ [...]. Thus, charging each $k$-tuple in $\mathcal{K}_v$ [...].
Case 2.2: The cost of splitting $k$-tuples across clusters. Again, we refer to such tuples as split tuples. Each split tuple $K$ incurs a cluster-cost of $w_K$ and an LP-cost of $x_K w_K + (1 - x_K)(1 - w_K)$. First, suppose that $K$ is a split $k$-tuple with $v \in K$. Since $K$ is split, there is $u \in K/N_\alpha(v)$ and thus we have $x_K \ge z_{vu} > \alpha$, so charging $\frac{1}{\alpha}$ times the LP cost pays for such $K$. We still must pay for the split tuples $K$ with $v \notin K$. Let $S' \subseteq S/N_\alpha(v)$ be such that $|S'| \le k - 1$. Furthermore, let $\mathcal{K}^{(S')}_v$ denote the set of split tuples $K$ such that $v \notin K$ and $K/N_\alpha(v) = S'$. According to the definition of $S'$, for any split tuple $K$, there is a corresponding $S'$. We show that the total cluster-cost of the tuples in $\mathcal{K}^{(S')}_v$ is at most a constant times their total LP-cost.
Case 2.2.1: There exists a vertex $u \in S'$ such that $z_{vu} \ge (1 + \beta)\alpha$. In this case, for every $K \in \mathcal{K}^{(S')}_v$, we can take some arbitrary $u' \in K \cap N_\alpha(v)$ and obtain $x_K \ge z_{vu} - z_{vu'} \ge \beta\alpha$, since $u' \in N_\alpha(v)$ implies $z_{vu'} \le \alpha$. Thus, in this case, charging $\frac{1}{\alpha\beta}$ times the LP-cost of each tuple in $\mathcal{K}^{(S')}_v$ [...]. Let $\tilde{S} = K \cap N_\alpha(v)$, and $\sigma_{S'} = \sum_{u \in S'} z_{vu}$, $\sigma_{\tilde{S}} = \sum_{u \in \tilde{S}} z_{vu}$. We have the following bounds: [...]. Combining these bounds yields the following lower bound on the LP-cost of $K$.