Local Graph Clustering with Network Lasso

We study the statistical and computational properties of a network Lasso (nLasso) method for local graph clustering. The clusters delivered by nLasso can be characterized elegantly via network flows between cluster boundary and seed nodes. While spectral clustering methods are guided by minimizing the graph Laplacian quadratic form, nLasso minimizes the total variation of cluster indicator signals. As demonstrated theoretically and numerically, nLasso methods can handle very sparse (chain-like) clusters, which are difficult for spectral clustering. We also verify that a primal-dual method for non-smooth optimization approximates nLasso solutions at an optimal worst-case convergence rate.


INTRODUCTION
Many important applications generate network-structured data. We can represent such networked data conveniently using an undirected "empirical" or "similarity" graph G [4,11]. An important task when analysing or processing networked data is to group the data points into clusters whose members are more similar to each other than to data points outside the cluster.
Local graph clustering methods start from a set of seed nodes and explore their neighbourhoods to determine suitable clusters around the seed nodes [9,10]. The runtime of these methods depends gracefully on the size of the local clusters they deliver [9]. Such methods are therefore attractive for big data applications involving massive graphs.
A line of previous work provides an analysis of local convergence behaviour [8]. This analysis applies if the iterates generated by the method are sufficiently close to the optimum. In contrast, we study global convergence from an arbitrary initial iterate.
Our approach to clustering is based on minimizing the total variation of cluster indicator functions. This is different from spectral clustering methods which are based on minimizing the Laplacian quadratic form [11].
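This distinction can be made concrete with a small numerical sketch (the chain graph and signals below are our own illustration, not from the experiments reported later): the Laplacian quadratic form rewards smearing a sharp cluster boundary into a smooth ramp, whereas TV assigns both signals the same cost.

```python
# Compare the Laplacian quadratic form and the total variation (TV) of two
# graph signals on a 5-node chain with unit edge weights (illustrative only).
edges = [(i, i + 1, 1.0) for i in range(4)]

def laplacian_quadratic(x):
    # sum over edges of W_ij * (x_i - x_j)^2 -- the spectral clustering score
    return sum(w * (x[i] - x[j]) ** 2 for i, j, w in edges)

def total_variation(x):
    # sum over edges of W_ij * |x_i - x_j| -- the score minimized by nLasso
    return sum(w * abs(x[i] - x[j]) for i, j, w in edges)

indicator = [1.0, 1.0, 0.0, 0.0, 0.0]    # sharp cluster indicator
ramp = [1.0, 0.75, 0.5, 0.25, 0.0]       # smoothed-out version

print(laplacian_quadratic(indicator), total_variation(indicator))  # 1.0 1.0
print(laplacian_quadratic(ramp), total_variation(ramp))            # 0.25 1.0
```

The quadratic form strictly prefers the ramp (0.25 < 1.0), which is why spectral methods tend to blur sparsely connected clusters; TV does not penalize a sharp indicator relative to a ramp with the same total change.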
Spectral clustering can be implemented using linear systems, which can be analysed conveniently using tools from spectral graph theory. In contrast, our approach amounts to non-linear dynamics whose analysis is more delicate. It turns out that TV-based methods can handle sparsely connected (chain-like) clusters (see Section 6).
We make the following contributions:
• We formulate local graph clustering as a particular instance of the nLasso problem.
• We derive a novel local clustering method by applying a primal-dual optimization method.
• We relate the clusters delivered by nLasso to the existence of sufficiently large network flows between the cluster boundaries and the seed nodes.

LOCAL GRAPH CLUSTERING
We consider networked data which is represented by an undirected weighted graph G = (V, E, W). The nodes V = {1, . . . , n} represent individual data points. Similar data points are connected by the undirected edges in E. Each undirected edge {i, j} ∈ E is assigned a positive weight W_{i,j} > 0. Local graph clustering starts from a small number of carefully selected seed nodes. We assume that the seed nodes are grouped into batches S_k, each batch containing L_k different seed nodes belonging to the same cluster. The seed nodes might be obtained by exploiting domain knowledge. In general, the number of seed nodes is a vanishing fraction of the entire graph (dataset). We can interpret this setting as the extreme case of semi-supervised learning where the labelling ratio (seed nodes represent labelled data points) goes to zero.
In what follows, we study the computational and statistical aspects of a particular method that clusters the nodes in G by exploring neighbourhoods of the seed nodes S. This method constructs clusters C_k around the seed nodes in S_k such that only a few edges leave the cluster C_k. In other words, the constructed clusters C_k have a small boundary. We will make this notion more precise in Section 5 using the concept of network flows to quantify the connectivity between cluster boundary and seed nodes.
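The boundary of a candidate cluster can be computed directly from the edge list. The following sketch (our own notation, with the graph stored as (i, j, weight) triples) returns the edges with exactly one endpoint inside the cluster.

```python
def boundary(edges, cluster):
    # Edges of the empirical graph with exactly one endpoint in the cluster.
    return [(i, j, w) for (i, j, w) in edges if (i in cluster) != (j in cluster)]

# Toy graph: a 4-node chain with one weak edge in the middle.
edges = [(1, 2, 1.0), (2, 3, 0.5), (3, 4, 1.0)]
print(boundary(edges, {1, 2}))  # [(2, 3, 0.5)] -- a single weak boundary edge
```

A good cluster in the sense above is one for which this list is short and consists of low-weight edges.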
We formalize local graph clustering as the problem of recovering or learning indicator signals x^{(k)}_i for each cluster C_k. Spectral graph clustering uses the dominant eigenvectors of the graph Laplacian matrix to estimate these indicator signals [11]. In contrast, we learn (approximations of) the indicator signals by TV minimization.

THE NETWORK LASSO AND ITS DUAL
To find a reasonable cluster around the seed nodes in S_k, we solve the network Lasso problem

x̂ = argmin_x (1/2) Σ_{i∈S_k} (x_i − 1)² + (α/2) Σ_{i∉S_k} x_i² + λ ‖x‖_TV.   (3)

Here, we used the total variation (TV)

‖x‖_TV = Σ_{{i,j}∈E} W_{i,j} |x_i − x_j|.

We solve a separate nLasso problem (3) for each batch S_k, k = 1, . . . , F, of seed nodes. Note that the non-seed nodes i ∉ S_k fall into two groups: those which belong to the cluster C_k and those outside the cluster, i ∉ C_k. Our hope is that the solution of (3) is a good approximation of the indicator function of a well-connected subset around the seed nodes S_k. We use the graph signal x̂_i obtained by solving (3) to determine a reasonable cluster C_k ⊃ S_k.
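For concreteness, the objective can be evaluated in a few lines. The sketch below assumes a standard quadratic-plus-TV form of the nLasso objective: a fidelity term (x_i − 1)²/2 at seed nodes, a shrinkage term αx_i²/2 at all other nodes, and the weighted TV penalty; the exact constants are our assumption and should be checked against the precise formulation of (3).

```python
def nlasso_objective(x, edges, seeds, alpha, lam):
    """Assumed form of the nLasso objective: quadratic fidelity at the seed
    nodes, quadratic shrinkage elsewhere, plus the weighted TV term.
    The constants (1/2, alpha/2) are our assumption."""
    fidelity = sum(0.5 * (x[i] - 1.0) ** 2 for i in seeds)
    shrink = sum(0.5 * alpha * x[i] ** 2 for i in x if i not in seeds)
    tv = sum(w * abs(x[i] - x[j]) for (i, j, w) in edges)
    return fidelity + shrink + lam * tv

# A perfect indicator of the cluster {1, 2} on a 4-node chain:
edges = [(1, 2, 1.0), (2, 3, 0.5), (3, 4, 1.0)]
x = {1: 1.0, 2: 1.0, 3: 0.0, 4: 0.0}
print(nlasso_objective(x, edges, seeds={1}, alpha=0.1, lam=1.0))  # 0.55
```

Here the indicator pays 0.05 in shrinkage (node 2 is a non-seed cluster member) plus 0.5 for the single weak boundary edge; lowering the signal values would trade shrinkage cost against fidelity at the seed node.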
The idea of determining clusters by learning graph signals that are (approximations of) indicator functions of good clusters also underlies spectral clustering [11]. Instead of the TV minimization underlying nLasso (3), spectral clustering uses the Laplacian quadratic form to score candidates for cluster indicator functions. Moreover, spectral clustering methods do not require any seed nodes with known cluster assignment.
The values chosen for the parameters α and λ in (3) crucially influence the behaviour of the clustering method and the properties of clusters delivered by (3).
In order to choose the values for α and λ we will use intuition provided by a minimum cost flow problem that is dual (equivalent) to nLasso (3). This minimum cost flow problem is not defined directly on the empirical graph G but on an augmented graph. The augmented graph is obtained from G by adding an additional node along with an edge connecting it to each node i ∈ V.
By convex duality, along similar lines as in [6,7], one can show that nLasso (3) is equivalent to a dual minimum cost flow problem (5) over edge flows y_e that must satisfy |y_e| ≤ λW_e for all e ∈ E. (7)
The constraint (6) can be interpreted as a flow conservation requirement for the flow y_e. The constraints (7) can be interpreted as capacity constraints for the flow y_e.
The optimization problems (3) and (5) are equivalent in the following sense: the node signal x̂_i solves (3) and the edge signal ŷ_e solves (5), respectively, if and only if the optimality conditions (8)-(11) hold; in particular,

|ŷ_e| ≤ λW_e for all edges e ∈ E.   (10)

The conditions (8) and (9) can be interpreted as conservation laws for any flow ŷ_e that solves the nLasso dual (5). At a seed node i ∈ S_k, we inject (or extract) a flow of value x̂_i − 1 into the graph. For all other nodes i ∉ S_k, we inject (or extract) a flow of value αx̂_i. The optimal flow ŷ_e has to meet these demands while respecting the capacity constraints (10). We illustrate the conditions (8)-(11) in Figure 1 for a simple chain-structured graph. We will show in Section 5 how to use the optimality conditions (8)-(11) to characterize the solutions of nLasso (3). We will derive conditions on the structure of the empirical graph G and the location of the seed nodes S_k that ensure solutions of (3) to be indicator functions of clusters C_k.
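The flow interpretation can be checked numerically. The sketch below uses our own sign convention (edges are oriented, and the net flow out of a node must cancel its injected demand) to verify the conservation laws (8)-(9) and the capacity constraints (10) for a candidate primal-dual pair. Full optimality additionally requires the flow to saturate the capacity on edges where x̂_i ≠ x̂_j; the sketch checks only the two stated conditions.

```python
def demand(i, x, seeds, alpha):
    # Flow injected at node i: x_i - 1 at seed nodes, alpha * x_i elsewhere.
    return x[i] - 1.0 if i in seeds else alpha * x[i]

def satisfies_conditions(x, y, edges, seeds, alpha, lam, tol=1e-9):
    """Check conservation (net outflow plus demand cancels, in our sign
    convention) and the capacity constraints |y_e| <= lam * W_e."""
    net = {i: 0.0 for i in x}
    for (i, j, w), flow in zip(edges, y):
        net[i] += flow   # flow leaves node i ...
        net[j] -= flow   # ... and enters node j
    conservation = all(abs(net[i] + demand(i, x, seeds, alpha)) <= tol for i in x)
    capacity = all(abs(f) <= lam * w + tol for (_, _, w), f in zip(edges, y))
    return conservation and capacity

# Two nodes joined by one edge; with alpha = 1 the pair below is consistent:
edges = [(1, 2, 1.0)]
x = {1: 0.8, 2: 0.2}   # demand at node 1: -0.2, at node 2: +0.2
y = [0.2]              # 0.2 units of flow from node 1 to node 2
print(satisfies_conditions(x, y, edges, seeds={1}, alpha=1.0, lam=1.0))  # True
```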

COMPUTATIONAL ASPECTS
The necessary and sufficient conditions (8)-(11) characterize any pair of solutions to the nLasso (3) and its dual (5). We can find solutions to the conditions (8)-(11), which in turn provide a solution to nLasso, by reformulating those coupled conditions as a fixed-point equation.
There are many different fixed-point equations that are equivalent to (8)-(11). We will use a particular construction which results in a method that is guaranteed to converge to a solution of (3) and (5) and can be implemented as scalable message passing on the empirical graph G. This construction is discussed in great detail in [3] and has recently been applied to variations of the nLasso (3) [6].
Here, γ_i = 1/d_i is the inverse of the degree d_i of node i. The updates (12)-(17) can be implemented as a distributed message passing method for jointly solving nLasso (3) and the dual network flow problem (5). The iterates x̂^{(r)}_i and ŷ^{(r)}_e define a graph signal and a flow on G, which converge to an nLasso solution x̂_i and an optimal (dual) flow ŷ_e, respectively [5].
The update (14) enforces the capacity constraints (7) for the flow iterates ŷ^{(r)}. The update (15) amounts to adjusting the current nLasso estimate x̂^{(r)}_i, for each node i ∈ V, by the demand induced by the current flow approximation ŷ^{(r)}. Together with the updates (16) and (17), the update (15) enforces ŷ^{(r)} to satisfy the conservation laws (8) and (9).
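A compact sketch of such a primal-dual method is given below. It assumes the quadratic-plus-TV form of the nLasso objective (the constants 1/2 and α/2 are our assumption), uses the per-node step sizes γ_i = 1/d_i and a per-edge dual step size of 1/2 in the spirit of the preconditioning in [3], and is an illustration rather than the authors' reference implementation.

```python
import numpy as np

def nlasso_primal_dual(n, edges, seeds, alpha, lam, num_iter=5000):
    """Preconditioned primal-dual sketch for the assumed nLasso objective
    (1/2)sum_{i in S}(x_i-1)^2 + (alpha/2)sum_{i not in S}x_i^2 + lam*TV(x).
    Nodes are 0, ..., n-1; edges is a list of (i, j, weight) triples."""
    heads = np.array([e[0] for e in edges])
    tails = np.array([e[1] for e in edges])
    w = np.array([e[2] for e in edges])
    deg = np.zeros(n)
    np.add.at(deg, heads, 1.0)
    np.add.at(deg, tails, 1.0)
    tau = 1.0 / deg            # primal step sizes gamma_i = 1/d_i
    sigma = 0.5                # dual step size per edge
    seed = np.zeros(n, dtype=bool)
    seed[list(seeds)] = True
    x = np.zeros(n)
    xbar = x.copy()
    y = np.zeros(len(edges))
    for _ in range(num_iter):
        # Dual ascent, then projection onto the capacity box |y_e| <= lam*W_e.
        y = np.clip(y + sigma * (xbar[heads] - xbar[tails]), -lam * w, lam * w)
        # Primal descent: subtract the flow demand D^T y, then node-wise prox.
        grad = np.zeros(n)
        np.add.at(grad, heads, y)
        np.add.at(grad, tails, -y)
        v = x - tau * grad
        x_new = np.where(seed, (v + tau) / (1.0 + tau), v / (1.0 + tau * alpha))
        xbar = 2.0 * x_new - x   # over-relaxation step
        x = x_new
    return x, y

# 4-node chain with a weak middle edge; the iterate approximates an indicator
# of the left cluster {0, 1} that contains the seed node 0.
edges = [(0, 1, 1.0), (1, 2, 0.1), (2, 3, 1.0)]
x, y = nlasso_primal_dual(4, edges, seeds={0}, alpha=0.2, lam=0.5)
print(np.round(x, 2))  # approximately piecewise constant, large on {0, 1}
```

The node-wise prox step is exact for the assumed quadratic terms: (v + τ)/(1 + τ) at seed nodes and v/(1 + τα) elsewhere. The weak edge {1, 2} saturates its capacity λW_e = 0.05, so the signal stays near zero on the far side of the boundary.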

STATISTICAL ASPECTS
Remember that our approach to determining a cluster C k around the seed nodes S k is to (approximately) solve the nLasso problem (3). We interpret the solution of (3) as the indicator function of the cluster C k .
While the iterates obtained from the primal-dual updates (12)-(17) are guaranteed to converge to a solution of (3), we have to stop iterating after a finite (typically small) number of steps. This results in a non-zero optimization error.

Fig. 2. Empirical graph G with a seed node i_k (shaded) and a local cluster C_k around i_k. The boundary of the cluster C_k consists of a single edge {i, j} ∈ E with weight W_{i,j} = 1/2. All other edges have weight one.
A key characteristic of the cluster C_k is its boundary (see Fig. 2). Our main theoretical result provides necessary conditions on the cluster (18) and the nLasso parameters α and λ (see (3)).

Proposition 1. Consider the cluster (18) derived from the iterate x̂^{(r)}_i obtained after a sufficient number of updates (12)-(17). Then α and λ satisfy the necessary conditions (20) and (21).

Proof. Follows from the optimality conditions (8)-(11).
The necessary conditions (20) and (21) can be used to guide the choice of the nLasso parameters α and λ in (3). We can increase the applicability of condition (21) if we have an upper bound U on the number of nodes i ∉ C_k that have been reached by the message passing updates (12)-(17). Given such an upper bound U on the number of "relevant" nodes, condition (21) can be tightened accordingly.

NUMERICAL EXPERIMENTS
We verify Proposition 1 numerically using a chain-structured empirical graph G, which might represent time series data [2]. The chain-structured empirical graph G contains n = 10 nodes. Consecutive nodes i and i+1 are connected by an edge of weight W_e = 5/4, with the exception of the edge {2, 3}, which has weight W_{2,3} = 1.
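The setup can be reproduced with a few lines (node numbering 1, . . . , 10 follows the text; the candidate cluster below is our own illustration). The weak edge {2, 3} is the natural cluster boundary, since it alone contributes to the TV of the indicator signal of C = {1, 2}.

```python
# Chain-structured empirical graph from the experiment: n = 10 nodes,
# edge weights 5/4 except for the edge {2, 3}, which has weight 1.
n = 10
edges = [(i, i + 1, 1.0 if i == 2 else 5.0 / 4.0) for i in range(1, n)]

# TV of the indicator signal of the candidate cluster C = {1, 2}:
x = {i: 1.0 if i <= 2 else 0.0 for i in range(1, n + 1)}
tv = sum(w * abs(x[i] - x[j]) for (i, j, w) in edges)
print(tv)  # 1.0 -- only the boundary edge {2, 3} contributes
```

Any other cut through the chain would cost 5/4 instead of 1, so TV minimization favours placing the cluster boundary on the weak edge.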
The source code for this experiment can be found at https://github.com/alexjungaalto/.

CONCLUSION
This work offers several interesting avenues for follow-up research. First, we have recently proposed nLasso methods for learning networked exponential families. Networked exponential families make it possible to couple the network (cluster) structure of data with statistical models for the high-dimensional data points represented by the graph nodes. It is interesting to study how the properties of these node-wise statistical models can be exploited to guide local clustering methods. Another interesting direction for follow-up research is the study of sparsification methods that prune the data graph to reduce computational complexity while maintaining the cluster structure.