Higher-Order Explanations of Graph Neural Networks via Relevant Walks

Graph Neural Networks (GNNs) are a popular approach for predicting graph-structured data. As GNNs tightly entangle the input graph into the neural network structure, common explainable AI approaches are not applicable. To a large extent, GNNs have remained black-boxes for the user so far. In this paper, we show that GNNs can in fact be naturally explained using higher-order expansions, i.e. by identifying groups of edges that jointly contribute to the prediction. Practically, we find that such explanations can be extracted using a nested attribution scheme, where existing techniques such as layer-wise relevance propagation (LRP) can be applied at each step. The output is a collection of walks into the input graph that are relevant for the prediction. Our novel explanation method, which we denote by GNN-LRP, is applicable to a broad range of graph neural networks and lets us extract practically relevant insights on sentiment analysis of text data, structure-property relationships in quantum chemistry, and image classification.


INTRODUCTION
Many interesting structures found in scientific and industrial applications can be expressed as graphs. Examples are lattices in fluid modeling, molecular geometry, biological interaction networks, or social / historical networks. Graph neural networks (GNNs) [1], [2] have been proposed as a method to learn from observations in general graph structures and have found use in an ever growing number of applications [3], [4], [5], [6], [7], [8]. While GNNs make useful predictions, they typically act as black-boxes, and it has neither been directly possible (1) to extract novel insight from the learned model nor (2) to verify that the model has made the intended use of the graph structure, e.g. that it has avoided Clever Hans phenomena [9].
Explainable AI (XAI) is an emerging research area that aims to extract interpretable insights from trained ML models [10]. So far, research has focused, for example, on full black-box models [11], [12], or deep neural networks [13], where in both cases, the prediction can be attributed to the input features. For a GNN, however, the graph being received as input is deeply entangled with the model itself, hence requiring a more sophisticated approach.
In this paper, we propose a theoretically founded XAI method for explaining GNN predictions. The conceptual starting point of our method is the observation that the function implemented by the GNN is locally polynomial in the input graph. This function can therefore be analyzed using a higher-order Taylor expansion to arrive at an attribution of the GNN prediction on collections of edges, e.g. walks into the input graph. Such an attribution scheme goes beyond existing XAI techniques for GNNs that are limited to identifying individual nodes or edges.
Furthermore, we find that the higher-order expansion can be expressed as a nesting of multiple first-order expansions, starting at the top layer of the GNN and moving towards the input layer. Specifically, we start with a node attribution task in the top layer and then grow these nodes into walks by recursively pursuing the expansion process w.r.t. quantities in the layers below. Practically, we can insert at every step any standard first-order explanation technique, in particular, the well-adopted Layer-wise Relevance Propagation (LRP) [13]. The resulting procedure that we propose and that we denote by GNN-LRP is shown in Figure 1.

Fig. 1. High-level illustration of GNN-LRP. The explanation procedure starts at the GNN output, and proceeds backwards to progressively uncover the walks that are relevant for the prediction.

GNN-LRP applies directly to a broad range of GNN architectures, without the need to learn a surrogate function, nor to run any optimization procedure. We demonstrate GNN-LRP on a variety of GNN models from diverse application fields: (1) a sentiment prediction model receiving sentence parse trees as input, (2) a state-of-the-art GNN for quantum mechanically accurate prediction of electronic properties from molecular graphs, and (3) a widely adopted image classifier that we view as a GNN operating on pixel lattices. For each GNN model, our explanation method produces detailed and reliable explanations of the decision strategy from which novel application insights can be obtained.

Related work
We focus here on the related work that most directly connects to our novel GNN explanation approach, in particular, (1) explanation techniques based on higher-order analysis, and (2) explanation techniques that are specialized for GNNs. For a more comprehensive set of related works, we refer the reader to the review papers [14], [15] for XAI and [2], [16] for GNNs.

Higher-Order Explanations
Second-order methods (e.g. based on the model's Hessian) have been proposed to attribute predictions to pairs of input features [17], [18], [19]. Another work [20] incorporates an explicit sum-of-interactions structure into the model, in order to obtain second-order or higher-order explanations. Another approach [21] detects higher-order feature interactions with an iterative algorithm that inspects neural network weights at the different layers. In the context of NLP, a special joint convolution-LSTM model was designed to extract n-ary relations between sentences [22].
Our work proposes instead to use the framework of Taylor expansions to arrive in a principled manner at higher-order explanations, and it identifies GNNs as an important use case for such explanations.

Explaining Graph Neural Networks
The work [23] extends explanation techniques such as Grad-CAM or Excitation Backprop to the GNN model, and arrives at an attribution on nodes of the graph. In an NLP context, graph convolutional networks (GCNs) have been explained in terms of nodes and edges in the input graph using the LRP explanation method [24]. The 'GNNExplainer' [25] extracts the subgraph that maximizes the mutual information with the prediction for the original graph; the identified subgraph then serves as the explanation. Further recently proposed methods that map the GNN prediction to graph substructures include XGNN [26], Trap2 [27] and GraphMask [28].
Our GNN-LRP method differs from these works by finding a scoring for sequences of edges (i.e. walks in the graph) instead of individual nodes or edges. This considerably enhances the informativeness of the produced explanations.

TOWARDS EXPLAINING GNNS
Graph neural networks (GNNs) [1], [2] are special types of neural networks that receive a graph as input. In practice, graphs can take a variety of forms, e.g. directed, undirected, labeled, unlabeled, spatial, time-evolving, etc. To handle the high heterogeneity of graph structures, many variants of GNNs have been developed (e.g. [29], [30], [31]). One commonality of most GNNs, however, is that the input graph is not located at the first layer, but occurs instead at multiple layers, by defining the connectivity of the network itself.
Graph neural networks are typically constructed by stacking several interaction blocks. Each block t = 1 … T computes a graph representation H_t ∈ R^{n×d_t} where n is the number of nodes in the input graph and d_t is the number of dimensions used to represent each node. Within a block, the representation is produced by applying (i) an aggregate step where each node receives information from the neighboring nodes, and (ii) a combine step that extracts new features for each node. These two steps (cf. [1]) connect the representations H_{t−1} and H_t of consecutive blocks as:

$$Z_t = \Lambda^\top H_{t-1} \qquad (1)$$

$$H_{t,K} = C_t(Z_{t,K}) \qquad (2)$$

where Λ is the input graph given as a matrix of size n×n, e.g. the adjacency matrix to which we add self-connections. We denote by Z_{t,K} the row of Z_t associated to node K, and C_t is a 'combine' function, typically a one-layer or multi-layer neural network, that produces the new representation for each node in the graph. The whole input-output relation implemented by the GNN can then be expressed as a function

$$f(\Lambda) = g(H_T) \qquad (3)$$

which is a recursive application of Eqs. (1) and (2) starting from some initial state H_0 ∈ R^{n×d_0}, followed by a readout function g. The initial state typically incorporates information that is intrinsic to the nodes, or it can be set to constant values if no such information is present. The readout function is typically a classifier (or regressor) of the whole graph, but it can also be chosen to apply to subsets of nodes, for example, for node classification or link prediction tasks [32], [16].
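To make the block structure concrete, the following is a minimal PyTorch sketch of Eqs. (1)-(3). The layer sizes, the toy adjacency matrix, and the sum-pooling readout are our own illustrative choices, not prescribed by the method.

```python
import torch

def gnn_forward(Lam, H0, combines, readout):
    """Generic GNN of Eqs. (1)-(3): alternate aggregate and combine steps over
    T interaction blocks, then apply the readout g to the last representation."""
    H = H0
    for C_t in combines:
        Z = Lam.t() @ H     # aggregate (Eq. (1)): nodes collect neighbor information
        H = C_t(Z)          # combine (Eq. (2)): new features for each node
    return readout(H)       # readout g (Eq. (3)), e.g. a graph-level classifier

# Toy usage (sizes and readout are illustrative choices):
n, d = 5, 8
Lam = torch.eye(n) + (torch.rand(n, n) < 0.3).float()   # adjacency with self-connections
combines = [torch.nn.Sequential(torch.nn.Linear(d, d), torch.nn.ReLU()) for _ in range(2)]
H0 = torch.ones(n, d)                                    # no intrinsic node information
print(gnn_forward(Lam, H0, combines, lambda H: H.sum()))
```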

First-Order Explanation
Consider first a 'classical' approach to explanation where we attribute the output of the neural network to variables in the first layer. In the case of the GNN, the first layer is given by the initial state H_0. Let us now view the GNN as a function of the initial state, i.e. f(H_0). We will also denote by H_{0,I} the row of H_0 associated to node I. A Taylor expansion of the function f at some reference point $\widetilde{H}_0$ gives:

$$f(H_0) = \ldots + \sum_I \left\langle \frac{\partial f}{\partial H_{0,I}}\big(\widetilde{H}_0\big),\; H_{0,I} - \widetilde{H}_{0,I} \right\rangle + \ldots \qquad (4)$$

where '…' represents the zero-, second- and higher-order terms that have not been expanded, and where the sum represents the first-order terms. Because the sum runs over all nodes I in the input graph, Eq. (4) readily provides an attribution of the GNN output to these nodes. It is arguable, however, whether this attribution can truly be interpreted as identifying node contributions. Indeed, the attribution may only reflect the importance of a node in the first layer, and not in the higher layers. Furthermore, an attribution of the prediction on nodes may not be sufficient for the application needs. For example, it does not tell us whether a node is important by itself, or if it is important because of its connections to other nodes or some more complex structure in the graph.
These limitations can be attributed to the fact that we have performed the decomposition w.r.t. the initial state H 0 instead of the 'true' input Λ.
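As a concrete illustration, the following sketch computes these first-order node attributions with automatic differentiation. Evaluating the gradient at the data point itself with a zero reference point corresponds to the common Gradient × Input simplification; this choice of reference is our assumption, not prescribed by Eq. (4).

```python
import torch

def first_order_node_attribution(f, H0):
    """Attribute f onto the nodes of the graph via the first-order terms of Eq. (4).
    With a zero reference point and the gradient taken at H0 itself, this reduces
    to Gradient x Input (an assumption of this sketch)."""
    H0 = H0.clone().requires_grad_(True)
    grad, = torch.autograd.grad(f(H0), H0)
    return (H0 * grad).sum(dim=1)   # one relevance score per node I
```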

Higher-Order Explanation
Consider now the true input Λ of the GNN. Since it occurs at multiple layers and because layers are generally composed in a multiplicative manner (cf. Eqs. (1) and (2)), a first-order analysis of the GNN function f (Λ) would not be suitable to identify the multiplicative interactions. These interactions can however be identified by applying a higher-order Taylor expansion.
In the following, we will use the additional notation λ_E to denote the element of the matrix Λ associated to a particular edge E of the graph. Assuming that f(Λ) is smooth on the relevant input domain, we can compute at some reference point $\widetilde{\Lambda}$ a T-order Taylor expansion:

$$f(\Lambda) = \ldots + \sum_{\mathcal{B}} \frac{1}{\alpha_{\mathcal{B}}!} \left[\frac{\partial^T f}{\prod_{E \in \mathcal{B}} \partial \lambda_E}\right]_{\widetilde{\Lambda}} \prod_{E \in \mathcal{B}} (\lambda_E - \widetilde{\lambda}_E) + \ldots \qquad (5)$$

where we define $\alpha_{\mathcal{B}}! := \prod_E \alpha_{\mathcal{B},E}!$ with $\alpha_{\mathcal{B},E}$ denoting the number of occurrences of edge E in the bag $\mathcal{B}$. The sum runs over all bags $\mathcal{B}$ of T edges. Hence, the terms of the sum capture the joint effect of multiple edges on the GNN output. The last line '+ …' represents the non-expanded terms of order lower or higher than T. Note that with this formulation, one still faces the difficult task of specifying a meaningful reference point $\widetilde{\Lambda}$ at which to perform the expansion and for which the Taylor expansion approximates the function well. In the following, we will leverage certain properties of the GNN model to arrive at a simpler form of the Taylor expansion where the GNN output fully decomposes on bags of T edges.

Proposition 1. Let f(Λ) have the structure of Eqs. (1)-(3), with C_t and g piecewise linear and positively homogeneous with their respective inputs. If we perform a Taylor expansion of f(Λ) as in (5) at the reference point $\widetilde{\Lambda} = s\Lambda$ for an s > 0, then all terms of the expansion which are of higher or lower order than the network depth T vanish in the limit of s → 0, and we arrive at a decomposition $f(\Lambda) = \sum_{\mathcal{B}} R_{\mathcal{B}}$ with

$$R_{\mathcal{B}} = \frac{1}{\alpha_{\mathcal{B}}!} \left[\frac{\partial^T f}{\prod_{E \in \mathcal{B}} \partial \lambda_E}\right] \prod_{E \in \mathcal{B}} \lambda_E \qquad (6)$$

The proof of this proposition can be found in Appendix A in the Supplement. This attribution technique can also be seen as a higher-order generalization of 'Gradient × Input' [33] and 'Hessian × Product' [17].
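Proposition 1 can be verified numerically on a small example. The sketch below builds a toy two-block GNN (T = 2, bias-free ReLU combines, sum readout, satisfying the assumptions of the proposition) and checks that the second-order terms, obtained here from the autograd Hessian, sum to the function value. The architecture, sizes, and random input are illustrative, and we assume the sampled point lies in the interior of a linear region (almost surely the case).

```python
import torch

torch.manual_seed(0)
n, d = 4, 6
H0 = torch.ones(n, d)
W1, W2 = torch.randn(d, d), torch.randn(d, d)   # no biases: C_t positively homogeneous

def f(Lam):  # two-block GNN with ReLU combines and sum readout (T = 2)
    H1 = torch.relu(Lam.t() @ H0 @ W1)
    H2 = torch.relu(Lam.t() @ H1 @ W2)
    return H2.sum()

Lam = (torch.rand(n, n) < 0.5).float() + torch.eye(n)
Hess = torch.autograd.functional.hessian(f, Lam)   # shape (n, n, n, n)

# Sum of all second-order terms: (1/2) sum_{E1,E2} d^2f/(dE1 dE2) * lam_E1 * lam_E2.
# For this piecewise homogeneous network, the derivatives are constant along the
# ray s*Lam within the linear region, so evaluating them at Lam is equivalent to
# the s -> 0 limit of Proposition 1, and the sum recovers f(Lam).
recovered = 0.5 * torch.einsum('ijkl,ij,kl->', Hess, Lam, Lam)
print(float(f(Lam)), float(recovered))   # the two values should match
```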

EXPLAINING GNNS IN PRACTICE
The higher-order Taylor expansion presented above is simple and mathematically founded. However, systematically extracting higher-order derivatives of a neural network is difficult and does not scale to complex models.
To address this limitation, we introduce the concept of a walk $\mathcal{W}$, which we define to be an ordered sequence of edges connecting nodes in consecutive layers of the GNN. The relation between bag-of-edges and walks is illustrated for a simple graph in Fig. 2. Because each walk maps to a particular bag-of-edges, a walk-based explanation inherits all information contained in the bag-of-edges explanation. In particular, it is always possible to recover the bag-of-edges explanation from a walk-based explanation. However, using walks brings two further advantages: First, a walk-based explanation gives more information on the way the multiple layers of the GNN have been used to arrive at the prediction. For example, as illustrated in Fig. 2, it can reveal whether message passing between two nodes has occurred in the first or in the last layers of the GNN. Second, the fact that the walks more directly connect to the structure of the GNN confers important practical benefits for computing the explanations. In the following, we will contribute two algorithms for explanation, where the relevant walks can be easily extracted using simple backward passes in the GNN.
To simplify the presentation of these algorithms, we introduce the new variable Λ′ ← (Λ, …, Λ) which distinguishes between edges occurring at different layers of the GNN. We then express the GNN output as a function of this expanded input, i.e. f(Λ′). We also adopt a node-based notation, where walks are given by the sequence of nodes they traverse from the first layer to the top layer, e.g. $\mathcal{W}$ = (…, J, K, L, …). The letters J, K, L denote nodes in consecutive layers, and '…' acts as a placeholder for the leading and trailing nodes of the walk. We further denote by λ_{JK} the element of Λ representing the connection between node J and node K.

The GNN-GI Baseline
Our first method, GNN-GI, is based on the mathematical insight that Eq. (6) can be rewritten in a way that decomposes onto walks. Additionally, the T-order derivative reduces to simple first-order derivatives that are defined locally in the GNN and are easy to compute using backward passes.

Proposition 2. For the considered function f(Λ′) the higher-order terms in Eq. (6) can be equivalently computed as a sequence of differentiations and multiplications by the terms of Λ′ forming each walk $\mathcal{W}$ = (…, J, K, L, …):

$$R_{\mathcal{W}} = \cdots\, \lambda_{JK}\, \frac{\partial}{\partial \lambda_{JK}} \left[ \lambda_{KL}\, \frac{\partial}{\partial \lambda_{KL}} \left[\, \cdots\, \right] \right] \cdots \qquad (7)$$

and then applying the pooling operation $R_{\mathcal{B}} = \sum_{\mathcal{W} \in \mathcal{B}} R_{\mathcal{W}}$.
A proof is given in Appendix B of the Supplement. Practically, GNN-GI operates as follows: Denoting by K, L, M the last three nodes of the walk $\mathcal{W}$, the algorithm starts by attributing on λ_{LM}:

$$R_{LM} = \lambda_{LM}\, \frac{\partial f}{\partial H_{T,M}}\, \frac{\partial H_{T,M}}{\partial \lambda_{LM}}$$

where H_{T,M} is the top-layer representation for node M. We observe that R_{LM} is a function of H_{T−1,L}, which itself depends on λ_{KL}. Hence, the algorithm continues by taking R_{LM} and attributing it on λ_{KL}:

$$R_{KLM} = \lambda_{KL}\, \frac{\partial R_{LM}}{\partial \lambda_{KL}}$$

To avoid writing all indices along the walk, we use the notation R_{KL…} in place of R_{KLM}, and then proceed further to compute R_{JKL…}. We continue along the walk $\mathcal{W}$ until we reach the first layer where the final score $R_{\mathcal{W}}$ is obtained. The procedure can be repeated for all walks in the graph to arrive at a complete explanation. We call this algorithm GNN-GI, because it can be interpreted as a nested Gradient × Input (GI) attribution, and consequently brings GI to GNNs. We will use GNN-GI as a baseline method in Section 4. GNN-GI has however several limitations in terms of explanation quality: When the network is not positively homogeneous (e.g. because its neurons have non-zero biases), the conservation property $\sum_{\mathcal{B}} R_{\mathcal{B}} = f(\Lambda)$ is no longer satisfied. A significant portion of the prediction might consequently fail to be attributed to the input variables. Furthermore, in deep models, the gradient on which the GI method relies is affected by a shattering effect [34], [35] which makes it noisy and less reliable. More sophisticated techniques are therefore needed.
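The nested procedure can be implemented with a few lines of automatic differentiation. The sketch below is a minimal version of GNN-GI for the generic architecture of Eqs. (1)-(3); variable names such as `combines` and `walk` are our own, and the walk is given as a node sequence with one node per layer.

```python
import torch

def walk_relevance_GI(Lam, H0, combines, readout, walk):
    """GNN-GI: relevance of one walk (a node sequence v_0, ..., v_T), computed as
    the nested Gradient x Input of Eq. (7). A separate copy of Lam is used in each
    block so that autograd can form the required mixed derivatives."""
    T = len(combines)
    Lams = [Lam.clone().requires_grad_(True) for _ in range(T)]
    H = H0
    for C_t, L_t in zip(combines, Lams):
        H = C_t(L_t.t() @ H)                 # aggregate (Eq. (1)) and combine (Eq. (2))
    R = readout(H)
    for t in reversed(range(T)):             # from the top block down to the first
        a, b = walk[t], walk[t + 1]          # edge (a, b) used by the walk in block t+1
        grad, = torch.autograd.grad(R, Lams[t], create_graph=(t > 0))
        R = Lam[a, b] * grad[a, b]           # multiply the local derivative by the edge
    return R
```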

Better Explanations with GNN-LRP
To overcome the limitations of our simple GNN-GI baseline, we will leverage the robustness and broader applicability of an existing explanation technique, Layer-wise Relevance Propagation (LRP) [13], [36], and adapt it for the purpose of explaining a GNN.
LRP is a well-adopted explanation technique that is specifically designed for deep neural networks (DNNs). LRP propagates the DNN output from the top layer to the input layer. At each layer, a propagation rule is applied. The propagation rule has hyperparameters that can be set to induce certain desirable properties of the explanation, such as robustness. Compared to GI, LRP deals better with the strong nonlinearities of the model (e.g. the shattering effect [34]). GI can in fact be seen as a special case of LRP where the hyperparameters for robustness are set to zero [33], [37]. Furthermore, when the neural network has negative biases, e.g. to build sparse representations, LRP also conserves the prediction better than GI [37].
Motivated by the advantageous properties of LRP, we propose GNN-LRP: an improvement of GNN-GI that substitutes in Eq. (7) the GI attribution steps by LRP attribution steps. The new nested attribution procedure is given by:

$$R_{LM} = \mathrm{LRP}(f, \boldsymbol{\lambda}_{LM}), \quad R_{KL\ldots} = \mathrm{LRP}(R_{LM}, \boldsymbol{\lambda}_{KL}), \quad \ldots \qquad (8)$$

where the function LRP(·, ·) attributes the relevance given as a first argument to the connections given as a second argument. Note that LRP works at the neuron level instead of the node level. Hence, we use the bold notation in Eq. (8) to signify that the relevance arriving at a given node is a collection of relevances arriving at each neuron of the node, and that a connection between two nodes is a collection of connections between pairs of neurons from the two nodes. The nested LRP procedure is illustrated in Fig. 3. While GNN-LRP does not correspond to a particular analytical form, the approach can still be theoretically justified using the same mathematical tool that was used to justify LRP: Deep Taylor Decomposition (DTD) [41]. DTD views LRP as performing a multitude of simple Taylor expansions, one per neuron, in each layer of the deep network. DTD relies on an inductive principle that we can formulate in the context of GNNs in three steps. Denote by R_{kL…} the relevance arriving in neuron k, and let h_k be the corresponding neuron activation.
Step 1 Assume that we can write R_{kL…} = h_k c_{kL…} where c_{kL…} is a locally approximately constant term.

Step 2 Treat the term c_{kL…} as constant, and apply LRP to produce the relevance scores R_{jkL…} in the layer below.

Step 3 Apply the pooling $R_{jKL\ldots} = \sum_{k \in K} R_{jkL\ldots}$ and verify that the result can be written as h_j c_{jKL…} with c_{jKL…} locally approximately constant.
If this induction holds, this justifies the application of LRP through the whole GNN from the top layer to the first layer. Appendix C of the Supplement gives a concrete application of these induction steps for the GCN model. Practical GNN-LRP propagation rules for three popular GNN architectures (the GCN [29], the GIN [39], and spectral networks [38], [40]) are provided in Table 1. In particular, the rule for the GCN reads:

$$R_{jKL\ldots} = \sum_{k \in K} \frac{\lambda_{JK}\, h_j w^{\wedge}_{jk}}{\sum_J \sum_{j \in J} \lambda_{JK}\, h_j w^{\wedge}_{jk}}\, R_{kL\ldots} \qquad (9)$$

TABLE 1. Practical GNN-LRP propagation rules for different types of GNNs. We designate by w_{jk} the element of the matrix W_t that links neuron j to neuron k. The function ρ(·) is the rectification function max(0, ·), and we make use of the notation (·)^∧ = (·) + γρ(·), where γ is a hyperparameter of the propagation rule. In the GIN row, δ_{jk} is an indicator function (Kronecker delta) that is 1 when j and k are neurons at the same location within their respective node. In the spectral row, Λ_s represents one component of the graph convolutional filter approximation presented in [38].

Our GNN-LRP method is applicable to a broad range of existing GNN models, in particular, the original GNN model [1], GCN [29], GraphSAGE with mean aggregation [42], Neural FP [43], GIN [39], and any spectral filtering method [44] such as the Spectral Network [38] or ChebNet [40]. GNN-LRP is also applicable to other recent GNN architectures such as the SchNet [30] used for predicting molecular properties, where the graph is a representation of the distance between atoms. GNN-LRP is also applicable to convolutional neural networks for computer vision such as VGG-16 [45], which can be seen as a particular GNN receiving as input a pixel lattice. Because GNN-LRP is tightly embedded in the more general LRP/DTD framework, it can be extended to more advanced architectures such as joint CNN-GNN models for spatio-temporal graphs [31]. Furthermore, GNN-LRP allows in principle to use different edges and nodes at different layers, which can be useful to handle graph pooling structures, such as those described in [46].
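For concreteness, here is a minimal sketch of the GCN rule of Eq. (9) as an explicit propagation step. The function signature is our own, and the small stabilizer added to the denominator is our addition to avoid division by zero.

```python
import torch

def gcn_lrp_step(R_k, H_prev, Lam, W, gamma, K):
    """One GNN-LRP propagation step for the GCN (Eq. (9)): redistribute the
    relevance R_k (shape (d_out,)) sitting on the neurons of node K onto the
    neurons of the layer below. H_prev: (n, d_in) activations h_j; W: (d_in, d_out).
    Returns an (n, d_in) array whose row J holds R_{jKL...} for the neurons j of J."""
    W_hat = W + gamma * torch.clamp(W, min=0)                    # w^ = w + gamma * rho(w)
    z = Lam[:, K].view(-1, 1, 1) * H_prev.unsqueeze(2) * W_hat   # z[J, j, k] = lam_JK h_j w^_jk
    p = z.sum(dim=(0, 1)) + 1e-9                                 # denominator p_k (stabilized)
    return (z * (R_k / p)).sum(dim=2)                            # proportional redistribution
```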

Implementing GNN-LRP

While the equations in Table 1 fully specify the GNN-LRP procedure, the latter can be more conveniently and concisely implemented using a 'forward-rewrite' strategy originally proposed for standard LRP (cf. [14]). In the following, we adapt this strategy to GNN-LRP.
Our demonstration focuses on the GCN case (Eq. (9)), but the approach can be extended to other GNN models. We first observe that by rewriting relevance scores as a product of the corresponding activation and some factor, i.e. R_{jKL…} = h_j c_{jKL…} and R_{kL…} = h_k c_{kL…}, Eq. (9) can be equivalently formulated as:

$$c_{jKL\ldots} = \sum_{k \in K} \lambda_{JK}\, w^{\wedge}_{jk}\, \frac{h_k}{p_k}\, c_{kL\ldots} \qquad (12)$$

with $p_k = \sum_J \sum_{j \in J} \lambda_{JK}\, h_j w^{\wedge}_{jk}$. This formulation can be further transformed to let a structure appear that is similar (but not equivalent) to gradient propagation:

$$c_{jKL\ldots} = \sum_{k \in K} \frac{\partial p_k}{\partial h_j}\, \frac{h_k}{p_k}\, c_{kL\ldots} \qquad (13)$$

Specifically, compared to the standard gradient propagation equation given by $\delta_j = \sum_K \sum_{k \in K} (\partial h_k / \partial h_j)\, \delta_k$, our LRP-equivalent equation differs by (i) computing the derivative of p_k instead of h_k, (ii) multiplying by an extra factor h_k/p_k, and (iii) not summing over nodes K.
Having connected GNN-LRP to gradient propagation, we now rewrite the forward pass of the GNN in a way that its output remains the same, but where the output of the gradient computation, assumed to be computed by automatic differentiation, matches Eq. (13). Such functionality can be achieved by 'detaching' certain parts of the forward computation from the differentiation graph. Specifically, we redefine the combine step of Eq. (9) as:

$$H_t = \big(P_t \odot [H_t \oslash P_t]_{\mathrm{cst.}}\big) \odot M_K + [H_t \odot (1 - M_K)]_{\mathrm{cst.}} \qquad (14)$$

where P_t collects the modified pre-activations p_k, where ⊙ and ⊘ denote the element-wise multiplication and division respectively, where M_K is a mask array that is one for neurons associated to node K and zero elsewhere, and where [·]_cst. indicates portions of the computation that have been detached.
Furthermore, once the forward pass is rewritten at each layer, with masks chosen in a way that they select for some walk $\mathcal{W}$ of interest, we can compute the relevance score for $\mathcal{W}$ as

$$R_{\mathcal{W}} = \Big\langle H_{0,J},\ \frac{\partial f}{\partial H_{0,J}} \Big\rangle \qquad (15)$$

where J is the first node of the walk, and where the automatic differentiation mechanism is now fully repurposed to perform the sequence of LRP computations. The procedure must then be repeated for every walk in the graph in order to produce a full explanation. Note that this procedure is flexible and can support many variations. For example, if we wish to ignore the node at which the walk passes at a certain layer of the GNN, the mask at that layer can simply be removed. In effect, removing this mask is equivalent to summing the relevance scores of walks passing through all nodes at that layer.
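The following sketch illustrates this forward-rewrite strategy in PyTorch for a GCN-style block, using `.detach()` to realize the [·]_cst. operations. Biases are omitted for simplicity, and the function and variable names are our own illustrative choices.

```python
import torch
import torch.nn.functional as F

def rewritten_block(H, Lam, linear, gamma, mask):
    """One GCN block rewritten as in Eq. (14): the forward output is unchanged,
    but the backward pass now performs the LRP-gamma step of Eq. (13).
    mask: (n, 1) array, one for the walk's node at this layer, zero elsewhere."""
    Z = Lam.t() @ H
    Hout = torch.relu(F.linear(Z, linear.weight))            # combine step C_t (bias omitted)
    W_hat = linear.weight + gamma * torch.clamp(linear.weight, min=0)
    P = F.linear(Z, W_hat)                                   # modified pre-activations p_k
    ratio = (Hout / (P + 1e-9)).detach()                     # [h_k / p_k]_cst
    return P * ratio * mask + (Hout * (1 - mask)).detach()   # Eq. (14)

def walk_relevance(H0, Lam, linears, gammas, masks, readout, first_node):
    """Relevance of the walk selected by 'masks' (one mask per block) and by its
    first node at the input layer, via a single backward pass (Eq. (15))."""
    H = H0.clone().requires_grad_(True)
    Hcur = H
    for linear, gamma, mask in zip(linears, gammas, masks):
        Hcur = rewritten_block(Hcur, Lam, linear, gamma, mask)
    grad, = torch.autograd.grad(readout(Hcur), H)
    return (H * grad)[first_node].sum()                      # Gradient x Input at node J
```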
Also, because computing a full explanation requires as many forward/backward steps as there are walks in the input graph, various strategies can be considered to speed up computations: For example, we can adopt a multiresolution approach where relevant walks are first defined at a coarse level between super-nodes (masks M are then set to match these super-nodes), and the most relevant 'super-walks' are then expanded into walks between actual nodes. Alternately, the GNN-LRP computation can be broken down layer-wise, and walk computations can be stopped early if the relevance score R_{JKL…} is below a certain threshold. Finally, GNN-LRP can also be made faster when the input graphs have a particular structure. In particular, for the experiments of Section 5.3, we will leverage the local connectivity of the VGG-16 network to compute many walks in parallel, which dramatically speeds up computations. Details of this parallel walk computation approach are presented in Appendix D of the Supplement.

TESTING AND VALIDATING GNN-LRP
To test the proposed GNN-LRP method, we train various types of GNNs on several graph prediction tasks. We first consider a two-class synthetic problem where the first class consists of Barabási-Albert graphs [47] of growth parameter 1 (BA-1), and where the second class has a slightly higher growth parameter and new nodes attached preferentially to low-degree nodes. Examples of graphs from the two classes are given in Fig. 4 (left). Details on this synthetic dataset are given in Appendix E in the Supplement.
We consider first a graph isomorphism network (GIN) that we train on this task. Our GIN has two interaction layers. In each interaction layer, the combine function consists of a two-layer network with 32 neurons per node at each layer. The initial state H_0 is an all-ones matrix of size n × 1, i.e. nodes do not have intrinsic information. The GIN receives as input the connectivity matrix Λ = A/2 where A is the adjacency matrix augmented with self-connections. The GIN is trained on this task until convergence, where it reaches an accuracy above 95%. After training, we take some exemplary input graph from the class BA-1, predict it with our GIN, and apply GNN-LRP on the prediction. We use the LRP parameters γ = 2 and γ = 1 in the first and second interaction layers respectively. The resulting explanation is shown in Fig. 4 (right). The explanation produced by GNN-LRP reveals that walks that traverse or stay in the high-degree node are the principal contributors to the GIN prediction. On the other hand, leaf nodes or sequences of low-degree nodes are found to be either irrelevant or in slight contradiction with the prediction.

Fig. 5. Comparison of different explanation techniques (Pope et al. [23] (GI / LRP), GNN-GI, GNN-LRP, GNNExplainer [25]) on the same graph as in Fig. 4. GNN-LRP produces more detailed explanations compared to [23] and [25], and brings more robustness compared to GNN-GI.
In the following, we compare GNN-LRP to a collection of other GNN explanation methods: (a) Pope et al. [23]: The method views the GNN as a function of the initial state H_0, and performs an attribution of the GNN output on nodes as represented in H_0. In principle, the proposed framework lets the user choose the technique to perform attribution on H_0. In our benchmark, we use the techniques GI and LRP. (b) GNN-GI: This is the simple baseline we have contributed in Section 3.1. This baseline can also be seen as a special case of GNN-LRP with γ = 0. (c) GNNExplainer [25]: The method solves an optimization problem that finds a selection of edges that maximizes the model output. The procedure can be viewed as finding a mask M = σ(R), where σ denotes the logistic sigmoid function, that maximizes the prediction f(M ⊙ Λ). The explanation is then given by R. Explanations produced by each method are shown in Fig. 5. The method by Pope et al. [23] highlights nodes that are relevant for the prediction. However, it is difficult to determine from the explanation whether the highlighted nodes are relevant by themselves or if they are relevant in relation to their neighbors. The GNN-GI baseline we have contributed, and our more advanced method GNN-LRP, provide a much higher level of granularity: They distinguish in particular between what has to be attributed to the node itself and what has to be attributed to collections of multiple connected nodes. The GNNExplainer [25] produces explanations that are in agreement with GNN-LRP but sparser and less detailed.
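For reference, a minimal sketch of the mask optimization described in (c). The original GNNExplainer additionally uses a mutual-information objective and mask regularizers, which are omitted here; the hyperparameters are illustrative.

```python
import torch

def gnn_explainer_mask(f, Lam, steps=300, lr=0.1):
    """Sketch of the edge-mask optimization of (c): find logits R such that the
    soft mask sigma(R) applied to the connectivity maximizes the model output."""
    R = torch.zeros_like(Lam, requires_grad=True)
    opt = torch.optim.Adam([R], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        (-f(torch.sigmoid(R) * Lam)).backward()   # gradient ascent on f(M . Lam)
        opt.step()
    return R.detach()                              # the explanation
```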
Even when the user aims specifically for a simple explanation, GNN-LRP remains a method of choice: Relevance scores $(R_{\mathcal{W}})_{\mathcal{W}}$ can be reassigned from walks to more descriptive structures, such as leaves, trees, cycles, etc. Relevant walks can also be reassigned to more basic elements such as nodes or edges in order to produce a level of complexity similar to Pope et al. [23] or GNNExplainer [25]. In fact, pooling GNN-LRP and GNN-GI explanations based on the first node of the walk results in node-based explanations that are equivalent to those obtained with Pope et al. [23].
Orthogonal to the level of detail of the explanation, methods based on GI tend to be less selective than those based on LRP. We observe for GNN-GI that several regions of the graph receive an excessive amount of positive or negative relevance. The noise caused by GI becomes particularly strong and problematic on larger and deeper models such as the VGG-16 network (cf. Fig. 13 in Section 5.3).
Overall, GNN-LRP is the only method in our benchmark that produces explanations that have both the desired robustness and a high level of detail.

Quantitative Comparison
To verify that GNN-LRP produces explanations that are systematically superior to those produced by other explanation methods, we will perform a quantitative evaluation. A common evaluation technique that was introduced in the context of image classifiers is pixel-flipping [48]. The procedure consists of 'flipping' pixels by increasing or decreasing order of relevance to verify that removing relevant/irrelevant pixels causes a strong/weak variation at the output of the model. If that is the case, it can be concluded that the explanation faithfully describes the prediction at the output of the model.
We adapt the pixel-flipping method to graph data by considering nodes of the graph instead of pixels of an image as a flipping unit. We refer to this adaptation as 'node-flipping'. In our benchmark, we will leverage the higher-order information contained in some of the explanations to determine more precisely which nodes need to be flipped to incur the desired effect on the GNN output. Specifically, based on the explanation, we wish to estimate for any subgraph $\mathcal{G}$ the contribution $R_{\mathcal{G}}$ of that subgraph to the prediction, so that the effect on the GNN output of adding/removing nodes can be expressed as differences of the type $R_{\mathcal{G}'} - R_{\mathcal{G}}$. For a node-based attribution (e.g. [23]), the relevance to be attributed to a particular subgraph $\mathcal{G}$ is simply given as the sum of contributions of each node:

$$R_{\mathcal{G}} = \sum_{V \in \mathcal{G}} R_V$$

For an edge-based attribution (e.g. GNNExplainer [25]), the relevance of the subgraph is more finely given as a sum over the edges E forming the subgraph:

$$R_{\mathcal{G}} = \sum_{E \in \mathcal{G}} R_E$$

For higher-order attribution methods whose unit of explanation are bags of edges (computable as $R_{\mathcal{B}} = \sum_{\mathcal{W} \in \mathcal{B}} R_{\mathcal{W}}$ for GNN-LRP and GNN-GI), the estimation of the subgraph relevance is further refined by computing

$$R_{\mathcal{G}} = \sum_{\mathcal{B} \in \mathcal{G}} R_{\mathcal{B}}$$

where the membership relation $\mathcal{B} \in \mathcal{G}$ requires that all edges in the bag $\mathcal{B}$ are also in the subgraph $\mathcal{G}$. We note that in every case, the estimator $R_{\mathcal{G}}$ is zero when considering the empty graph, and it corresponds instead to the GNN output when considering the full graph.
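The three estimators can be summarized in a few lines. The dictionary-based representation of explanations below is our own illustrative choice.

```python
def subgraph_relevance(G, attribution, kind):
    """Estimate R_G, the contribution of a node subset G, from an explanation.
    'attribution' maps nodes V, edges (U, V), or walks (node tuples) to scores,
    depending on 'kind'."""
    if kind == 'node':    # node-based methods, e.g. Pope et al. [23]
        return sum(r for V, r in attribution.items() if V in G)
    if kind == 'edge':    # edge-based methods, e.g. GNNExplainer [25]
        return sum(r for (U, V), r in attribution.items() if U in G and V in G)
    if kind == 'walk':    # higher-order methods: GNN-GI and GNN-LRP
        # a walk (bag of edges) counts only if it lies entirely inside G
        return sum(r for W, r in attribution.items() if all(V in G for V in W))
    raise ValueError(kind)
```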
We consider two node-flipping tasks. First, an activation task, where we would like to add nodes to the graph so as to activate the GNN output maximally and as quickly as possible. In the general case, finding the optimal sequence of nodes to add is combinatorially complex. However, we can use instead a greedy approach, where for a given subgraph $\mathcal{G}$, the best node to add next according to the explanation is given by:

$$V^\star = \arg\max_{V \notin \mathcal{G}} \big( R_{\mathcal{G} \cup \{V\}} - R_{\mathcal{G}} \big)$$

Then, we consider a pruning task, where we start with the original graph $\mathcal{G}_0$ and want to remove nodes from the graph in a way that the GNN output is minimally affected. In that case, the best node to remove is

$$V^\star = \arg\min_{V \in \mathcal{G}} \big( R_{\mathcal{G}} - R_{\mathcal{G} \setminus \{V\}} \big)$$

Note that with this greedy strategy, higher-order techniques such as GNN-GI and GNN-LRP may have their performance underestimated, as adding one node at a time prevents uncovering new parts of the graph composed of mutually relevant nodes. To mitigate this effect, we can substitute at regular intervals of the node-flipping procedure the greedy step by a coarser step where the node is added based on its marginal score R_V.
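A sketch of the resulting greedy node-flipping loop, including the AUFC summary defined in the next paragraph. Here `R_G` and `model_output` are hypothetical callables standing for the subgraph estimator above and the evaluation of the GNN on a subgraph.

```python
def greedy_flip(nodes, R_G, model_output, task='activation'):
    """Greedy node-flipping: build the flip sequence from the estimator R_G and
    track the actual GNN output; the AUFC is the mean of the recorded curve."""
    G = set() if task == 'activation' else set(nodes)
    curve = [model_output(G)]
    for _ in range(len(nodes)):
        if task == 'activation':   # add the node with the largest estimated gain
            v = max(set(nodes) - G, key=lambda u: R_G(G | {u}) - R_G(G))
            G = G | {v}
        else:                      # pruning: remove the node with the smallest estimated effect
            v = min(G, key=lambda u: R_G(G) - R_G(G - {u}))
            G = G - {v}
        curve.append(model_output(G))
    aufc = sum(curve) / len(curve)   # Area Under the Flipping Curve
    return curve, aufc
```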
Once the sequence of nodes to flip has been built, we produce a plot that keeps track of the GNN output throughout the flipping process. The procedure can be repeated for a large number of graphs, leading to an averaged version of the same plot. The latter can be further summarized with the 'Area Under the Flipping Curve' (AUFC), which we define as the mean of plotted values over the flipping sequence. The node-flipping procedure and the AUFC computation are depicted in Fig. 6. We will now use the node-flipping procedure to evaluate the different explanation methods on a broad set of architectures and tasks. On the Synthetic Data, we consider, in addition to the GIN model used in the qualitative experiments, a GCN and a spectral network. Our GCN has 128 neurons per node at each layer. For the spectral network, we use 32 neurons per node at each layer and give as input the power expansion $\Lambda = [\,A^0,\ \tfrac{1}{2}A^1,\ \tfrac{1}{4}A^2\,]$. Both networks perform similarly to the GIN in terms of classification accuracy.
In addition, we also consider networks trained on real data: A first one is designed and trained by ourselves on a Sentiment Analysis task where the graph represents a syntactic tree with nodes carrying information about each word. Another one is the SchNet [30] model used for molecular prediction where nodes and edges represent atom types and distances respectively. Finally, we use a pre-trained VGG-16 [45] network for image recognition that we interpret as a graph neural network operating on a lattice of size 14 × 14 and starting at convolutional block 3. In this last network, each node represents the collection of activations at a specific spatial location. Details for each network are given in Appendix F in the Supplement. Results for each network, explanation method and task are summarized in Tables 2 and 3.

On the Synthetic Data, we observe that GNN-LRP systematically outperforms other methods, both on the activation and pruning tasks. Nearest competitors are Pope et al. [23] (in combination with LRP), GNN-GI, and the GNNExplainer [25]. This corroborates our qualitative analysis at the beginning of Section 4. On the Sentiment Analysis task, GNN-LRP and GNNExplainer [25] perform best for the activation and pruning task, followed by GNN-GI and both node attribution methods by Pope et al. [23]. In natural language, the sentiment associated to a sentence relies on word combinations, e.g. negation. This can explain why methods which attribute interactions of words, such as GNN-LRP and GNNExplainer, perform better.
For the experiments on the SchNet, our method again performs above its competitors. Note that in this experiment, GI and LRP are equivalent, because the lack of ReLU-Dense-ReLU structures forces us to use the LRP-0 rule instead of the more robust LRP-γ, and application of LRP-0 is equivalent to GI (cf. [33], [37]). The difference to other competitors (Pope et al. [23] and GNNExplainer [25]) is small on the prediction of the energy E but larger for the dipole moment µ, possibly because of a more complex structure of the prediction task involving longer interactions.
Finally, on the VGG-16 image recognition model, GNN-LRP performs best on both tasks. Here, the superiority of LRP vs. GI can be explained both by a better handling of neuron biases and by a higher robustness to gradient shattering, a key difficulty to account for when explaining very deep models [34], [35].
Overall, we find that GNN-LRP is systematically the best method in our benchmark. With the fine-grained yet robust explanations it provides, GNN-LRP is capable of precisely and contextually identifying elements of the graph that contribute the most or the least to the prediction.

NEW INSIGHTS WITH GNN-LRP
Having validated the proposed GNN-LRP method on a diverse set of GNNs including state-of-the-art models, and having shown the multiple advantages of our method compared to previous approaches, we will now inspect the explanations produced by GNN-LRP on some of these practically relevant GNN models to demonstrate how useful insights can be extracted about the GNN model and the task it predicts.

Sentiment Analysis
In natural language processing (NLP), text data can be processed either as a sequence, or with its corresponding grammatical structure represented by a parse tree [49], [50]. The latter serves as an additional structural input for the learning algorithm, incorporating dependencies between words. NLP tasks are therefore particularly amenable to GNNs since these models can naturally incorporate the graph structure.
In the following experiments, we will demonstrate how GNN-LRP can be used to intuitively and systematically assess the quality of a GNN model, including its overall prediction strength as well as its weaknesses. For this, we will consider a GCN composed of two interaction layers, to classify sentiments in natural language text [51]. We train our model on the Stanford Sentiment Treebank (SST) [52] 1. For details on the experimental setup we refer the reader to Appendix F.2 in the Supplement.
1. Note that this example serves as a mere demonstration for the versatility of our explanation approach and is by no means intended to reflect or compete with state-of-the-art NLP systems.
In Fig. 7 (top) we show an example of a GNN-LRP explanation for some exemplary input sentence containing a mixture of positive and negative sentiment. We observe that distinct combinations of words give rise to the emphasis of these sentiments. The model correctly detects word combinations such as "the best movies" to carry a positive sentiment, and "boring pictures" to contribute negatively.

GNN-LRP can also be used to assess the GNN prediction strategy relative to other (simpler) models, such as Bag-of-Words (BoW). The BoW model can be seen as a GNN model with zero interaction layers, hence our explanation technique applies to that model as well. Fig. 7 (bottom) shows the explanation of the BoW prediction for the same sentence as for the GNN. Interestingly, several words are now attributed a sentiment different from the one obtained with the GNN. For example, "like" becomes positive, which however appears in contradiction with the preceding word "didn't". Hence, GNN-LRP has highlighted from a single sentence that the GNN model is able to properly capture and disambiguate the sentiment of consecutive words, whereas the BoW model is not.

In the next experiment, we consider two additional sentences from the SST corpus, where GNN-LRP contributes to uncovering or better understanding flaws of a trained GNN model. Fig. 8 A shows a data sample that contains sarcasm and that is falsely classified by the GNN to be positive. Our explanation method highlights that relevant walks are too localized to capture the interaction between words that jointly explain sarcasm. Instead, local positive interactions such as "more entertained" dominate the prediction, which leads to the incorrect prediction. In Fig. 8 B, we see a case of entity bias where the GNN model is biased towards particular entities, namely "Hugh Grant" and "Sandra Bullock". In this example, GNN-LRP finds that the GNN model uses, with no objective reason, "Hugh Grant" and "Sandra Bullock" as evidence for positive and negative sentiment, respectively. In NLP, model biases are a well-studied area, and approaches to tackle this problem have already been developed [53], [54].

To identify words or combinations of words that systematically contribute to a positive or negative sentiment, and to potentially discover further cases of entity bias, we apply GNN-LRP on the whole dataset. This lets us find the combinations of words (given by a walk $\mathcal{W}$) that are on average the most positive/negative (according to their score $R_{\mathcal{W}}$). Fig. 9 shows the top-3 walks of both types (positive and negative) containing one to three unique words. We see that the walks which are most relevant for the task contain positive adjectives and adverbs, such as "solidly", "brilliant", "wonderfully" or "entertaining". The walks with a very negative relevance score contain negative words such as "moot", "gimmicky" or "vulgar", but also subsequences like "charlie sorry", "dead weight" or "no more." which clearly transport a negative emotion. Here again, GNN-LRP detects an entity bias for the word "lewis", which the GNN model considers to be positively contributing although this word is objectively neutral. Note that this time, this entity bias was discovered directly, without having to visualize a large number of explanations.
Overall, applying the proposed GNN-LRP explanation method to the GNN model for sentiment classification has highlighted that GNN predictions are based on detecting meaningful sentence sub-structures, rather than single words as in the BoW model. Furthermore, GNN-LRP was able to find the reasons for incorrect predictions, or to shed light on potential model biases. The latter could be identified manually by visual inspection of many explanations, or systematically by averaging the GNN-LRP results on a whole corpus.

Quantum Chemistry
In the field of machine learning for quantum chemistry [55], [56], [57], GNNs have been exhibiting state-of-the-art performance for predicting molecular properties [58], [59], [30], [3]. Such networks incorporate a graph structure of molecules either via the covalent bonds or the proximity of atoms.
In this section we will test the ability of GNN-LRP to extract meaningful domain knowledge from these state-of-the-art GNNs. We will consider for this the SchNet, a GNN for the prediction of molecular properties [30], [3], [60], and we set the number of interaction blocks in this GNN to three. We train the model on the atomization energies and the dipole moments for 110,000 randomly selected molecules in the QM9 dataset [61]. For more details on the model parameters and the network architecture, we refer to Appendix F.3 in the Supplement. On a test set of 13,885 molecules, the atomization energy and dipole moment are predicted well with a mean absolute error (MAE) of 0.015 eV and 0.039 Debye, respectively.
Our first objective is to get an insight into what structures in the molecule contribute positively or negatively to the molecule's energy. While it is common to look at the atomization energy (describing the energy difference to dissociated atoms), we consider here for the purpose of explanation the centered negative atomization energy, and we define this quantity to be the actual 'energy'. With this definition, molecules have high energy when they are hard to break and typically formed of strong bonds, and conversely, molecules have low energy when they are easy to break and unstable.
We consider for illustration the case of the paracetamol (acetaminophen) molecule, and feed this molecule to SchNet. Once the SchNet model has predicted its energy, we apply the GNN-LRP analysis in order to produce an explanation of the prediction. The resulting explanation is shown in Fig. 10 (left). We observe that the explanation is dominated by self-walks (i.e. staying in a single atom) or one-edge walks (traversing a single edge of the graph, in most cases, a bond). Bonds associated to the aromatic ring and bonds of higher order contribute strongly to the predicted energy, whereas regions involving single bonds contribute negatively. To verify whether this observation generalizes to other molecules, we perform the GNN-LRP analysis on a set of 1000 molecules randomly drawn from the QM9 dataset and show in Fig. 10 (right) the average bond contribution for each bond type. We observe an increasing energy contribution with ascending bond order. This coincides with the chemical intuition that bonds of higher order are more stable and, thus, require more energy to break. Note that the SchNet does not take bond types as an input, but as highlighted by GNN-LRP, it has clearly inferred these chemical features from the data.

We now turn to another quantum chemical property, the dipole moment, and its prediction by SchNet (cf. [62]). The quantity produced at the output of the model has the form $\|\mu(\Lambda)\|$. The nonlinearity of the norm introduces higher-order terms in the GNN function, and this prevents a direct application of GNN-LRP. Instead, we consider for explanation the dot product $\langle \mu(\Lambda),\ \mu[\Lambda^\star] / \|\mu[\Lambda^\star]\| \rangle$, where the left-hand side functionally depends on the GNN input Λ, and where the right-hand side is the direction of the predicted dipole moment for the actual molecule denoted by $\Lambda^\star$. With this modification, the top layer becomes linear, and GNN-LRP can proceed as for the energy prediction case. Fig. 11 (top) shows the GNN-LRP explanation of the predicted dipole moment for the same molecule as in the previous experiment. Here, we observe that the contributions to the dipole moment found by GNN-LRP form a gradient from one side to the other side of the molecule. The orientation corresponds to the positive and negative poles of the molecule. To verify whether this insight generalizes to other molecules, we consider a set of 1000 molecules and we normalize each molecule to a span of 1 along its dipole direction. Subsequently, we project all walks onto their respective dipole to obtain a one-dimensional distribution of absolute relevance values for all molecules. Note that the atom density of molecules, in general, is not homogeneous, and thus, in some regions more walks may occur while other regions do not exhibit as many walks. Hence, for each molecule the distribution of absolute relevance is normalized w.r.t. its atom density. The result of this aggregated analysis is shown in Fig. 11 (bottom).
This quantitative result confirms the alignment of GNN-LRP contributions with the positive and negative poles of the molecule, as it was found qualitatively on the paracetamol molecule. This result is in accordance with chemical intuition regarding the dependence of the dipole on the span of the molecule.
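A minimal sketch of this linearized readout: detaching the normalized dipole direction makes the top layer linear in µ while leaving the predicted value unchanged at the actual molecule. The function name is our own.

```python
import torch

def dipole_readout(mu):
    """Linearized readout for explaining ||mu(Lam)||: project mu onto its own
    detached direction. The value equals ||mu|| at the actual molecule, and the
    top layer becomes linear in mu, so GNN-LRP applies as in the energy case."""
    direction = (mu / mu.norm()).detach()   # mu[Lam*] / ||mu[Lam*]||, held constant
    return mu @ direction
```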
Because the walk-based explanations produced by GNN-LRP are very detailed, some of the more intricate details of the explanation cannot always be visualized on a single molecule. Hence, we perform a further experiment where the GNN-LRP explanation is spread over multiple visuals, each of them showing relevant walks covering a specific number of edges. With this expanded visualization, we seek in particular to better distinguish between local atom-wise contributions and more global effects. Figure 12 shows this expanded analysis of the dipole moment prediction. From this visualization, we gain further insights into the strategy used by the SchNet model for its prediction. In our expanded explanation, one-edge walks clearly indicate the electrostatic poles of the molecule, while giving a hint on local dipoles. Self-walks (i.e. zero-edge walks) incorporate elements that are inherent to the atom types, in particular, their electronegativity. Two-edge walks provide rather complex and spatially less resolved contributions. Finally, three-edge walks, which are also the most global descriptors, again provide interesting spatial contributions, and appear to dampen the one-edge walk contributions based on the more complex structures they are able to capture.
Overall, GNN-LRP has provided insights into the structure-property relationship of molecules that reach beyond the original prediction task. The resulting relevant walks agree with chemical characteristics of the molecule, thus, indicating that the neural network has indeed learned chemically plausible regularities.

Revisiting Image Classification
A convolutional neural network (CNN) can be seen as a particular graph neural network (GNN) operating on lattices of pixels. CNN predictions have so far mainly been explained using heatmaps highlighting pixels that are the most relevant for a given prediction [63], [13], [12]. Heatmaps are a useful summary of the decision structure, but they do not reveal the more complex strategies that the network has used to progressively build the prediction layer after layer. We will show, by viewing CNNs as graph neural networks and extracting relevant walks in the resulting pixel lattice, that our GNN-LRP method is capable of shedding light on these strategies.
For this, we consider the well-established VGG-16 [45] network. It consists of a collection of blocks interleaved by pooling layers, where each block is composed of a sequence of convolution and ReLU layers. We use the pretrained version of the VGG-16 network [45] without batch-normalization, which can be retrieved using the TorchVision module of PyTorch.
Because the VGG-16 neural network is deep and the number of possible walks grows exponentially with neural network depth, we marginalize explanations to only consider the position of the walk at the input and at the output of a block. This is easily achieved by removing all masks except those at the input and output of the block (cf. Section 3.2.1). We then compute one explanation for block 3, 4, and 5. Also, to cope with the large spatial lattices in each block, we make use of the multi-mask strategy outlined in Appendix D of the Supplement. Specifically, because each block of the VGG-16 network has receptive fields of size 7, we can process multiple walks at the same time by choosing the mask to be a grid with stride 7. This allows us to collect all relevant walks at the given block in the order of 49 backward passes.
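A minimal sketch of this strided mask construction: for a 14 × 14 lattice and stride 7, each of the 49 masks selects a sparse grid of locations whose receptive fields do not overlap, so their walks can be extracted jointly. The function name and defaults are our own.

```python
import torch

def strided_masks(height=14, width=14, stride=7):
    """Masks whose active locations are 'stride' apart: receptive fields (size 7
    in the VGG-16 blocks considered) of locations within one mask do not overlap,
    so all relevant walks are collected in stride**2 = 49 backward passes."""
    masks = []
    for dy in range(stride):
        for dx in range(stride):
            m = torch.zeros(height, width)
            m[dy::stride, dx::stride] = 1.0   # a sparse grid of selected locations
            masks.append(m)
    return masks
```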
We consider two exemplary images 2 that the VGG-16 network respectively predicts as 'teapot' and 'dumbbell'. We set the LRP parameter to γ = 0.5 in block 3, halving the parameter value in each subsequent block, and choosing γ = 0 in the top-level classifier. Fig. 13 (left) shows the result of the analysis for these two images at various blocks of the VGG-16 network.

For the first image, Block 3 detects local edges in the teapot, then, in Block 4, the walks converge to center points of specific parts of the teapot (e.g. the handle, the spout and the knob), and finally the walks converge in Block 5 to the center of the teapot, which can be interpreted as composing the different parts of the teapot. For this exemplary image, we further observe in Fig. 13 (right) the advantageous properties of GNN-LRP compared to more basic explanation methods. The GNN-GI baseline also produces a vector field, however, the latter is significantly more noisy than the one produced by GNN-LRP. The method by Pope et al. [23] robustly highlights relevant nodes at the input of the given block, however, it does not reveal where these features are being transported for use in the subsequent block.

2. Images are from https://www.piqsels.com/en/public-domain-photo-fjjsr and https://www.piqsels.com/en/public-domain-photo-fiffy, rescaled and cropped to the relevant region to produce images of size 224 × 224, which is the standard input size for VGG-16.
For the second image, we investigate a known 'Clever Hans' strategy where the network classifies images as 'dumbbell' by detecting both the dumbbell and the arm that holds it [64]. Using GNN-LRP we observe that Blocks 3 and 4 detect the arm and the dumbbell separately, and then, Block 5 composes them into a single 'dumbbell-arm' concept, as shown by the walks for both objects converging to some center point near the wrist. Clearly these insights could not have been obtained from a standard pixel-wise heatmap explanation.
Overall, our GNN-LRP method can be used to comprehensively inspect the prediction of an image classifier beyond what would be possible with a standard pixel-wise heatmap explanation. This deeper explanation capability allows us to better understand the detailed structure of image classifications, and also to shed more light into anecdotal 'Clever Hans' effects observed in the context of state-of-the-art image classifiers.

CONCLUSION
Graph neural networks are a highly promising approach for predicting graphs, with a strong demand from the practical side. For these models to be broadly adopted, it is however important that their predictions are made explainable to the user. Because the input of a GNN is tightly entangled with the model itself, the explanation problem is particularly difficult, and methods for explaining GNNs have so far been limited.
In this paper we have proposed a novel theoretically principled approach to produce these explanations, based on higher-order Taylor expansions. From this conceptual starting point, we have then contributed two practical algorithms: GNN-GI, which we propose as a simple baseline, and GNN-LRP, which is more robust and scales to highly non-linear models. GNN-LRP produces detailed explanations that capture the complex nested interactions between the GNN model and the input graph. It also significantly outperforms other explanation methods in our quantitative benchmark. In addition to the high quality of the explanations it produces, GNN-LRP is also broadly applicable (covering the GCN, the GIN, and spectral filtering approaches), and it can handle virtually any type of input graph, whether it is a parse tree, a spatial graph, a pixel lattice, etc.
This broad applicability is demonstrated in our extensive application showcase, including sentiment analysis, quantum chemistry and image classification. In each scenario, GNN-LRP could highlight the diverse strategies employed by the GNN models, and unmask some undesired 'Clever Hans' strategies. In our quantum-chemical application showcase, we could additionally extract interesting problem-relevant insights. Future work will apply the proposed methodology to analyze properties of materials for practically highly relevant tasks, e.g. in catalysis.

In this Supplementary Material, we provide the proofs for Propositions 1 and 2 of the main paper, on which our method is built. We also provide a detailed justification of the GNN-LRP procedure for the GCN. Finally, we give more details on our synthetic dataset and on the graph neural networks used in the experiments section of the main paper.

APPENDIX A PROOF OF PROPOSITION 1
Proof. We know that the function C_t is piecewise linear, and from Eq. (1) in the main paper we know that the aggregation step in each layer t is linear with respect to Λ. Applying the aggregation and combine steps T times with respect to the same Λ implies that the last hidden layer H_T(Λ; H_0) is piecewise a multivariate polynomial of order T in Λ. Piecewise linearity of the readout function g implies that f(Λ) is piecewise a multivariate polynomial of order T as well.
We also assumed positive homogeneity of the combine and readout functions. With a T-times layer-wise application of the aggregation function with respect to Λ, and the combine functions C_t, we get that H_T(Λ; H_0) is positively homogeneous of order T in Λ. Positive homogeneity of the readout function g implies that f(Λ) is also positively homogeneous of order T, which means formally

$$f(s\Lambda) = s^T f(\Lambda)$$

for all s > 0.
From the property of positive homogeneity we can deduce two additional important properties. First, f(Λ) is not only piecewise a polynomial, but piecewise a homogeneous polynomial of order T. This means that for every input matrix Λ, there is a piece $\iota \subset \mathbb{R}^{n \times n}$ with $\Lambda \in \iota$ for which we can find a set of coefficients $b_{\mathcal{B}} \in \mathbb{R}$, such that the function can be written on that piece as a homogeneous polynomial of order T:

$$f(\Lambda) = \sum_{\mathcal{B} \in \mathbb{B}_T} b_{\mathcal{B}} \prod_{E \in \mathcal{B}} \lambda_E$$

where $\mathbb{B}_T$ is the set of all bags-of-edges of size T. Second, the piece ι can always be chosen large enough so that it contains the whole line sΛ with s > 0, i.e. $\{s\Lambda\}_{s>0} \subset \iota$, which includes the input Λ but also the reference point $\widetilde{\Lambda}$. This enables us to restrict our investigation to that particular piece ι; specifically, we restrict in the following the function to be $f: \iota \to \mathbb{R}$ with $f \leftarrow f|_\iota$, and leverage the fact that it is a homogeneous polynomial of order T. We now proceed with the main step of Proposition 1 and apply the Taylor theorem. The general Taylor decomposition of f(Λ) at any reference point $\widetilde{\Lambda}$ is given by

$$f(\Lambda) = \sum_{k=0}^{\infty} \sum_{\mathcal{B} \in \mathbb{B}_k} \frac{1}{\alpha_{\mathcal{B}}!} \left[\frac{\partial^k f}{\prod_{E \in \mathcal{B}} \partial \lambda_E}\right]_{\widetilde{\Lambda}} \prod_{E \in \mathcal{B}} (\lambda_E - \widetilde{\lambda}_E)$$

where $\alpha_{\mathcal{B}}$ is the multi-index of the bag $\mathcal{B}$, i.e. $\alpha_{\mathcal{B},E}$ counts the occurrences of edge E in $\mathcal{B}$, with $\alpha_{\mathcal{B}}! = \prod_E \alpha_{\mathcal{B},E}!$. We first want to show that the addends on the r.h.s. with respect to k vanish for k ≠ T, if $\widetilde{\Lambda} = s\Lambda$ and s → 0. We distinguish between two cases.
Case k < T: From Eq. (3) we deduce that $\partial^k f / (\partial\lambda_{E_1} \cdots \partial\lambda_{E_k})$ is a homogeneous polynomial of order T − k, which directly implies

$$\left[\frac{\partial^k f}{\partial\lambda_{E_1} \cdots \partial\lambda_{E_k}}\right]_{s\Lambda} = s^{T-k} \left[\frac{\partial^k f}{\partial\lambda_{E_1} \cdots \partial\lambda_{E_k}}\right]_{\Lambda} \;\xrightarrow{s \to 0}\; 0$$

Case k > T: We know that f(Λ) is a homogeneous polynomial of degree T. If we consider a derivative of f of order higher than T, it must be zero, hence $\partial^k f / (\partial\lambda_{E_1} \cdots \partial\lambda_{E_k}) = 0$.
Now, together with the fact that $\partial^T f / (\partial\lambda_{E_1} \cdots \partial\lambda_{E_T})$ is constant with respect to Λ for k = T, we obtain for s → 0

$$f(\Lambda) = \sum_{\mathcal{B} \in \mathbb{B}_T} \frac{1}{\alpha_{\mathcal{B}}!} \left[\frac{\partial^T f}{\prod_{E \in \mathcal{B}} \partial \lambda_E}\right] \prod_{E \in \mathcal{B}} \lambda_E$$

which is what we wanted to show.

APPENDIX B PROOF OF PROPOSITION 2
Proposition 2. For the considered function f(Λ′) the higher-order terms in Eq. (6) can be equivalently computed as a sequence of differentiations and multiplications by the terms of Λ′ forming each walk $\mathcal{W}$ = (…, J, K, L, …):

$$R_{\mathcal{W}} = \cdots\, \lambda_{JK}\, \frac{\partial}{\partial \lambda_{JK}} \left[ \lambda_{KL}\, \frac{\partial}{\partial \lambda_{KL}} \left[\, \cdots\, \right] \right] \cdots$$

and then applying the pooling operation $R_{\mathcal{B}} = \sum_{\mathcal{W} \in \mathcal{B}} R_{\mathcal{W}}$.
Proof. First we recall that with Λ′ ← (Λ, …, Λ) we distinguish between the connectivity matrices in every interaction block. In our proof we want to index the connectivity matrix in every interaction block and therefore extend the notation to $\Lambda' = (\lambda^{(t)}_E)_{E,t}$ where t = 1, …, T identifies the interaction block and E identifies the edge (or node pair). Then for any given bag-of-edges $\mathcal{B}$ which contains the edge sequence $E_1, \ldots, E_T$, we note

$$R_{\mathcal{B}} = \frac{1}{\alpha_{\mathcal{B}}!}\, \frac{\partial^T f}{\partial\lambda_{E_1} \cdots \partial\lambda_{E_T}}\, \lambda_{E_1} \cdots \lambda_{E_T} \qquad (6)$$

$$\phantom{R_{\mathcal{B}}} = \frac{1}{\alpha_{\mathcal{B}}!} \sum_{t_1, \ldots, t_T = 1}^{T} \frac{\partial^T f}{\partial\lambda^{(t_1)}_{E_1} \cdots \partial\lambda^{(t_T)}_{E_T}}\, \lambda_{E_1} \cdots \lambda_{E_T} \qquad (7)$$

$$\phantom{R_{\mathcal{B}}} = \frac{1}{\alpha_{\mathcal{B}}!} \sum_{\substack{t_1, \ldots, t_T \\ \text{all distinct}}} \frac{\partial^T f}{\partial\lambda^{(t_1)}_{E_1} \cdots \partial\lambda^{(t_T)}_{E_T}}\, \lambda_{E_1} \cdots \lambda_{E_T} \qquad (8)$$

where $\alpha_{\mathcal{B}}$ is the multi-index which counts the multiplicity of edges in $\mathcal{B}$, i.e. $\alpha_{\mathcal{B},E}$ counts the occurrences of the node pair E in $\mathcal{B}$, and $\alpha_{\mathcal{B}}! := \prod_E \alpha_{\mathcal{B},E}!$. The three steps we have performed are detailed below.

Directional Derivative Expansion: For the step between Eq. (6) and (7), we used that the derivative of f in one component $\lambda_E$ is equal to the directional derivative of f at the vector $(\lambda^{(t)}_E)_t$ in the direction of the vector which is constant one, i.e. $\mathbf{1} \in \mathbb{R}^T$. From the theory of directional derivatives we know that the directional derivative is equal to the scalar product between the gradient and its direction; since the direction is in our case just $\mathbf{1}$, the derivative $\partial f/\partial\lambda_E$ is equal to $\sum_t \partial f/\partial\lambda^{(t)}_E$. Applying this to every edge $E_t$ in $\mathcal{B}$ leads to Eq. (7).

Reduction to Unique Layer: For the step from Eq. (7) to (8) we used the fact that $\partial^T f / (\partial\lambda^{(t_1)}_{E_1} \cdots \partial\lambda^{(t_T)}_{E_T})$ is only non-zero if all $t_1, \ldots, t_T$ differ, because f(Λ′) is piecewise linear in each connectivity matrix $\lambda^{(t)}$, and if we compute the derivative of f with respect to the same interaction block multiple times, it will vanish. This reduces the sum onto addends with partial derivatives of the form $\partial^T f / (\partial\lambda^{(1)}_{E_1} \cdots \partial\lambda^{(T)}_{E_T})$.

Reduction to Walks: These remaining terms are only non-zero if all $\lambda_{E_1}, \ldots, \lambda_{E_T}$ are non-zero, since $\lambda^{(t)}$ is always equal to Λ for each t. The latter condition can also be interpreted as $E_1, \ldots, E_T$ forming a walk on Λ.
To pursue the derivation, we will slightly change the notation, and represent the walks in the upcoming equations as a sequence of nodes rather than edges, i.e. W = (. . . , J, K, L, . . .). With this we get