Graph Signal Restoration Using Nested Deep Algorithm Unrolling

Graph signal processing is a ubiquitous task in many applications such as sensor, social, transportation and brain networks, point cloud processing, and graph neural networks. Often, graph signals are corrupted in the sensing process, thus requiring restoration. In this paper, we propose two graph signal restoration methods based on deep algorithm unrolling (DAU). First, we present a graph signal denoiser by unrolling iterations of the alternating direction method of multipliers (ADMM). We then suggest a general restoration method for linear degradation by unrolling iterations of Plug-and-Play ADMM (PnP-ADMM). In the second approach, the unrolled ADMM-based denoiser is incorporated as a submodule, leading to a nested DAU structure. The parameters in the proposed denoising/restoration methods are trainable in an end-to-end manner. Our approach is interpretable and keeps the number of parameters small since we only tune graph-independent regularization parameters. We overcome two main challenges in existing graph signal restoration methods: 1) the limited performance of convex optimization algorithms due to fixed parameters, which are often determined manually, and 2) the large number of parameters in graph neural networks, which makes training difficult. Several experiments for graph signal denoising and interpolation are performed on synthetic and real-world data. The proposed methods show performance improvements over several existing techniques in terms of root mean squared error in both tasks.


I. INTRODUCTION
Signal restoration is a ubiquitous task in many applications. Depending on the type of signal, the interconnectivity among samples can often be exploited. For example, signals residing on sensor networks, social networks, transportation networks, brain networks, power grids, 3D meshes, and point clouds all have various connectivities that can often be represented as graphs.
Preliminary results of this work were presented in [1]. M. Nagahama and Y. Tanaka are with the Department of Electrical Engineering and Computer Science, Tokyo University of Agriculture and Technology, Koganei, Tokyo 184-8588, Japan. Y. Tanaka is also with PRESTO, Japan Science and Technology Agency, Kawaguchi, Saitama 332-0012, Japan (email: nagahama@msp-lab.org; ytnk@cc.tuat.ac.jp).
A graph signal is defined as a signal whose domain is the nodes of the graph. The relations between the samples, i.e., nodes, are given by the edges. In contrast to standard signals on a regular grid such as audio and image signals, graph signal processing (GSP) explicitly exploits the underlying structure of the signal [2]-[4]. GSP has been used in a wide range of applications for irregularly-structured data such as compression [5], sampling and restoration [6]-[10], and analysis of graph signals [11], [12].
Graph signal restoration is an important task aiming to address the problems of noise and missing values. For example, in sensor networks, some sensors may not work properly, resulting in missing values, and samples on the nodes are often noisy [13]. Many approaches for graph signal restoration have been proposed based on regularized optimization [14], graph filters and filter banks [15]-[17], and deep learning on graphs [18]. These existing works can be classified into two main approaches: 1) model-based restoration and 2) neural network-based restoration.
Model-based restoration: Model-based approaches often rely on convex optimization whose objective function contains a data fidelity term and a regularization term [14]. Signal priors are often required in such tasks because the problem is ill-posed. For example, a smoothness prior like graph total variation (GTV) has demonstrated effectiveness in graph signal denoising, whereas graph spectral filters have been shown to be solutions of certain quadratic optimization problems [19]-[22]. A limitation of model-based restoration methods is that they are often iterative, as illustrated in Fig. 1 (left). Performance and speed of the algorithm depend on the hyper-parameters θ (e.g., step size and regularization strength), whose values are determined manually and are fixed throughout the iterations.
Neural network-based restoration: Graph convolutional networks (GCNs) are considered as a counterpart of the convolutional neural networks for image processing [23]. GCNs can automatically learn network parameters to minimize a loss function. However, GCNs have two drawbacks: 1) lack of interpretability and 2) the requirement of a large dataset for training. Furthermore, as reported in [23]-[25], deeper networks cannot always achieve good performance in the graph settings, in contrast to the remarkable success of convolutional networks for signals on a regular grid [26]. Therefore, many GCNs are limited to a small number of layers [27], [28].
As a hybrid approach of the model- and neural network-based restoration methods, we utilize deep algorithm unrolling (DAU) by integrating learnable parameters into the iterative algorithm [25], [29]-[32]. As illustrated in Fig. 1 (right), DAU unrolls the iterations of the iterative algorithm and deploys the trainable parameters at each unrolled iteration [32], [33]. Instead of manually choosing the parameters as in the conventional iterative approach, parameters in each unrolled iteration are determined from the training data so as to minimize a loss function. The practical advantages of DAU over the classical iterative solver are faster convergence and performance improvement since the parameters are learned to fit the target signals. Advantages compared with fully parameterized neural networks are the interpretability and a small number of parameters. Hence, the networks can be trained with a small amount of training data.
arXiv:2106.15910v3 [eess.SP] 2 Jun 2022
An extension of DAU for graph signal denoising was recently developed in [25], which proposed unrolled GCNs based on two optimization problems of sparse coding and trend filtering. Although the formulation itself allows the network to be arbitrarily deep, the number of layers is set to be very small (typically one middle layer) in its practical implementation. This is because deeper networks do not result in better performance in this case. Additionally, GCNs often assume a fixed graph both in the training and testing phases. However, the underlying graphs are often slightly perturbed in practice. Hence, restoration algorithms should be robust to (small) perturbations of graphs. A detailed comparison between [25] and our approach is further discussed in Section III-D.
In this work, we first propose a simple yet efficient graph signal denoising method that utilizes DAU of the alternating direction method of multipliers (ADMM) to solve a minimization problem with two regularizers based on graph total variation and elastic net. In contrast to [25], we only train the graph-independent regularization parameters in the model-based iterative algorithms. The resulting denoising algorithm contains a significantly smaller number of parameters than neural network-based methods while showing better denoising results.
Next, we propose a nested version of DAU based on unrolling the iterations of Plug-and-Play ADMM (PnP-ADMM) [34]-[38]. This version is designed for general graph signal restoration problems with linear degradation. In this approach, the ADMM-based denoiser is plugged into the unrolled PnP-ADMM algorithm, leading to a nested DAU structure. All of the parameters in the algorithm are trained in an end-to-end fashion [30], [32], [33].
In contrast to GCN-based methods, parameters to be tuned in the proposed techniques are graph-independent, leading to the following advantages:
1) Interpretability: All internal modules are designed based on (convex) optimization algorithms.
2) Ease of training: Our techniques do not require large training data due to the small number of parameters.
3) Transferability: Since our methods only tune graph-independent parameters, we can immediately use the same parameter set for graphs with different sizes.
We also avoid large matrix inversion by using popular acceleration techniques in GSP: 1) precomputing graph Fourier bases and 2) polynomial approximation. Through comprehensive experiments on denoising and interpolation for synthetic and real-world data, our proposed methods are shown to achieve better performance than existing restoration methods including graph low-pass filters, model-based iterative optimization, and [25], in terms of root mean squared error (RMSE). The remainder of this paper is organized as follows. Signal restoration algorithms using ADMM and PnP-ADMM are introduced in Section II along with notation used throughout the paper. The proposed two restoration methods are introduced in Section III. Experimental results comparing denoising and interpolation performances with existing methods are shown in Sections IV and V. Section VI concludes this paper.

II. SIGNAL RESTORATION WITH ADMM
In this section, we first present notations and the problem formulation. Then, we review ADMM and PnP-ADMM which are the fundamental building blocks of our algorithms.

A. Notation
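Throughout the paper, vectors and matrices are written in bold style and sets are written as calligraphic letters. An undirected weighted graph is denoted by $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ and $\mathcal{E}$ are the sets of vertices and edges, respectively. The number of vertices and edges is $|\mathcal{V}| = N$ and $|\mathcal{E}|$, respectively; $w_{i,j} \in \mathbb{R}_{\geq 0}$ denotes the edge weight between $v_i$ and $v_j$. We define a weighted adjacency matrix of $\mathcal{G}$ as an $N \times N$ matrix with $[\mathbf{W}]_{ij} = w_{i,j}$; $[\mathbf{W}]_{ij} = 0$ represents unconnected vertices. In this paper, we consider a graph that does not have self-loops, i.e., $[\mathbf{W}]_{ii} = 0$ for all $i$. The degree matrix of $\mathcal{G}$ is defined as a diagonal matrix with $[\mathbf{D}]_{ii} = \sum_j w_{i,j}$. The combinatorial graph Laplacian matrix of $\mathcal{G}$ is given by $\mathbf{L} = \mathbf{D} - \mathbf{W}$. Since $\mathbf{L}$ is a real symmetric matrix, it always has an eigendecomposition. Let the eigendecomposition of the graph Laplacian matrix be $\mathbf{L} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^\top$, where $\mathbf{U}$ is an eigenvector matrix and $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$. The weighted graph incidence matrix is denoted as $\mathbf{M} \in \mathbb{R}^{|\mathcal{E}| \times N}$. We index the set of edges as $\mathcal{E} = \{e_s\}_{s=1}^{|\mathcal{E}|}$. Then, the entry in the $s$th row and $t$th column of $\mathbf{M}$, corresponding to $e_s$ and $v_t$, is
$$[\mathbf{M}]_{s,t} = \begin{cases} \sqrt{w_{i,j}} & \text{if } e_s = (v_i, v_j) \text{ and } t = i, \\ -\sqrt{w_{i,j}} & \text{if } e_s = (v_i, v_j) \text{ and } t = j, \\ 0 & \text{otherwise,} \end{cases}$$
so that $\mathbf{L} = \mathbf{M}^\top\mathbf{M}$.
A graph signal $x : \mathcal{V} \rightarrow \mathbb{R}$ is a function that assigns a value to each vertex. It can be written as a vector $\mathbf{x} \in \mathbb{R}^N$ in which the $i$th element $x[i]$ represents the signal value at the $i$th vertex.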
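The notation above can be checked numerically on a small graph. The following is a minimal NumPy sketch, not the authors' code: the 4-node weight matrix and the helper name `graph_matrices` are hypothetical, and the edge orientation used for the incidence matrix is an arbitrary choice (one row per undirected edge, with entries plus and minus the square root of the edge weight).

```python
import numpy as np

def graph_matrices(W):
    """Return the degree matrix D, Laplacian L = D - W, and weighted incidence matrix M."""
    N = W.shape[0]
    D = np.diag(W.sum(axis=1))
    L = D - W
    # One row per undirected edge (i < j); the orientation i -> j is arbitrary.
    rows, cols = np.nonzero(np.triu(W, k=1))
    M = np.zeros((len(rows), N))
    for s, (i, j) in enumerate(zip(rows, cols)):
        M[s, i] = np.sqrt(W[i, j])
        M[s, j] = -np.sqrt(W[i, j])
    return D, L, M

# Hypothetical 4-node weighted graph without self-loops.
W = np.array([[0.0, 1.0, 0.0, 0.5],
              [1.0, 0.0, 2.0, 0.0],
              [0.0, 2.0, 0.0, 1.0],
              [0.5, 0.0, 1.0, 0.0]])
D, L, M = graph_matrices(W)

# Eigendecomposition L = U diag(lam) U^T (eigenvalues in ascending order).
lam, U = np.linalg.eigh(L)
```

With this construction, `M.T @ M` reproduces the combinatorial Laplacian, which is the identity the regularizers in Section III rely on.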

B. General Restoration Problem
Consider an observed graph signal $\mathbf{y} \in \mathbb{R}^N$ which is related to an input graph signal $\mathbf{x} \in \mathbb{R}^N$ as
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \tag{1}$$
where $\mathbf{H} \in \mathbb{R}^{N \times N}$ is a degradation matrix and $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$ is i.i.d. additive white Gaussian noise (AWGN). Throughout the paper, we assume that $\mathbf{x}$ is a graph signal, i.e., its domain is given by $\mathcal{G}$. This graph structure will be exploited to provide a prior for the recovery problem. The degradation model (1) generally appears in restoration problems such as denoising, interpolation, deblurring, and super-resolution, to name a few. The main objective of many restoration problems is estimating an unknown $\mathbf{x}$ from a given degraded signal $\mathbf{y}$. We assume that $\mathbf{H}$ is known a priori. In this paper, we perform two representative experiments with the following $\mathbf{H}$: 1) $\mathbf{H} = \mathbf{I}$ (denoising), and 2) a binary $\mathbf{H}$ matrix (interpolation).
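As a small illustration of the two degradation settings, the sketch below generates observations for denoising and interpolation. It is only an example under stated assumptions: the clean signal is random, the noise level is arbitrary, and the binary interpolation matrix is assumed to be a diagonal 0/1 mask (the text only says that H is binary).

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma = 8, 0.5
x = rng.uniform(0.0, 5.0, size=N)  # hypothetical clean graph signal

# 1) Denoising: H = I, so y is x plus white Gaussian noise.
H_den = np.eye(N)
y_den = H_den @ x + sigma * rng.standard_normal(N)

# 2) Interpolation: H assumed to be a diagonal binary mask
#    (1 = sample observed, 0 = sample missing).
observed = rng.random(N) < 0.6
H_int = np.diag(observed.astype(float))
y_int = H_int @ x + sigma * rng.standard_normal(N)
```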
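
C. Plug-and-Play ADMM
1) ADMM: Many inverse problems are posed as the following unconstrained minimization problem:
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{y} - \mathbf{H}\mathbf{x}\|_2^2 + \lambda g(\mathbf{A}\mathbf{x}), \tag{2}$$
where $g$ is some regularization function, $\lambda \in \mathbb{R}_{\geq 0}$ is the regularization parameter, and $\mathbf{A} \in \mathbb{R}^{M \times N}$ is an arbitrary matrix. A widely used algorithm to solve (2) is the alternating direction method of multipliers (ADMM), which has been used to solve generic unconstrained optimization problems with nondifferentiable convex functions (see [39] for details). Through variable splitting, the general problem (2) is rewritten as the following constrained minimization problem:
$$\min_{\mathbf{x}, \mathbf{s}} \frac{1}{2}\|\mathbf{y} - \mathbf{H}\mathbf{x}\|_2^2 + \lambda g(\mathbf{s}) \quad \text{s.t. } \mathbf{s} = \mathbf{A}\mathbf{x}. \tag{3}$$
Applying ADMM to (3) leads to the following sequence of subproblems:
$$\mathbf{x}^{(p+1)} = \arg\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{y} - \mathbf{H}\mathbf{x}\|_2^2 + \frac{\rho}{2}\|\mathbf{A}\mathbf{x} - \mathbf{s}^{(p)} + \mathbf{t}^{(p)}\|_2^2, \tag{4a}$$
$$\mathbf{s}^{(p+1)} = \arg\min_{\mathbf{s}} \lambda g(\mathbf{s}) + \frac{\rho}{2}\|\mathbf{s} - (\mathbf{A}\mathbf{x}^{(p+1)} + \mathbf{t}^{(p)})\|_2^2, \tag{4b}$$
$$\mathbf{t}^{(p+1)} = \mathbf{t}^{(p)} + \mathbf{A}\mathbf{x}^{(p+1)} - \mathbf{s}^{(p+1)}, \tag{4c}$$
where $\mathbf{t}^{(p)} \in \mathbb{R}^M$ is the Lagrangian multiplier, $\mathbf{s}^{(p)} \in \mathbb{R}^M$ is an auxiliary variable, $g$ is the regularization function in (2), and $\rho > 0$ is the penalty parameter of the augmented Lagrangian.
2) Plug-and-Play ADMM: PnP-ADMM is a variation of the classical ADMM [34] for the problem of (3) with $\mathbf{A} = \mathbf{I}$. Oftentimes, (4a) and (4b) are called the inverse step and denoising step (i.e., denoiser), respectively [35]. A notable feature is that any off-the-shelf denoiser, including deep neural networks, can be used instead of naively solving (4b), without explicitly specifying the regularization term $g$ before implementation. Such examples are found in [40]-[42].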
Empirically, PnP-ADMM has demonstrated improved performance over the standard ADMM with explicit regularization in some image restoration tasks [36], [43], [44]. Graph signal restoration with PnP-ADMM is also studied in [21] showing improved restoration performance over the existing model-based techniques.
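In this paper, we follow an approach of PnP-ADMM proposed in [37]. Suppose that two initial variables $\mathbf{s}^{(0)}, \mathbf{t}^{(0)} \in \mathbb{R}^N$ are set. The algorithm of PnP-ADMM corresponding to (4a)-(4c) (again, assuming $\mathbf{A} = \mathbf{I}$) is represented as
$$\mathbf{x}^{(p+1)} = (\mathbf{H}^\top\mathbf{H} + \rho\mathbf{I})^{-1}(\mathbf{H}^\top\mathbf{y} + \rho(\mathbf{s}^{(p)} - \mathbf{t}^{(p)})), \tag{5a}$$
$$\mathbf{s}^{(p+1)} = D_g(\mathbf{x}^{(p+1)} + \mathbf{t}^{(p)}), \tag{5b}$$
$$\mathbf{t}^{(p+1)} = \mathbf{t}^{(p)} + \mathbf{x}^{(p+1)} - \mathbf{s}^{(p+1)}, \tag{5c}$$
where $D_g$ is an off-the-shelf denoiser. Note that we still need to determine the parameter $\rho$ and the off-the-shelf graph signal denoiser $D_g$ (and its internal parameters) prior to running the algorithm.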
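A fixed-parameter PnP-ADMM loop can be sketched in a few lines. This is an illustrative NumPy sketch rather than the implementation used in the paper: it assumes the closed-form inverse step for a quadratic data term with A = I, a scaled multiplier update, zero initialization of both auxiliary variables, and a simple Laplacian low-pass filter as a stand-in denoiser. The path graph, mask ratio, and all numerical values are hypothetical.

```python
import numpy as np

def pnp_admm(y, H, denoiser, rho=1.0, num_iters=8):
    """Fixed-parameter PnP-ADMM; `denoiser` is any callable mapping R^N to R^N."""
    N = H.shape[1]
    s = np.zeros(N)  # assumed initialization of the auxiliary variable
    t = np.zeros(N)  # assumed initialization of the multiplier
    lhs = H.T @ H + rho * np.eye(N)
    x = s
    for _ in range(num_iters):
        x = np.linalg.solve(lhs, H.T @ y + rho * (s - t))  # inverse step
        s = denoiser(x + t)                                 # denoising step
        t = t + x - s                                       # multiplier update
    return x

# Hypothetical setup: path graph, smooth signal, roughly 30% missing samples.
rng = np.random.default_rng(0)
N = 50
L = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
L[0, 0] = L[-1, -1] = 1.0
x_true = np.sin(2.0 * np.pi * np.arange(N) / N)
H = np.diag((rng.random(N) < 0.7).astype(float))
y = H @ x_true + 0.1 * rng.standard_normal(N)

# Stand-in denoiser (an assumption): graph low-pass filter (I + 2L)^{-1}.
lowpass = lambda z: np.linalg.solve(np.eye(N) + 2.0 * L, z)
x_hat = pnp_admm(y, H, lowpass, rho=1.0, num_iters=50)
```

A quick sanity check of the loop is to pass the identity as denoiser with H = I: the multiplier then stays at zero and the iterates approach y geometrically.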
The key idea of our proposed method is to unroll the ADMM and PnP-ADMM for graph signal processing.

III. GRAPH SIGNAL RESTORATION ALGORITHMS
In this section, we propose the following two graph signal restoration methods, both based on DAU.
1) GraphDAU: Graph signal denoiser by unrolling ADMM to address the problem $\mathbf{H} = \mathbf{I}$. We consider a mixture of $\ell_1$ and $\ell_2$ regularization terms like the elastic net [45]. GraphDAU works as a better independent denoiser than the model-based and deep-learning-based approaches.
2) NestDAU: General graph signal restoration algorithm by unrolling PnP-ADMM to handle a generic $\mathbf{H}$. We plug the GraphDAU into each layer of an unrolled PnP-ADMM as a denoiser.
Our methods are illustrated in Fig. 2.

A. GraphDAU
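GraphDAU considers the case where $\mathbf{H} = \mathbf{I}$ (signal denoising) and $\mathbf{A} = \mathbf{M}$ in (2). It combines the regularization terms of graph total variation (GTV) and graph Laplacian regularization [46], leading to
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{y} - \mathbf{x}\|_2^2 + \lambda_1\|\mathbf{M}\mathbf{x}\|_1 + \frac{\lambda_2}{2}\|\mathbf{M}\mathbf{x}\|_2^2, \tag{6}$$
where $\|\mathbf{M}\mathbf{x}\|_1$ and $\|\mathbf{M}\mathbf{x}\|_2^2 = \mathbf{x}^\top\mathbf{L}\mathbf{x}$ (since $\mathbf{L} = \mathbf{M}^\top\mathbf{M}$) are the regularization terms for first-order and second-order differences, respectively, and $\lambda_1$ and $\lambda_2$ are nonnegative regularization parameters. The second and third terms in (6) can be written explicitly as
$$\|\mathbf{M}\mathbf{x}\|_1 = \frac{1}{2}\sum_{i \in \mathcal{V}}\sum_{j \in \mathcal{N}_i} \sqrt{w_{i,j}}\,|x[i] - x[j]|, \tag{7}$$
$$\|\mathbf{M}\mathbf{x}\|_2^2 = \frac{1}{2}\sum_{i \in \mathcal{V}}\sum_{j \in \mathcal{N}_i} w_{i,j}\,(x[i] - x[j])^2, \tag{8}$$
where $\mathcal{N}_i$ is the set of vertices connected to $v_i$, and the factor $1/2$ accounts for each undirected edge appearing twice in the double sum. The norms (7) and (8) are effective regularization functions for piecewise constant and smooth graph signals [20]. In this paper, GraphDAU is only applied for denoising and not used for the general restoration problems in (2). This is because we use various acceleration techniques introduced in Section III-A3 under the assumption $\mathbf{H} = \mathbf{I}$.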
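We utilize ADMM as a baseline iterative solver of (6). The variable splitting is applied to (6) with $\mathbf{v} = \mathbf{M}\mathbf{x}$, leading to the following constrained minimization problem:
$$\min_{\mathbf{x}, \mathbf{v}} \frac{1}{2}\|\mathbf{y} - \mathbf{x}\|_2^2 + \lambda_1\|\mathbf{v}\|_1 + \frac{\lambda_2}{2}\|\mathbf{v}\|_2^2 \quad \text{s.t. } \mathbf{v} = \mathbf{M}\mathbf{x}. \tag{9}$$
The solution of (9) can be found by solving a sequence of the following subproblems [47]:
$$\mathbf{x}^{(\ell+1)} = \left(\mathbf{I} + \frac{1}{\gamma}\mathbf{M}^\top\mathbf{M}\right)^{-1}\left(\mathbf{y} + \frac{1}{\gamma}\mathbf{M}^\top(\mathbf{v}^{(\ell)} - \mathbf{u}^{(\ell)})\right), \tag{10a}$$
$$\mathbf{v}^{(\ell+1)} = \frac{1}{1 + \gamma\lambda_2} S_{\lambda_1\gamma}\left(\mathbf{M}\mathbf{x}^{(\ell+1)} + \mathbf{u}^{(\ell)}\right), \tag{10b}$$
$$\mathbf{u}^{(\ell+1)} = \mathbf{u}^{(\ell)} + \mathbf{M}\mathbf{x}^{(\ell+1)} - \mathbf{v}^{(\ell+1)}, \tag{10c}$$
where $\mathbf{u}^{(\ell)}$ is the scaled Lagrangian multiplier, $\gamma$ is the step size of the algorithm, and $S_{\lambda_1\gamma}$ is the soft-thresholding operator
$$[S_{\lambda_1\gamma}(\mathbf{z})]_i = \mathrm{sgn}(z_i)\max\{|z_i| - \lambda_1\gamma, 0\}, \tag{11}$$
where $\mathrm{sgn}(\cdot)$ denotes the signum function. Next, we unroll the iteration of (10a)-(10c) to design a trainable $D_g$. In other words, instead of using fixed parameters in (10a)-(10b), we deploy trainable parameters in each iteration. The terms including $\mathbf{M}$ and $\mathbf{M}^\top\mathbf{M}$ in (10a)-(10c) are graph filters, i.e., graph convolution, and are fixed: We only tune three parameters, $\gamma$, $\lambda_1$, and $\lambda_2$, in each unrolled iteration. This is because we aim to construct an interpretable and easy-to-train graph signal restoration algorithm. The training configurations are described later in Section IV-A2. In the following sections, we propose two forms of GraphDAU and introduce its acceleration techniques.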
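The unrolled structure described above can be sketched as a forward pass in which every layer receives its own parameter values. This is a minimal NumPy sketch under explicit assumptions, not the authors' implementation: it uses the scaled-form ADMM updates for a quadratic fidelity term with an elastic-net penalty on Mx (the quadratic part weighted by lambda_2 / 2), zero initialization of the split and dual variables, and hand-picked parameter values in place of trained ones. The function names and the path-graph test signal are hypothetical.

```python
import numpy as np

def soft_threshold(z, tau):
    """Element-wise soft-thresholding: sgn(z) * max(|z| - tau, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def graphdau_forward(y, M, gammas, lam1s, lam2s):
    """Run len(gammas) unrolled ADMM layers with per-layer (gamma, lam1, lam2)."""
    N = M.shape[1]
    Lap = M.T @ M
    v = np.zeros(M.shape[0])  # split variable, assumed zero at the first layer
    u = np.zeros(M.shape[0])  # scaled dual variable, assumed zero at the first layer
    x = y
    for gamma, lam1, lam2 in zip(gammas, lam1s, lam2s):
        # x-update: graph low-pass filter applied to the corrected observation.
        x = np.linalg.solve(np.eye(N) + Lap / gamma, y + M.T @ (v - u) / gamma)
        # v-update: proximal step of the elastic-net penalty.
        v = soft_threshold(M @ x + u, gamma * lam1) / (1.0 + gamma * lam2)
        # dual update.
        u = u + M @ x - v
    return x

def path_incidence(N):
    """Incidence matrix of an unweighted path graph (hypothetical test graph)."""
    M = np.zeros((N - 1, N))
    idx = np.arange(N - 1)
    M[idx, idx] = 1.0
    M[idx, idx + 1] = -1.0
    return M

rng = np.random.default_rng(0)
N, num_layers = 20, 10
M = path_incidence(N)
x_true = np.repeat([1.0, 4.0], N // 2)            # piecewise-constant signal
y = x_true + 0.5 * rng.standard_normal(N)
gammas = [1.0] * num_layers                        # stand-ins for trained values
lam1s = [0.5] * num_layers
lam2s = [0.1] * num_layers
x_hat = graphdau_forward(y, M, gammas, lam1s, lam2s)
```

In a trainable version, the three lists would be learnable tensors in an automatic-differentiation framework, while the operations involving M stay fixed.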
1) GraphDAU-TV: In this method, we only consider the $\ell_1$ term of (9) by setting $\lambda_2 = 0$. Then, we choose $\gamma$ and $\gamma\lambda_1$ to be learnable, i.e., each unrolled layer has its own trainable values of these two quantities. This regularization is based on the assumption that the signal is piecewise constant.
2) GraphDAU-EN: This GraphDAU is based on a combination of the $\ell_1$ and $\ell_2$ regularizations in (9) like the elastic net (EN), defined by the weighted incidence matrix $\mathbf{M}$. We introduce a set of trainable parameters for $\gamma$, $\lambda_1$, and $\lambda_2$ in each layer. This method automatically controls the piecewise-constant and smoothness terms at each layer.
3) Algorithm Acceleration: The graph filter $(\mathbf{I} + \frac{1}{\gamma}\mathbf{M}^\top\mathbf{M})^{-1}$ in (10a) requires matrix inversion and its computational complexity is typically $O(N^3)$ (for a dense matrix). If each layer requires calculating the inversion, the complexity becomes $O(N^3 L)$. We consider accelerating GraphDAU by the following two popular techniques: 1) eigendecomposition of $\mathbf{L}$, and 2) Chebyshev polynomial approximation of a graph filter.
Precomputing Eigendecomposition: In this approach, we precompute the eigendecomposition (EVD) of $\mathbf{L}$. The inverse matrix in (10a) can be decomposed as
$$\left(\mathbf{I} + \frac{1}{\gamma}\mathbf{M}^\top\mathbf{M}\right)^{-1} = \mathbf{U}\left(\mathbf{I} + \frac{1}{\gamma}\boldsymbol{\Lambda}\right)^{-1}\mathbf{U}^\top. \tag{12}$$
Since $\mathbf{I} + (1/\gamma)\boldsymbol{\Lambda}$ is a diagonal matrix, the inversion has $O(N)$ complexity. If $\mathcal{G}$ does not change frequently throughout the iterations (which often is the case), the eigenvalues $\boldsymbol{\Lambda}$ and eigenvectors $\mathbf{U}$ are fixed. Therefore, the eigendecomposition of the graph Laplacian is performed only once. This GraphDAU with acceleration is represented with the suffix -E and is summarized in Algorithm 1.
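The EVD shortcut can be verified numerically: once the eigenpairs of the Laplacian are stored, the inverse step for any value of gamma reduces to two products with the eigenvector matrix and an element-wise division. The sketch below is illustrative only; the random weighted graph and the helper name `inverse_step_evd` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
# Hypothetical random symmetric weight matrix without self-loops.
A = rng.random((N, N))
W = np.triu(A, k=1)
W = W + W.T
L = np.diag(W.sum(axis=1)) - W

# Computed once and reused by every layer.
lam, U = np.linalg.eigh(L)

def inverse_step_evd(z, gamma):
    """Apply (I + L / gamma)^{-1} to z using the stored eigenpairs."""
    return U @ ((U.T @ z) / (1.0 + lam / gamma))

z = rng.standard_normal(N)
out_fast = inverse_step_evd(z, gamma=0.5)
out_direct = np.linalg.solve(np.eye(N) + L / 0.5, z)
```

Note that only the diagonal inversion is linear in N; applying the dense eigenvector matrix still costs a matrix-vector product per layer.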
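Chebyshev Polynomial Approximation: This technique approximates (12) with a polynomial, for example, using the Chebyshev polynomial approximation (CPA) (see [48], [49] for details). First, we rewrite the inverse step at the $\ell$th layer corresponding to (12) as
$$\mathbf{x}^{(\ell+1)} = H^{(\ell)}(\mathbf{L})\left(\mathbf{y} + \frac{1}{\gamma_\ell}\mathbf{M}^\top(\mathbf{v}^{(\ell)} - \mathbf{u}^{(\ell)})\right), \tag{13}$$
where $H^{(\ell)}(\mathbf{L}) := (\mathbf{I} + \frac{1}{\gamma_\ell}\mathbf{L})^{-1}$. This filter kernel has the following graph frequency response:
$$H^{(\ell)}(\mathbf{L}) = \mathbf{U}\,h^{(\ell)}(\boldsymbol{\Lambda})\,\mathbf{U}^\top = \mathbf{U}\,\mathrm{diag}\left(h^{(\ell)}(\lambda_1), \ldots, h^{(\ell)}(\lambda_N)\right)\mathbf{U}^\top, \tag{14}$$
where $h^{(\ell)}(x) = \gamma_\ell/(\gamma_\ell + x)$ is the filter kernel which acts as a graph low-pass filter. By performing $K$-truncated Chebyshev approximations to $h^{(\ell)}(x)$, the approximated version $\tilde{H}^{(\ell)}(\mathbf{L})$ is represented as:
$$\tilde{H}^{(\ell)}(\mathbf{L}) = \frac{c_0^{(\ell)}}{2}\mathbf{I} + \sum_{k=1}^{K} c_k^{(\ell)}\,T_k(\tilde{\mathbf{L}}), \tag{15}$$
where $T_k$ is the $k$th Chebyshev polynomial, $\tilde{\mathbf{L}} = \frac{2}{\lambda_{\max}}\mathbf{L} - \mathbf{I}$ is the Laplacian shifted so that its spectrum lies in $[-1, 1]$, and $c_k^{(\ell)}$ are the Chebyshev coefficients of $h^{(\ell)}$. GraphDAU with Chebyshev polynomial approximation is specified by a suffix -C in Algorithm 1.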
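The polynomial approximation can be sketched with the standard shifted-Chebyshev construction used for graph filters. The code below is an illustration under assumptions, not the paper's implementation: coefficients are computed by Chebyshev-Gauss quadrature with a hypothetical number of nodes Q, the spectrum is bounded by twice the maximum degree, and the filter is applied with the three-term recurrence so that only products with L are needed. The path graph and the choice K = 20 are arbitrary.

```python
import numpy as np

def cheby_coeffs(h, lmax, K, Q=200):
    """Chebyshev coefficients c_0..c_K of the kernel h on [0, lmax] (quadrature with Q nodes)."""
    theta = np.pi * (np.arange(Q) + 0.5) / Q
    samples = h(0.5 * lmax * (np.cos(theta) + 1.0))
    return np.array([2.0 / Q * np.sum(samples * np.cos(k * theta))
                     for k in range(K + 1)])

def cheby_apply(L, z, coeffs, lmax):
    """Evaluate (c_0/2) z + sum_k c_k T_k(L_shifted) z via the recurrence (needs K >= 1)."""
    def shifted(vec):
        # Product with the shifted Laplacian (2 / lmax) L - I.
        return (2.0 / lmax) * (L @ vec) - vec
    t_prev, t_curr = z, shifted(z)
    out = 0.5 * coeffs[0] * t_prev + coeffs[1] * t_curr
    for c in coeffs[2:]:
        t_prev, t_curr = t_curr, 2.0 * shifted(t_curr) - t_prev
        out = out + c * t_curr
    return out

# Hypothetical test: unweighted path graph with N = 30 nodes.
N, gamma, K = 30, 1.0, 20
L = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
L[0, 0] = L[-1, -1] = 1.0
lmax = 2.0 * np.max(np.diag(L))     # upper bound on the largest eigenvalue

kernel = lambda lam: gamma / (gamma + lam)
coeffs = cheby_coeffs(kernel, lmax, K)
z = np.random.default_rng(0).standard_normal(N)
approx = cheby_apply(L, z, coeffs, lmax)
exact = np.linalg.solve(np.eye(N) + L / gamma, z)
```

Because the kernel is smooth on the spectral interval, the coefficients decay geometrically, which is why a moderate order K can replace the matrix inverse; with a sparse L each recurrence step costs on the order of the number of edges.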

B. NestDAU: Unrolled PnP-ADMM with GraphDAU as the Denoiser
Next, we develop a restoration algorithm for general H in (2). The baseline algorithm we consider is PnP-ADMM introduced in (4a)-(4c) because it is able to adapt to general H. In addition, any denoiser can be used in its internal algorithm to boost performance.
Suppose that the iteration number $P$ is given. We then unroll (5a)-(5c) of the PnP-ADMM iterations to construct a $P$-layer network. That is, we set $\rho$ in (5a) to be learnable, i.e., $\rho \rightarrow \{\rho_p\}_{p=0}^{P-1}$, in which $p$ indicates the layer number. The restoration steps are equivalent to those in PnP-ADMM with $P$ iterations, but each iteration is conducted with different regularization parameters.
The important part of the restoration algorithm is the design of the off-the-shelf denoiser $D_g$ in (5b) since (5a) and (5c) are independent of the underlying graph. In this paper, we aim to keep the algorithm fully interpretable and the number of parameters small for efficient training, and thereby, we utilize GraphDAU in each layer as $D_g^{(p)}$. As a result, the restoration algorithm has a nested unrolled structure as shown in Fig. 2. Based on this structure, we refer to the proposed method as NestDAU. Note that all the parameters in NestDAU, including those in GraphDAU, can be trained in an end-to-end fashion from a training set. The training details are presented in Section IV-A2.
Algorithm 1 GraphDAU for graph signal denoising
  (suffix -E only) compute the eigendecomposition $\mathbf{L} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^\top$
  for $\ell = 0, \cdots, L - 1$ do
    update $\mathbf{x}^{(\ell+1)}$, $\mathbf{v}^{(\ell+1)}$, and the multiplier by (10a)-(10c), computing the inverse in (10a) with the EVD (suffix -E) or with the Chebyshev polynomial approximation (suffix -C)
  end for
  return $\mathbf{x}^{(L)}$
Algorithm 2 shows the details of NestDAU. Note that we perform two representative signal restoration experiments (i.e., denoising and interpolation) in this paper, but NestDAU is applicable to other cases as well, e.g., deblurring [50] and point cloud super-resolution [51].
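The nested structure can be made concrete with a compact forward pass: an outer loop of PnP-ADMM layers, each with its own penalty parameter, calls an inner unrolled TV denoiser with its own per-layer parameters. This is a sketch under assumptions rather than the authors' code: both loops start from zero auxiliary variables, the inner denoiser uses scaled-form ADMM updates with a directly parameterized threshold, and all parameter values below are hand-picked placeholders for trained values.

```python
import numpy as np

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def graphdau_tv(z, M, gammas, taus):
    """Inner unrolled TV denoiser; one (gamma, threshold) pair per inner layer."""
    N = M.shape[1]
    Lap = M.T @ M
    v = np.zeros(M.shape[0])
    u = np.zeros(M.shape[0])
    x = z
    for gamma, tau in zip(gammas, taus):
        x = np.linalg.solve(np.eye(N) + Lap / gamma, z + M.T @ (v - u) / gamma)
        v = soft_threshold(M @ x + u, tau)
        u = u + M @ x - v
    return x

def nestdau_forward(y, H, M, rhos, inner_params):
    """Outer unrolled PnP-ADMM; layer p uses rhos[p] and inner_params[p]."""
    N = H.shape[1]
    s = np.zeros(N)  # assumed initialization
    t = np.zeros(N)  # assumed initialization
    HtH, Hty = H.T @ H, H.T @ y
    x = s
    for rho, (gammas, taus) in zip(rhos, inner_params):
        x = np.linalg.solve(HtH + rho * np.eye(N), Hty + rho * (s - t))  # inverse step
        s = graphdau_tv(x + t, M, gammas, taus)                           # nested denoiser
        t = t + x - s                                                     # multiplier update
    return x

# Hypothetical interpolation example on an unweighted path graph.
rng = np.random.default_rng(0)
N, P, inner_layers = 20, 4, 3
M = np.zeros((N - 1, N))
idx = np.arange(N - 1)
M[idx, idx], M[idx, idx + 1] = 1.0, -1.0
x_true = np.repeat([1.0, 4.0], N // 2)
H = np.diag((rng.random(N) < 0.7).astype(float))
y = H @ x_true + 0.3 * rng.standard_normal(N)
rhos = [0.5, 0.5, 1.0, 1.0]                                   # placeholders
inner_params = [([1.0] * inner_layers, [0.2] * inner_layers) for _ in range(P)]
x_hat = nestdau_forward(y, H, M, rhos, inner_params)
```

The total number of scalars to train in this sketch is P outer values plus two values per inner layer, independent of the number of nodes.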

C. Summary of Computation Issues
In Table I, we compare the proposed methods in terms of the regularization function, the acceleration technique, the number of parameters, and the computational complexity. NestDAUs are classified based on their GraphDAU specifications and have the same suffix as the corresponding GraphDAU.
The number of parameters linearly increases in proportion to the number of layers $L$ but is independent of $N$. The complexity mainly depends on the use of EVD. The methods with EVD have complexities depending on the number of nodes $N$, while those with CPA only rely on the number of edges $|\mathcal{E}|$ and the polynomial order $K$; $K|\mathcal{E}|$ is generally much smaller than $N^2$ when $N$ becomes large. As mentioned, the proposed methods require training data (i.e., a set of ground-truth and degraded data) to tune parameters. These parameters come from the hyperparameter(s) of the original (PnP-)ADMM algorithms. Note that, even for a regular ADMM, we need to determine the optimal hyperparameter(s) for practical applications, which often requires training data.
Algorithm 2 NestDAU for graph signal restoration (NOTE: Background colors correspond to those in Fig. 2.)
  for $p = 0, \cdots, P - 1$ do
    inverse step (5a) with $\rho_p$, denoising step (5b) with GraphDAU as $D_g^{(p)}$, and multiplier update (5c)
  end for
In general, the many trainable parameters in deep learning models require a large dataset to avoid overfitting. This implies that GNNs require a large amount of training data. In contrast, NestDAU and GraphDAU have significantly fewer parameters than representative deep learning methods. As a result, the proposed methods can be trained with a small amount of training data, which is beneficial for practical applications. This is experimentally verified in Sections IV and V.

D. Comparison to [25]
Two approaches for graph signal denoising based on DAU, called graph unrolling sparse coding (GUSC) and graph unrolling trend filtering (GUTF), were proposed in [25]. Since they have the same objective as that for GraphDAU, we compare the details of DAU-based graph signal restoration methods in Table II. First, GUSC/GUTF only consider the problem of graph signal denoising. This is the same objective as that of GraphDAU, while NestDAU focuses on a generic restoration problem in (1). This is possible by employing the PnP-ADMM as a prototype of the iterative algorithm. Second, the regularization of GUSC/GUTF only contains the $\ell_1$ term, while GraphDAU also includes an $\ell_2$ term $\|\mathbf{M}\mathbf{x}\|_2^2 = \mathbf{x}^\top\mathbf{L}\mathbf{x}$, which is beneficial for globally smooth signals. GraphDAU-EN can automatically control the regularization weights between the $\ell_1$ and $\ell_2$ terms, leading to flexibility in capturing signal characteristics. Third, GUSC and GUTF train parameters in an unsupervised setting while our proposed methods train the network in a supervised way. In the following experiments, we train GUSC/GUTF in a supervised setting for a fair comparison. Extending GraphDAU and NestDAU to the unsupervised setting is left for future work.
In (10a), we keep the structure of the original graph filter $h(\mathbf{L}) = (\mathbf{I} + \frac{1}{\gamma}\mathbf{L})^{-1}$ of the ADMM algorithm and only train a graph-independent parameter $\gamma$. As such, GraphDAU performs stably with many layers (typically $L = 10$ in the experiments). In contrast, GUSC/GUTF use GCNs for their internal algorithms. Therefore, they end up with few middle layers (as reported in [25], they have only one middle layer in the experiment). They have far fewer learnable parameters than usual GCNs thanks to their edge-weight-sharing convolution; however, they still contain many parameters. A detailed comparison of the number of parameters is presented along with the restoration performance in Section IV.

IV. EXPERIMENTAL RESULTS: DENOISING
In the following two sections, we compare graph signal restoration performances of NestDAU and GraphDAU with existing methods using synthesized and real-world data. In both sections, parameters of the proposed and neural network-based methods are trained by setting the mean squared error (MSE) $\frac{1}{N}\|\hat{\mathbf{x}} - \mathbf{x}^*\|_2^2$ as a loss function, where $\hat{\mathbf{x}} \in \mathbb{R}^N$ is the restored signal and $\mathbf{x}^* \in \mathbb{R}^N$ is the ground-truth signal available during the training phase.
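For reference, the training loss and the evaluation metric differ only by a square root. A minimal NumPy sketch with hypothetical function names and toy values:

```python
import numpy as np

def mse_loss(x_hat, x_true):
    """Mean squared error (1/N) * ||x_hat - x_true||_2^2, used as the training loss."""
    return float(np.mean((x_hat - x_true) ** 2))

def rmse(x_hat, x_true):
    """Root mean squared error, used to report restoration quality."""
    return float(np.sqrt(mse_loss(x_hat, x_true)))

x_true = np.array([1.0, 2.0, 3.0])   # toy ground truth
x_hat = np.array([1.0, 2.0, 5.0])    # toy restored signal
loss = mse_loss(x_hat, x_true)
score = rmse(x_hat, x_true)
```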
In this section, we consider denoising corresponding to H = I in (1).
We conduct three experiments: 1) Denoising on fixed graphs; 2) Denoising on graphs with perturbation; 3) Transferring tuned parameters to different N .
In the following subsections, we describe the details of the denoising experiment. We also show an in-depth analysis of the proposed methods in terms of the number of layers (i.e., $L$ or $P$) and the polynomial order $K$.

A. Methods and Training Configurations
1) Alternative Methods: We compare the denoising performance with several existing methods using smoothing filters and optimization approaches:
• Graph spectral diffusion with heat kernel (HD) [53];
• Spectral graph bilateral filter (SGBF) [22], [54];
• ADMM-based smoothing ((10a)-(10c)) with fixed parameters and 10 iterations;
• PnP-ADMM-based smoothing with fixed parameters and 8 iterations [21]: its formulation is given in Section III-B and the off-the-shelf denoisers are HD or SGBF.
Filtering operations of the algorithms are partly implemented by pygsp [55]. For a fair comparison, their fixed parameters are tuned by performing a grid search on the validation data to minimize RMSE.
We also include the following deep learning-based methods for comparison: MLP, GCN, GCN-R, GAT, and the unrolling-based GUTF and GUSC [25]. These existing methods use a hidden layer of 64 dimensions, as in the setting in [25]. These methods and ours are implemented with PyTorch [57]. MLP, GCN, GCN-R, and GAT are trained for 30 epochs, which leads to convergence of the loss function. GUTF and GUSC are trained with the same hyper-parameters as [25], but they are trained in the supervised setting in this paper.
2) Training Configuration: On the basis of preliminary experiments, hyper-parameters used for training of the proposed methods are summarized in Table III. The training scheduler StepLR in PyTorch is used to gradually decay the learning rate by multiplying it by 0.6 every epoch. Since our proposed methods have a small number of parameters, training usually converges in no more than three epochs. A detailed performance analysis is discussed in Section IV-F.
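The learning-rate decay described above is a plain geometric schedule. The sketch below reproduces it in pure Python so that it runs without PyTorch; the base learning rate of 0.01 is a hypothetical value (the actual one is listed in Table III). In PyTorch, the equivalent scheduler would be `torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.6)`, stepped once per epoch.

```python
def step_lr_schedule(base_lr, decay=0.6, num_epochs=3):
    """Learning rate used in each epoch when it is multiplied by `decay` every epoch."""
    return [base_lr * decay ** epoch for epoch in range(num_epochs)]

# Hypothetical base learning rate; three epochs as in the typical training run.
lrs = step_lr_schedule(base_lr=0.01)
```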

B. Datasets and Setup
Here, we describe the details of the experiments and datasets. The dataset specifications are summarized in Table IV.
1) Denoising on Fixed Graphs: The first experiment is graph signal denoising for the following fixed graphs:
• Synthetic signals on a community graph having three clusters (N = 250);
• Synthetic signals on a random sensor graph (N = 150);
• Temperature data in the United States (N = 614).
We assume that the graph is consistent in all of the training, validation, and testing phases.
Characteristics of Graphs and Graph Signals: The community graph is generated by pygsp [55] and is shown in Fig. 3a. We synthetically create piecewise constant graph signals based on the cluster labels of the community graph. Note that the cluster labels are different while the graph itself is fixed. Each cluster in the graph is randomly assigned an integer value between 1 and 6 as its cluster label. Then, AWGN (σ = {0.5, 1.0}) is added to the ground-truth signals.
The random sensor graph is also obtained by pygsp [55] and is shown in Fig. 4a. On the random sensor graph, piecewise-smooth signals are synthesized in the following manner. First, vertices on a graph are partitioned into eight non-overlapping subgraphs $\{\mathcal{G}_k\}_{k=1}^{8}$. Then, smooth signals on $\mathcal{G}_k$ are synthesized based on the first three eigenvectors of the graph Laplacian of $\mathcal{G}_k$. Let $\mathbf{L}_k$ and $\mathbf{U}_k$ be the graph Laplacian of $\mathcal{G}_k$ and its eigenvector matrix, respectively. Then, a smooth signal on $\mathcal{G}_k$ is given by
$$\mathbf{x}_k = \mathbf{U}_{k,3}\,\mathbf{d}, \tag{16}$$
where $\mathbf{U}_{k,3}$ contains the first three eigenvectors in $\mathbf{U}_k$ and $\mathbf{d} \in \mathbb{R}^3$ is a vector of expansion coefficients whose elements are randomly selected from $[0, 5]$. Finally, a piecewise-smooth signal on $\mathcal{G}$ is obtained by combining the eight $\mathbf{x}_k$'s as follows:
$$\mathbf{x} = \sum_{k=1}^{8} \mathbf{1}_{\mathcal{C}_k}\mathbf{x}_k, \tag{17}$$
where $\mathbf{1}_{\mathcal{C}_k} \in \{0, 1\}^{N \times |\mathcal{C}_k|}$ is the indicator matrix in which $[\mathbf{1}_{\mathcal{C}_k}]_{i,j} = 1$ when the node $i$ in $\mathcal{G}$ corresponds to the node $j$ in $\mathcal{G}_k$ and 0 otherwise. AWGN (σ = {0.5, 1.0}) is added to the ground-truth signals.
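The synthesis procedure for piecewise-smooth signals can be sketched as follows. This is an illustrative NumPy version with assumptions: the partition into clusters is supplied as a label vector (how the partition is obtained is not specified here), the graph is a hypothetical unweighted path, and writing the cluster signal into the corresponding entries of x plays the role of the indicator matrix.

```python
import numpy as np

def piecewise_smooth_signal(W, labels, rng, n_basis=3, coeff_max=5.0):
    """Combine smooth signals built from the first eigenvectors of each cluster's Laplacian."""
    x = np.zeros(W.shape[0])
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        Wk = W[np.ix_(idx, idx)]                 # subgraph restricted to cluster k
        Lk = np.diag(Wk.sum(axis=1)) - Wk
        _, Uk = np.linalg.eigh(Lk)               # eigenvalues in ascending order
        d = rng.uniform(0.0, coeff_max, size=n_basis)
        x[idx] = Uk[:, :n_basis] @ d             # smooth signal on the cluster
    return x

# Hypothetical example: path graph with 40 nodes split into 8 clusters of 5 nodes.
N = 40
W = np.eye(N, k=1) + np.eye(N, k=-1)
labels = np.repeat(np.arange(8), 5)
x = piecewise_smooth_signal(W, labels, np.random.default_rng(0))
noisy = x + 0.5 * np.random.default_rng(1).standard_normal(N)
```

Each cluster needs at least as many nodes as basis vectors; otherwise the slice of eigenvectors is shorter than the coefficient vector.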
In order to demonstrate the effectiveness of our method for real-world data, we use daily average temperature data in the United States in 2017, provided by QCLCD [59]. The data contain local temperatures recorded at weather stations, yet they include missing observations. To obtain the completed data (as the ground truth) for a year, we conduct the following preprocessing: 1) 614 stations (out of 7,501) having relatively few missing values are selected. 2) Missing values in these stations are filled using the average temperatures observed at the same station on the previous and subsequent days. For the experiment, we split the dataset into three parts: 304 training (January to October), 30 validation (November), and 31 testing (December) data. In this experiment, we study four noise strengths of AWGN, i.e., σ = {3.0, 5.0, 7.0, 9.0}. The weighted graph is constructed as an 8-nearest neighbor (NN) graph based on the stations' geographical coordinates.
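The gap-filling rule and the chronological split can be sketched as below. This is a hypothetical illustration, not the preprocessing script used for the experiments: missing entries are assumed to be stored as NaN in a days-by-stations array, and a gap at the first or last day (or next to another gap) is filled from whichever neighboring day is available, which is an assumption the text does not cover.

```python
import numpy as np

def fill_missing(T):
    """Fill NaNs with the mean of the previous and next day at the same station."""
    F = T.copy()
    num_days = T.shape[0]
    for d, s in zip(*np.nonzero(np.isnan(T))):
        neighbours = [T[k, s] for k in (d - 1, d + 1)
                      if 0 <= k < num_days and not np.isnan(T[k, s])]
        if neighbours:                      # left as NaN if no neighbour is observed
            F[d, s] = np.mean(neighbours)
    return F

# Toy 3-day, 2-station example.
T = np.array([[1.0, np.nan],
              [np.nan, 4.0],
              [3.0, 6.0]])
F = fill_missing(T)

# Chronological split of a 365-day year: January to October, November, December.
year = np.zeros((365, 2))                   # placeholder data
train, val, test = year[:304], year[304:334], year[334:]
```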
2) Denoising on Graphs with Perturbation: The second experiment is conducted for signals on graphs with perturbation to verify the robustness of the proposed method to small perturbations of the underlying graph. Indeed, the tuned parameters for one graph are not expected to work properly for a completely different graph because the topologies and graph Fourier basis on different graphs are different. However, signals on similar graphs, in terms of their edge weights, could have similar characteristics and therefore, it is expected that the learned parameters for one graph could work satisfactorily for the signal on another graph if these two graphs are similar enough.
Note that many graph neural network-based methods assume the graph is fixed, while our approach based on DAU only needs to tune graph-independent parameters. Thus, we can use different graphs in each epoch for training, validation, and testing. In this experiment, we only showcase the performance with a comparison to the model-based methods because the model-based approaches are applicable even if the graphs are different.
We used the following graph signals:
• Synthetic signals on random sensor graphs (N = 150) having piecewise-constant, piecewise-smooth, and globally-smooth characteristics;
• RGB color attributes on 3D point clouds (N = 1,000).
Characteristics of Graphs and Graph Signals: For the experiment on random sensor graphs, each graph is synthetically generated by using a different seed of graphs.Sensor from pygsp [55]. As a result, all graphs have different topologies and edge weights, but their characteristics are similar.
We then synthetically generate the following graph signals:
a) Piecewise constant signals: We first partition each graph into five clusters with non-overlapping nodes and randomly assign an integer between 1 and 6 to each cluster. The cluster labels are used as a graph signal.
b) Piecewise smooth signal: Similar to the piecewise constant case, we first partition each graph into five clusters with non-overlapping nodes. The signal is generated by (16) and (17).
c) Globally smooth signal: The signal is obtained as a linear combination of the first five graph Fourier basis vectors with random expansion coefficients $\mathbf{d} \in \mathbb{R}^5$, as in (16).
In this experiment, we study two noise strengths of AWGN, i.e., σ = {0.5, 1.0}.
As real data on graphs with perturbation, we use the color attributes of 3D point clouds from the JPEG Pleno Database [60], where human motions are captured as point clouds. We randomly sample 1,000 points from the original data. Then, weighted graphs are constructed using a 4-NN method whose weights are determined based on the Euclidean distance. Graphs in this dataset are, therefore, not fixed because the Euclidean distances between points are different due to random sampling. AWGN with σ = {20, 30, 40} is added to each sample to yield a noisy signal. Note that the proposed methods are applied channel-wise so that the parameters are adjusted to each channel.
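A k-nearest-neighbor graph of this kind can be built directly from point coordinates. The sketch below is an illustration with assumptions: the edge weight is a Gaussian kernel of the Euclidean distance (the text only states that weights are based on the distance), the kernel width defaults to the mean neighbor distance, and the graph is symmetrized by keeping an edge if either endpoint selects the other. The random points stand in for sampled point-cloud coordinates.

```python
import numpy as np

def knn_graph(points, k=4, sigma=None):
    """Symmetric weighted k-NN adjacency matrix from point coordinates."""
    N = points.shape[0]
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)            # exclude self-matches
    nn = np.argsort(dist, axis=1)[:, :k]      # indices of the k nearest neighbours
    rows = np.repeat(np.arange(N), k)
    cols = nn.ravel()
    d_nn = dist[rows, cols]
    if sigma is None:
        sigma = d_nn.mean()                   # assumed kernel width
    W = np.zeros((N, N))
    W[rows, cols] = np.exp(-d_nn ** 2 / (2.0 * sigma ** 2))
    return np.maximum(W, W.T)                 # undirected graph

# Hypothetical stand-in for 200 randomly sampled 3D points.
points = np.random.default_rng(0).random((200, 3))
W = knn_graph(points, k=4)
```

Because the points are resampled for every signal, each call yields a slightly different graph, which is the kind of perturbation considered in this experiment. For large N, the dense pairwise-distance matrix would normally be replaced by a spatial index.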
3) Parameter Transfer for Different N: The number of nodes of a graph directly influences the computational complexity of all (training, validation, and testing) phases. To apply the proposed methods to a signal with large $N$, naive training results in a large computational burden. Motivated by this, in the third experiment, we consider transferring the learned parameters to graph signals having different $N$. That is, parameters trained with signals on a small graph $\mathcal{G}'$ with $N'$ nodes ($N' < N$) are reused for evaluation with signals on $\mathcal{G}$ with $N$ nodes. This approach can be easily realized with the proposed method since its parameters are independent of $N$.
We first train the GraphDAU-TV-C (i.e., Chebyshev polynomial version of GraphDAU based on the GTV regularization) on the 3D point cloud datasets with N = 1,000 points. After that, the pre-trained parameters are applied to the datasets with larger N ∈ {2,000, 5,000, 10,000}.
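To illustrate why parameters that do not depend on N can be transferred, the sketch below applies a plain Chebyshev polynomial graph filter with one fixed coefficient vector to graphs of two sizes. It is not GraphDAU itself: the ring graphs, the coefficient values, and the function names are hypothetical, and the bound `lmax = 4` holds for unweighted ring graphs only.

```python
import numpy as np

def ring_laplacian(n):
    """Combinatorial Laplacian of an unweighted ring graph with n nodes."""
    W = np.zeros((n, n))
    i = np.arange(n)
    W[i, (i + 1) % n] = 1.0
    W[(i + 1) % n, i] = 1.0
    return np.diag(W.sum(axis=1)) - W

def chebyshev_filter(L, x, coeffs, lmax):
    """Apply sum_k coeffs[k] * T_k(L_scaled) to x, with L_scaled = 2L/lmax - I."""
    n = L.shape[0]
    L_scaled = (2.0 / lmax) * L - np.eye(n)
    t_prev, t_curr = x, L_scaled @ x             # T_0(L)x and T_1(L)x
    out = coeffs[0] * t_prev
    if len(coeffs) > 1:
        out = out + coeffs[1] * t_curr
    for c in coeffs[2:]:
        # Three-term recurrence: T_k = 2 * L_scaled * T_{k-1} - T_{k-2}
        t_prev, t_curr = t_curr, 2.0 * (L_scaled @ t_curr) - t_prev
        out = out + c * t_curr
    return out

coeffs = np.array([0.5, -0.3, 0.15, -0.05])      # hypothetical "trained" values
rng = np.random.default_rng(0)
outputs = {}
for n in (100, 500):                             # same coefficients, different N
    L = ring_laplacian(n)
    x = rng.standard_normal(n)
    outputs[n] = chebyshev_filter(L, x, coeffs, lmax=4.0)
```

The coefficient vector has a fixed length regardless of the graph size, so nothing needs to be retrained or reshaped when N changes; only the Laplacian passed to the filter differs.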

C. Denoising Results: Fixed Graph
The experimental results on the fixed graphs are summarized with the number of parameters in Table V. Visualizations of the denoising results are also shown in Figs. 3, 4, and 5.
In most cases, the proposed methods show RMSE improvements compared to all of the alternative methods. The proposed approach successfully restores graph signals having various characteristics. Note that, despite the performance improvements, our methods have significantly fewer parameters than the neural network-based approaches.
Although GraphDAU-TV and -EN outperform existing methods, NestDAUs provide even better performance by incorporating GraphDAUs as submodules. These results imply that the nested structure is effective for graph signal restoration. NestDAU using EVD outperforms that using CPA in most datasets and conditions.

D. Denoising Results: Graphs with Perturbation
Table VI summarizes the results of the second experiment, denoising on graphs with small perturbation. Overall, our algorithms outperform the alternatives, as in the case of the fixed graph. The proposed techniques show RMSE improvements for all of the signal types under consideration. This implies that the parameters tuned through training effectively reflect the signal prior, leading to robustness against slight changes in the graph.

E. Transferring Tuned Parameters for Different N
The results of the third experiment, transferring the tuned parameters to different N , are summarized in Table VII.
This shows that even if the number of nodes increases, the proposed method works well as long as the signal and graph properties are similar. Fig. 6 shows the visualization of noisy and denoised results.

F. Performance Study: The Number of Layers
Along with the denoising results, the effect of the number of layers is studied here. We use the dataset of the fixed community graph whose details are described in Section IV-B.
1) GraphDAU: Fig. 7a shows the performance analysis in terms of the number of layers L of GraphDAU for L ∈ {1, . . . , 30}. The average RMSE on the test data is reported. As can be seen in the figure, the RMSE of GraphDAU rapidly decreases for L ≤ 10, whereas there is only a slight improvement for L > 10. We observed that the RMSE of GraphDAU-TV-E decreases steadily, whereas that of GraphDAU-EN-E oscillates slightly. Fig. 7b shows the influence of the polynomial order K ∈ {2, . . . , 30} of GraphDAU-TV-C and -EN-C with L = 10. For both methods, the RMSE decreases almost monotonically as K becomes larger.
2) NestDAU: Fig. 7c shows the performance in terms of the number of layers P of NestDAU. The GraphDAU submodule uses L = 10 layers with EVD, and L = 10 layers with polynomial order K = 10 with CPA. The number of layers is varied over P ∈ {1, . . . , 10}. For NestDAU, all configurations are stable with respect to the number of layers P. Even if the in-loop denoisers are changed, the performance is almost equivalent.

G. RMSE Analysis during Training
Fig. 8 shows the average RMSEs of the validation data during training. The data used are signals on community graphs (σ = 0.5) described in Section IV-B. As shown in the figure, the RMSEs decrease rapidly within fewer than 250 iterations (250 being the number of training samples). Furthermore, NestDAUs converge faster than their GraphDAU counterparts.

V. EXPERIMENTAL RESULTS: INTERPOLATION
In this section, graph signal interpolation is performed and compared with the alternative methods. We assume that the nodes with missing signal values are known and that their values are set to zero. This leads to a diagonal binary matrix H = diag{0, 1}^N in (1) with various missing rates.
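The degradation described above can be sketched as follows, storing only the diagonal of H as a 0/1 vector. The helper names `degrade` and `rmse` are hypothetical. The sketch assumes that the noise is masked together with the signal so that missing entries are exactly zero; the model in (1) may place the noise term differently.

```python
import numpy as np

def degrade(x, missing_rate, sigma, rng):
    """Return (y, mask), where mask is the diagonal of the binary matrix H."""
    n = x.shape[0]
    n_missing = int(round(missing_rate * n))
    mask = np.ones(n)
    mask[rng.choice(n, size=n_missing, replace=False)] = 0.0   # known missing nodes
    y = mask * (x + sigma * rng.standard_normal(n))            # missing values set to zero
    return y, mask

def rmse(x_hat, x):
    """Root mean squared error between an estimate and the ground truth."""
    return float(np.sqrt(np.mean((x_hat - x) ** 2)))

rng = np.random.default_rng(0)
x = rng.standard_normal(250)                                   # placeholder signal, N = 250
observations = {rate: degrade(x, rate, sigma=0.5, rng=rng) for rate in (0.3, 0.5, 0.7)}
```

Drawing a fresh `mask` for each sample gives a different H per observation, and `rmse` is the evaluation metric applied to the restored signal.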

A. Alternative Methods
For interpolation, the following techniques are selected for comparison:
• Bandlimited graph signal recovery based on graph sampling theory [8]: Bandwidth is set to N/10;
• PnP-ADMM-based interpolation with fixed parameters with 8 iterations [21]: Its formulation is given in Section III-B and off-the-shelf denoisers are HD or SGBF;
• GUTF [25];
• GUSC [25].
Although GUSC and GUTF were originally developed for a denoising task, we also include these methods to compare with neural network-based approaches. The setup is the same as in the previous section.
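The first baseline in the list above can be illustrated with a generic least-squares bandlimited recovery. This is a sketch under stated assumptions rather than the exact method of [8]: the signal is assumed to lie in the span of the first N/10 Laplacian eigenvectors, and the path graph and the function name `bandlimited_recover` are hypothetical.

```python
import numpy as np

def bandlimited_recover(L, y, mask, bandwidth):
    """Fit the observed samples with the first `bandwidth` Laplacian eigenvectors."""
    _, U = np.linalg.eigh(L)                     # eigenvalues in ascending order
    U_b = U[:, :bandwidth]                       # low-frequency graph Fourier basis
    observed = mask.astype(bool)
    coeffs, *_ = np.linalg.lstsq(U_b[observed], y[observed], rcond=None)
    return U_b @ coeffs                          # estimate on all nodes

# Toy example on an unweighted path graph with N = 100 nodes.
n = 100
W = np.zeros((n, n))
i = np.arange(n - 1)
W[i, i + 1] = 1.0
W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(0)
bandwidth = n // 10
_, U = np.linalg.eigh(L)
x = U[:, :bandwidth] @ rng.standard_normal(bandwidth)   # exactly bandlimited signal
mask = np.ones(n)
mask[rng.choice(n, size=n // 2, replace=False)] = 0.0   # 50% of the nodes are missing
x_hat = bandlimited_recover(L, mask * x, mask, bandwidth)
```

In this noiseless toy case the signal is exactly bandlimited, so the fit reproduces it on the missing nodes; for noisy or non-bandlimited signals the estimate is only the least-squares projection onto the assumed subspace.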

B. Datasets and Setup
We used the following graph signals for interpolation:
• Synthetic signals on a community graph having three clusters (N = 250);
• Temperature data of the United States (N = 614).
They are the same signals as those used in the denoising experiment in the previous section.
Characteristics of Graphs and Graph Signals: Synthetic graph signals on the community graph are generated in the same setup as that of the denoising experiment. We then consider two interpolation conditions: 1) noiseless and 2) noisy (AWGN with σ = 0.5). Three missing rates are considered: 30%, 50%, and 70%.
The U.S. temperature data are also used in this experiment as a real-world example. In this case, AWGN (σ = 9.0) is added to the observed daily temperature data with the same setting as in the denoising experiment. Then, missing rates are set to 30%, 50%, and 70% to validate the interpolation method. Note that the missing nodes are randomly chosen, i.e., H differs across data samples.

C. Interpolation Results
The RMSE results obtained by the proposed and existing methods are summarized in Table VIII. The visualizations of the interpolation results are also shown in Figs. 9 and 10.
As can be seen, the proposed approaches achieve lower RMSE than the alternatives. For the community graph, NestDAU-TV shows better results than NestDAU-EN. This is because NestDAU-TV reflects the prior of the graph signals, i.e., that they are piecewise constant. For the U.S. temperature data, NestDAU-EN is better than NestDAU-TV because the temperature data tend to be very smooth on the graph. In particular, NestDAU-EN-C outperforms the others at all missing rates. This implies that the proposed NestDAU is effective beyond denoising.

VI. CONCLUDING REMARKS
In this paper, we proposed graph signal denoising and restoration methods based on deep algorithm unrolling of ADMM and Plug-and-Play ADMM, respectively. The ADMM-based unrolled denoiser automatically controls its regularization strengths by tuning its parameters from training data. The PnP-ADMM-based unrolled restoration method is applicable to any linear degradation matrix and contains the proposed ADMM-based denoiser as its submodule, leading to a nested DAU structure. The unrolled restoration methods provide fully interpretable structures and have a small number of parameters compared with fully parameterized neural networks. The techniques only tune layer-wise trainable parameters in the iterative algorithm and do not include fully-connected neural networks. This implies that only a small set of training data is needed, which is especially beneficial for graph signals because their underlying structures often change. In extensive experiments, the proposed methods outperform various alternative techniques for graph signal restoration. Furthermore, the learned parameters can be reused for graphs of different sizes.

APPENDIX
Here, we present some non-trivial gradient computations with respect to the trainable parameters in GraphDAU.