Introducing Hypergraph Signal Processing: Theoretical Foundation and Practical Applications

Signal processing over graphs has recently attracted significant attention for dealing with structured data. Normal graphs, however, only model pairwise relationships between nodes and are not effective in representing and capturing high-order relationships among data samples, which are common in many applications such as the Internet of Things (IoT). In this work, we propose a new framework of hypergraph signal processing (HGSP) based on tensor representation to generalize the traditional graph signal processing (GSP) to tackle high-order interactions. We introduce the core concepts of HGSP and define the hypergraph Fourier space. We then study the spectrum properties of the hypergraph Fourier transform and explain its connection to mainstream digital signal processing. We derive a novel hypergraph sampling theory and present the fundamentals of hypergraph filter design based on the tensor framework. We present HGSP-based methods for several signal processing and data analysis applications. Our experimental results demonstrate significant performance improvement of the HGSP framework over traditional signal processing solutions.


I. INTRODUCTION
Graph-theoretic tools have recently found broad applications in data science owing to their power to model complex relationships in large structured datasets [1]. Big data, such as those representing social network interactions, Internet of Things (IoT) intelligence, biological connections, and mobility and traffic patterns, often exhibit complex structures that challenge many traditional tools [2]. Thankfully, graphs provide good models for many such datasets as well as the underlying complex relationships. A dataset with N data points can be modeled as a graph of N vertices, whose internal relationships are captured by edges. For example, subscribing users in a communication or social network can be modeled as nodes, while the physical interconnections or social relationships among users are represented as edges [3].
Taking advantage of graph models in characterizing complex data structures, graph signal processing (GSP) has emerged as an exciting and promising new tool for processing large datasets with complex structures. A typical application of GSP is image processing, where image pixels are modeled as graph signals embedded in nodes while pairwise similarities between pixels are captured by edges [6]. By modeling images with graphs, tasks such as image segmentation can take advantage of graph partitioning and GSP filters. Another example of GSP applications is processing data from sensor networks [5]. Based on graph models built directly over network structures, a graph Fourier space can be defined according to the eigenspace of a representing graph matrix, such as the Laplacian or adjacency matrix, to facilitate data processing operations such as denoising [7], filter banks [8], and compression [9].
Despite many demonstrated successes, GSP defined over normal graphs exhibits certain limitations. First, normal graphs cannot capture high-dimensional interactions among signals or users in many practical scenarios. Since each edge in a normal graph only models the pairwise interaction between two nodes, the traditional GSP can only deal with the pairwise relationships defined by such edges. In reality, however, multi-way interactions among nodes are more informative and powerful in signal processing and data analysis [12]. In biology, for example, a trait may be attributed to multiple interactive genes [13], so that the relationships among genes and traits cannot be modeled pairwise using normal graphs. Another example is a social network with online social communities called folksonomies, where three-way interactions occur among users, resources, and annotations [11], [37]. Second, GSP typically neglects multiple relationships that cannot be captured with simple-edge graphs. Social network data, for example, usually contain more than one type of interaction between two nodes [17], with multiple different types of links among nodes. Thus, the traditional GSP based on matrix analysis has so far been unable to handle such complex relationships efficiently. Clearly, there is a need for a more general graph model and graph signal processing framework to remedy the aforementioned shortcomings of the traditional GSP.
To find a more general model for complex data structures, we venture into the area of high-dimensional graphs known as hypergraphs. Hypergraph theory plays an increasingly important role in graph theory and data analysis, especially for analyzing high-dimensional data structures and interactions [14]. A hypergraph consists of nodes and hyperedges, each of which can connect more than two nodes [15]. As an example, Fig. 1(a) shows a hypergraph with three hyperedges and seven nodes, whereas Fig. 1(b) provides a corresponding dataset modeled by this hypergraph. Indeed, a normal graph is a special case of a hypergraph, in which each hyperedge degrades to a simple edge involving exactly two nodes.
Hypergraphs have found success in generalizing normal graphs in many applications, such as clustering [36], classification [18], and prediction [19]. A hypergraph is a natural extension of a normal graph for modeling signals with high-degree interactions.

arXiv:1907.09203v2 [eess.SP] 12 Aug 2019

Fig. 1. (a) Example of a hypergraph: the hyperedges are the overlapping regions covering nodes of different colors. (b) A game dataset modeled by a hypergraph: each node is a specific game and each hyperedge is a category of games.

Presently, however, the literature provides little coverage of hypergraph signal processing (HGSP). The only known prior work [4] proposed an HGSP framework based on a special class of hypergraphs called cell complexes. In [4], hypergraph signals are associated with each hyperedge, but the framework is limited to cell complexes, which cannot suitably model many real-world datasets and applications. Another shortcoming of the framework in [4] is the lack of detailed analysis and application examples to demonstrate its practicality. In addition, the attempt in [4] to extend some key concepts from the traditional GSP fails due to differences in the basic setups of graph signals and hypergraph signals. In this work, we seek to establish a more general and practical HGSP framework, capable of handling arbitrary hypergraphs and naturally extending the traditional GSP concepts to high-dimensional interactions. We will also provide real application examples to validate the effectiveness of the proposed framework.
Compared with the traditional GSP, a generalized HGSP faces several technical challenges. The first lies in the mathematical representation of hypergraphs, since an algebraic representation of a hypergraph is the foundation of HGSP. Currently, there are two major approaches: matrix-based [28] and tensor-based [16]. The matrix-based method makes it hard to implement hypergraph signal shifting, while the tensor-based method is more difficult to understand conceptually. Another challenge lies in defining signal shifting over a hyperedge. Signal shifting is easy to define in a regular graph as propagation along the link direction of a simple edge connecting two nodes. However, each hyperedge in a hypergraph involves more than two nodes, so modeling signal interactions over a hyperedge requires careful consideration. Other challenges include the definition and interpretation of hypergraph frequency.
To address the aforementioned challenges and generalize the traditional GSP into a more general hypergraph tool capturing high-dimensional interactions, we propose a novel tensor-based HGSP framework in this paper. The main contributions of this work can be summarized as follows. Representing hypergraphs as tensors, we define a specific form of hypergraph signals and hypergraph signal shifting. We then provide an alternative definition of the hypergraph Fourier space based on the orthogonal CANDECOMP/PARAFAC (CP) tensor decomposition, together with the corresponding hypergraph Fourier transform. To better interpret the hypergraph Fourier space, we analyze the resulting hypergraph frequency properties, including the concepts of frequency and bandlimited signals. Analogous to the traditional sampling theory, we derive the conditions and properties for perfect signal recovery from samples in HGSP. We also provide the theoretical foundation for HGSP filter design. Beyond these, we provide several application examples of the proposed HGSP framework: 1) we introduce a signal compression method based on the new sampling theory to show the effectiveness of HGSP in describing structured signals; 2) we apply HGSP to spectral clustering to show how the HGSP spectrum space acts as a suitable spectrum for hypergraphs; 3) we introduce an HGSP method for binary classification problems to demonstrate the practical application of HGSP in data analysis; 4) we introduce a filtering approach to the denoising problem to further showcase the power of HGSP; and 5) we suggest several potential application areas for HGSP, including the Internet of Things (IoT), social networks, and natural language processing. We compare the performance of HGSP-based methods with traditional GSP-based methods and learning algorithms in all the above applications. These features make HGSP an essential tool for future IoT applications.
We organize the rest of the paper as follows. Section II summarizes the preliminaries of the traditional GSP, tensors, and hypergraphs. In Section III, we introduce the core definitions of HGSP, including the hypergraph signal, the signal shifting, and the hypergraph Fourier space, followed by the frequency interpretation and a description of existing works in Section IV. We present some useful HGSP-based results, such as the sampling theory and filter design, in Section V. With the proposed HGSP framework, we provide several potential applications of HGSP and demonstrate its effectiveness in Section VI, before presenting final conclusions in Section VII.

A. Overview of Graph Signal Processing
GSP is a recent tool for analyzing signals based on graph models. Here, we briefly review the key relevant concepts of the traditional GSP [1], [2].
A dataset with N data points can be modeled as a normal graph G(V, E) consisting of a set of N nodes V = {v_1, · · · , v_N} and a set of edges E. Each node of the graph G is a data point, whereas the edges describe the pairwise interactions between nodes. A graph signal represents the data associated with a node. For a graph with N nodes, there are N graph signals, collected in a signal vector s = [s_1, s_2, · · · , s_N]^T ∈ R^N. Usually, such a graph can be described either by an adjacency matrix A_M ∈ R^{N×N}, where each entry indicates a pairwise link (an edge), or by a Laplacian matrix L_M = D − A_M, where D is the diagonal degree matrix. Both the Laplacian matrix and the adjacency matrix can fully represent the graph structure. For convenience, we use a general matrix F_M ∈ R^{N×N} to represent either of them. Note that, since the adjacency matrix is applicable to both directed and undirected graphs, it is more common in the GSP literature. Thus, the generalized GSP is based on the adjacency matrix [2], and the representing matrix refers to the adjacency matrix in this paper unless specified otherwise.
With the graph representation F_M and the signal vector s, the graph shifting is defined as

s' = F_M s.   (1)

Here, the matrix F_M can be interpreted as a graph filter whose function is to shift the signals along the link directions.
Taking the cyclic graph shown in Fig. 2 as an example, its adjacency matrix is a shifting matrix with entries

[F_M]_{1,N} = 1 and [F_M]_{i,i−1} = 1 for i = 2, · · · , N, with all other entries zero.   (2)

The shifted signal over the cyclic graph is then calculated as s' = F_M s = [s_N, s_1, · · · , s_{N−1}]^T, which shifts the signal at each node to its next node. The graph spectrum space, also called the graph Fourier space, is defined based on the eigenspace of F_M. Assume that the eigen-decomposition of F_M is

F_M = V_M^{-1} Λ V_M.   (3)

The frequency components are defined by the eigenvectors of F_M, and the frequencies are defined with respect to the eigenvalues. The corresponding graph Fourier transform is defined as

ŝ = V_M s.   (4)

With the definition of the graph Fourier space, traditional signal processing and learning tasks, such as denoising [29] and classification [72], can be solved within the GSP framework. More details about specific topics of GSP, such as frequency analysis, filter design, and spectrum representation, are discussed in [5], [50], [84].
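The cyclic-graph shifting and Fourier-transform steps above can be sketched numerically. The following snippet (an illustrative sketch using NumPy, not code from the paper) builds the 4-node cyclic shift matrix, applies it as a graph filter, and checks that the vertex-domain shift factors through the eigendecomposition behind the GFT.

```python
import numpy as np

# Cyclic (ring) graph on N nodes: the adjacency matrix acts as a shift filter.
N = 4
F_M = np.zeros((N, N))
for i in range(N):
    F_M[i, (i - 1) % N] = 1.0   # link from node (i-1) mod N into node i

s = np.array([1.0, 2.0, 3.0, 4.0])
shifted = F_M @ s               # [s_N, s_1, ..., s_{N-1}]
print(shifted)                  # [4. 1. 2. 3.]

# Eigendecomposition F_M = V_M^{-1} Lam V_M; the GFT is s_hat = V_M s.
# For the ring graph the eigenvectors are DFT vectors (eigenvalues are
# complex roots of unity), recovering classical Fourier analysis.
lam, V_M_inv = np.linalg.eig(F_M)     # columns of V_M_inv are eigenvectors
V_M = np.linalg.inv(V_M_inv)
s_hat = V_M @ s                       # graph Fourier transform of s
# Shift in the vertex domain == inverse transform of the filtered spectrum:
np.testing.assert_allclose(F_M @ s, V_M_inv @ (lam * s_hat), atol=1e-10)
```

For the ring graph this reduces to the classical DFT, which is why GSP is often presented as a generalization of discrete-time signal processing.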

B. Introduction of Hypergraph
We begin with the definition of hypergraph and its possible representations.
Definition 1 (Hypergraph). A hypergraph H(V, E) consists of a set V = {v_1, ..., v_N} of N elements called vertices and a set E = {e_1, ..., e_K} of non-empty multi-element subsets of V called hyperedges. Let M = max{|e_i| : e_i ∈ E} be the maximum cardinality of the hyperedges, shortened as m.c.e(H).
In a general hypergraph H, different hyperedges may contain different numbers of nodes. The m.c.e(H) denotes the number of vertices in the largest hyperedge. An example of a hypergraph with 7 nodes, 3 hyperedges and m.c.e = 3 is shown in Fig. 3.
From the definition, we see that a normal graph is a special case of a hypergraph with M = 2. The hypergraph is a natural extension of the normal graph for representing high-dimensional interactions. To represent a hypergraph mathematically, there are two major methods, based on matrices and tensors, respectively. In the matrix-based method, a hypergraph is represented by a matrix G ∈ R^{N×K}, where K is the number of hyperedges. The rows of the matrix represent the nodes and the columns represent the hyperedges [15]. Thus, each element in the matrix indicates whether the corresponding node is involved in the particular hyperedge. Although such a matrix-based representation is simple in form, it is hard to define and implement signal processing directly from the matrix G as in GSP. Unlike the matrix-based method, tensors offer better flexibility in describing the structures of high-dimensional graphs [39]. More specifically, a tensor can be viewed as an extension of a matrix into high-dimensional domains. The adjacency tensor, which indicates whether nodes are connected, is a natural hypergraph counterpart to the adjacency matrix in normal graph theory [49]. Thus, we prefer to represent hypergraphs using tensors. In Section III-A, we will provide more details on how to represent hypergraphs and signals in tensor form.
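The matrix-based representation is easy to sketch; the snippet below (illustrative, with node/hyperedge sets chosen to mirror the 7-node, 3-hyperedge example used later in Fig. 5(a), 0-indexed) builds such an incidence-style matrix G.

```python
import numpy as np

# Incidence-style matrix G in R^{N x K}: G[n, k] = 1 iff node n belongs to
# hyperedge k. Hyperedges below are illustrative (0-indexed versions of
# {v1,v4,v6}, {v2,v3,v7}, {v5,v6,v7}).
N = 7
hyperedges = [{0, 3, 5}, {1, 2, 6}, {4, 5, 6}]
G = np.zeros((N, len(hyperedges)), dtype=int)
for k, e in enumerate(hyperedges):
    for n in e:
        G[n, k] = 1

print(G.sum(axis=0))   # number of nodes per hyperedge: [3 3 3]
print(G.sum(axis=1))   # hyperedge degree of each node
```

This representation records membership only; it carries no notion of signal shifting, which motivates the tensor representation developed in Section III-A.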

C. Tensor Basics
Before we introduce our tensor-based HGSP framework, let us review some tensor basics to be used later. Tensors can effectively represent high-dimensional graphs [42]. Generally speaking, tensors can be interpreted as multi-dimensional arrays. The order of a tensor is the number of indices needed to label a component of that array [20]. For example, a third-order tensor has three indices. In fact, scalars, vectors, and matrices are all special cases of tensors: a scalar is a zeroth-order tensor; a vector is a first-order tensor; a matrix is a second-order tensor; and an M-dimensional array is an M-th order tensor [10].
Below are some useful tensor definitions and operations related to the proposed HGSP framework.

1) Symmetric and Diagonal Tensors:
• A tensor is super-symmetric if its entries are invariant under any permutation of their indices [41]. For example, a third-order tensor T ∈ R^{I×I×I} is super-symmetric if its entries t_ijk satisfy

t_ijk = t_ikj = t_jik = t_jki = t_kij = t_kji,  i, j, k = 1, · · · , I.   (5)

Analyses of super-symmetric tensors, which are bijectively related to homogeneous polynomials, can be found in [43], [44].
• A tensor is super-diagonal if its entries can be nonzero only when all indices are equal. For example, a third-order tensor T ∈ R^{I×I×I} is super-diagonal if only the entries t_iii, i = 1, 2, · · · , I, may be nonzero, while all other entries are zero.
2) Tensor Operations: Tensor analysis is built on tensor operations. The following tensor operations are commonly used in our HGSP framework [46]-[48].
• The tensor outer product U ∘ V between a P-th order tensor U ∈ R^{I_1×I_2×···×I_P} with entries u_{i_1···i_P} and a Q-th order tensor V ∈ R^{J_1×J_2×···×J_Q} with entries v_{j_1···j_Q} is a (P+Q)-th order tensor W ∈ R^{I_1×···×I_P×J_1×···×J_Q}, whose entries are calculated by

w_{i_1···i_P j_1···j_Q} = u_{i_1···i_P} v_{j_1···j_Q}.   (6)

The major use of the tensor outer product is to construct a higher-order tensor from several lower-order tensors.
• The n-mode product between a tensor U ∈ R^{I_1×I_2×···×I_P} and a matrix V ∈ R^{J×I_n} is denoted by W = U ×_n V ∈ R^{I_1×···×I_{n−1}×J×I_{n+1}×···×I_P}. Each element in W is defined as

w_{i_1···i_{n−1} j i_{n+1}···i_P} = Σ_{i_n=1}^{I_n} u_{i_1···i_P} v_{j i_n},   (7)

where the main function is to adjust the dimension of a specific order. For example, in Eq. (7), the dimension of the n-th order of U is changed from I_n to J.
• The Kronecker product of matrices U ∈ R^{I×J} and V ∈ R^{P×Q} is defined as the block matrix

U ⊗ V = [u_{11}V  u_{12}V  · · ·  u_{1J}V ; · · · ; u_{I1}V  u_{I2}V  · · ·  u_{IJ}V],   (8)

which generates an IP × JQ matrix.
• The Khatri-Rao product between U = [u_1 · · · u_K] ∈ R^{I×K} and V = [v_1 · · · v_K] ∈ R^{J×K} is defined as the column-wise Kronecker product

U ⊙ V = [u_1 ⊗ v_1  u_2 ⊗ v_2  · · ·  u_K ⊗ v_K] ∈ R^{IJ×K}.   (9)

• The Hadamard product between U ∈ R^{P×Q} and V ∈ R^{P×Q}, denoted by U * V, is the elementwise product with entries (U * V)_{pq} = u_{pq} v_{pq}.   (10)

3) Tensor Decomposition: Similar to the eigen-decomposition of a matrix, tensor decomposition analyzes tensors via factorization. The CANDECOMP/PARAFAC (CP) decomposition is a widely used method, which factorizes a tensor into a sum of rank-one component tensors [20], [45]. For example, a third-order tensor T ∈ R^{I×J×K} is decomposed into

T ≈ Σ_{r=1}^{R} a_r ∘ b_r ∘ c_r,   (11)

where a_r ∈ R^I, b_r ∈ R^J, c_r ∈ R^K, and R is a positive integer known as the rank, which is the smallest number of rank-one tensors in the decomposition. The process of CP decomposition for a third-order tensor is illustrated in Fig. 4. There are several extensions and alternatives to the CP decomposition. For example, the orthogonal-CP decomposition [22] decomposes the tensor using an orthogonal basis. For an M-th order N-dimension tensor T ∈ R^{N×···×N}, it can be decomposed by the orthogonal-CP decomposition as

T ≈ Σ_{r=1}^{R} λ_r · a_r^{(1)} ∘ a_r^{(2)} ∘ · · · ∘ a_r^{(M)},   (12)

where λ_r ≥ 0 and the basis vectors a_r^{(i)} ∈ R^N, 1 ≤ i ≤ M, are orthonormal within each mode. More specifically, the orthogonal-CP decomposition takes a form similar to the eigen-decomposition when M = 2 and T is super-symmetric.
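The operations above can be sketched with NumPy, where `einsum` spells out each index definition directly (all shapes below are illustrative):

```python
import numpy as np

U = np.arange(6.0).reshape(2, 3)
V = np.arange(8.0).reshape(4, 2)

# Tensor outer product: (2x3) o (4x2) -> 4th-order tensor of shape (2,3,4,2).
W = np.einsum('ij,kl->ijkl', U, V)
assert W[1, 2, 3, 1] == U[1, 2] * V[3, 1]

# n-mode product: contract mode n of a tensor with a matrix (here n = 2,
# replacing dimension I_2 = 3 with J = 5).
T = np.arange(24.0).reshape(2, 3, 4)
Mat = np.ones((5, 3))
T2 = np.einsum('ijk,lj->ilk', T, Mat)       # shape (2, 5, 4)

# Kronecker, Khatri-Rao (column-wise Kronecker), and Hadamard products.
A, B = np.arange(6.0).reshape(2, 3), np.arange(12.0).reshape(4, 3)
kron = np.kron(A, B)                                        # (8, 9)
khatri_rao = np.einsum('ik,jk->ijk', A, B).reshape(-1, 3)   # (8, 3)
hadamard = A * A                                            # elementwise, (2, 3)
assert np.allclose(khatri_rao[:, 0], np.kron(A[:, 0], B[:, 0]))
```

The last assertion checks the defining property of the Khatri-Rao product: its k-th column equals the Kronecker product of the k-th columns of the factors.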
4) Tensor Spectrum: The eigenvalues and spectral spaces of tensors are significant topics in tensor algebra, and research on the tensor spectrum has made great progress in recent years. Covering all properties of the tensor spectrum would take a large volume; here, we only list some helpful and relevant literature. In particular, Lim [86] developed theories of eigenvalues, eigenvectors, singular values, and singular vectors for tensors based on a constrained variational approach such as the Rayleigh quotient. Qi et al. [21], [85] presented a more complete discussion of tensor eigenvalues by defining two forms, i.e., the E-eigenvalue and the H-eigenvalue. Chang et al. [41] further extended the work of [21], [85]. Other works, including [87], [88], further developed the theory of the tensor spectrum.

III. DEFINITIONS FOR HYPERGRAPH SIGNAL PROCESSING
In this section, we introduce the core definitions used in our HGSP framework.

A. Algebraic Representation of Hypergraphs
The traditional GSP mainly relies on the representing matrix of a graph. Thus, an effective algebraic representation is also helpful in developing a novel HGSP framework. As we mentioned in Section II-C, tensor is an intuitive representation for high-dimensional graphs. In this section, we introduce the algebraic representation of hypergraphs based on tensors.
Similar to the adjacency matrix, whose 2-D entries indicate whether and how two nodes are pairwise connected by a simple edge, we adopt an adjacency tensor, whose entries indicate whether and how corresponding subsets of M nodes are connected by hyperedges, to describe hypergraphs [16].
Definition 2 (Adjacency tensor). Suppose that e_l = {v_{l_1}, v_{l_2}, · · · , v_{l_c}} ∈ E is a hyperedge in H with c ≤ M vertices. Then, e_l is represented by all the elements a_{p_1,···,p_M} in the M-th order N-dimension adjacency tensor A ∈ R^{N×···×N} for which a subset of c indices from {p_1, p_2, · · · , p_M} is exactly {l_1, l_2, · · · , l_c} and the other M − c indices are picked arbitrarily from {l_1, l_2, · · · , l_c}. More specifically, these elements a_{p_1,···,p_M} describing e_l are calculated as

a_{p_1,···,p_M} = c · ( Σ_{k_1,k_2,···,k_c ≥ 1, k_1+k_2+···+k_c = M} M! / (k_1! k_2! · · · k_c!) )^{-1}.   (14)

Meanwhile, the entries that do not correspond to any hyperedge e ∈ E are zero.
Obviously, when the hypergraph degrades to a normal graph with c = M = 2, the weight of each edge is calculated as one, i.e., a_ij = a_ji = 1 for an edge e = (i, j) ∈ E. Then, the adjacency tensor is the same as the adjacency matrix. To understand the physical meaning of the adjacency tensor and its weights, we start with the M-uniform hypergraph with N nodes, where each hyperedge has exactly M nodes [51]. Since each hyperedge has an equal number of nodes, all hyperedges follow a consistent form to describe an M-way relationship with m.c.e = M. Obviously, such M-way relationships can be represented by an M-th order N-dimension tensor A, where each entry a_{i_1 i_2 ··· i_M} gives the weight of the hyperedge over the nodes {v_{i_1}, · · · , v_{i_M}}. If the weight is nonzero, the hyperedge exists; otherwise, the hyperedge does not exist. Taking the 3-uniform hypergraph in Fig. 5(a) as an example, the hyperedge e_1 is characterized by a_{146} = a_{164} = a_{461} = a_{416} = a_{614} = a_{641} ≠ 0, the hyperedge e_2 by a_{237} = a_{327} = a_{732} = a_{723} = a_{273} = a_{372} ≠ 0, and e_3 by a_{567} = a_{576} = a_{657} = a_{675} = a_{756} = a_{765} ≠ 0. All other entries in A are zero. Note that all the hyperedges in an M-uniform hypergraph have the same weight. Different hyperedges are distinguished by the indices of the entries. More specifically, similarly as a_ij in the adjacency matrix implies the connection direction from node v_j to node v_i in GSP, an entry a_{i_1,i_2,···,i_M} characterizes one direction of the hyperedge e = {v_{i_1}, v_{i_2}, · · · , v_{i_M}}, with node v_{i_M} as the source and node v_{i_1} as the destination.
However, for a general hypergraph, different hyperedges may contain different numbers of nodes. For example, in the hypergraph of Fig. 5(b), the hyperedge e_2 only contains two nodes. How to represent hyperedges with fewer nodes than m.c.e = M then becomes an issue. To represent such a hyperedge e_l = {v_{l_1}, v_{l_2}, ..., v_{l_c}} ∈ E with c < M vertices in an M-th order tensor, we can use the entries a_{i_1,i_2,···,i_M}, where a subset of c indices is the same as {l_1, · · · , l_c} (possibly in a different order) and the other M − c indices are picked arbitrarily from {l_1, · · · , l_c}. This process can be interpreted as generalizing the hyperedge with c nodes to a hyperedge with M nodes by duplicating M − c nodes from the set {v_{l_1}, · · · , v_{l_c}}, with possible repetitions. For example, the hyperedge e_2 = {v_2, v_3} in Fig. 5(b) can be represented by the entries a_{233} = a_{323} = a_{332} = a_{322} = a_{223} = a_{232} in the third-order tensor A, which can be interpreted as generalizing the original hyperedge with c = 2 nodes to hyperedges with M = 3 nodes as in Fig. 6. We use Eq. (14) as a generalization coefficient of each hyperedge with respect to permutation and combination [16]. More specifically, for the adjacency tensor of the hypergraph in Fig. 5(b), the entries are calculated as a_{146} = a_{164} = a_{461} = a_{416} = a_{614} = a_{641} = a_{567} = a_{576} = a_{657} = a_{675} = a_{756} = a_{765} = 1/2 and a_{233} = a_{323} = a_{332} = a_{322} = a_{223} = a_{232} = 1/3, while the remaining entries are zero. Note that the weight is smaller if the original hyperedge has fewer nodes. More generally, based on the definition of the adjacency tensor and Eq. (14), we can easily obtain the following property regarding the hyperedge weights.
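The weight rule of Eq. (14) is easy to verify numerically. The sketch below (a hypothetical helper, not code from the paper) enumerates the compositions k_1 + · · · + k_c = M and reproduces the weights 1/2 and 1/3 used above, as well as the weight 1 for a normal-graph edge (c = M = 2):

```python
from math import factorial
from itertools import product

# Weight of a hyperedge with c nodes in an M-th order adjacency tensor:
#   c * ( sum over k_1+...+k_c = M, k_i >= 1 of M!/(k_1! ... k_c!) )^{-1}
def multinomial_sum(c: int, M: int) -> int:
    # Enumerate all compositions (k_1, ..., k_c) of M with each k_i >= 1.
    total = 0
    for ks in product(range(1, M + 1), repeat=c):
        if sum(ks) == M:
            coeff = factorial(M)
            for k in ks:
                coeff //= factorial(k)
            total += coeff
    return total

def hyperedge_weight(c: int, M: int) -> float:
    return c / multinomial_sum(c, M)

print(hyperedge_weight(3, 3))   # 0.5     -> the 1/2 entries above
print(hyperedge_weight(2, 3))   # 0.333.. -> the 1/3 entries above
print(hyperedge_weight(2, 2))   # 1.0     -> normal-graph edge weight
```

For c = 3, M = 3 the only composition is (1, 1, 1), giving 3/3! = 1/2; for c = 2, M = 3 the compositions (1, 2) and (2, 1) give 2/6 = 1/3, matching the entries listed above.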
Property 1. In the adjacency tensor of a given hypergraph, the weight of a hyperedge increases with the number of nodes it involves, and hyperedges with the same number of nodes have equal weights.

This property shows that the weight of a hyperedge gets larger as it involves more nodes, which helps identify the size of each hyperedge from the weights in the adjacency tensor. Moreover, since hyperedges with the same number of nodes share the same weight, different hyperedges of the same size are distinguished by the indices of their entries in the adjacency tensor.
Then, the Laplacian tensor of the hypergraph H is defined as follows [16].

Definition 3 (Laplacian tensor). The Laplacian tensor of H is the M-th order N-dimension tensor L = D − A ∈ R^{N×N×···×N}, where A is the adjacency tensor and D is the degree tensor, whose only nonzero entries are the super-diagonal elements d_{ii···i} equal to the degree d(v_i) of vertex v_i.
We see that both the adjacency and Laplacian tensors of a hypergraph H are super-symmetric. Moreover, when m.c.e(H) = 2, they have similar forms to the adjacency and Laplacian matrices of undirected graphs respectively. Similar to GSP, we use an M th-order N -dimension tensor F as a general representation of a given hypergraph H for convenience. As the adjacency tensor is more general, the representing tensor F refers to the adjacency tensor in this paper unless specified otherwise.

B. Hypergraph Signal and Signal Shifting
Based on the tensor representation of hypergraphs, we now define the hypergraph signal. In the traditional GSP, each signal element is associated with one node in the graph. Thus, the graph signal in GSP is defined as an N-length vector for a graph with N nodes. Recall that the representing matrix of a normal graph can be treated as a graph filter, for which the basic form of the filtered signal is defined in Eq. (1). Thus, we can extend the definitions of the graph signal and signal shifting from the traditional GSP to HGSP based on a tensor-based filter implementation.
In HGSP, we also relate each signal element to one node in the hypergraph. Naturally, we can define the original signal as an N-length vector s ∈ R^N if there are N nodes. As in GSP, we define the hypergraph shifting based on the representing tensor F. However, since the tensor F is of M-th order, we need an (M−1)-th order signal tensor to work with the hypergraph filter F, such that the filtered signal is also an N-length vector like the original signal. For example, for the two-step polynomial filter shown in Fig. 7, the signals s, s′, s″ should all have the same dimension and order. For the input and output signals of an HGSP system to have a consistent form, we define an alternative form of the hypergraph signal as below.
Definition 4 (Hypergraph signal). Given an original signal s ∈ R^N and an M-th order representing tensor, the hypergraph signal s^[M−1] is the (M−1)-th order N-dimension tensor obtained from the (M−1)-fold tensor outer product s ∘ s ∘ · · · ∘ s, where each entry in position (i_1, · · · , i_{M−1}) equals s_{i_1} s_{i_2} · · · s_{i_{M−1}}.

Note that the above hypergraph signal comes from the original signal. They are different forms of the same signal, reflecting the signal properties in different dimensions. For example, a second-order hypergraph signal highlights the properties of the two-dimensional signal components s_i s_j, while the original signal emphasizes the one-dimensional properties directly. We will discuss the relationship between the hypergraph signal and the original signal in greater detail in Section III-D.
With the definition of hypergraph signals, let us define the original domain of signals for convenience before stepping into signal shifting. Similar to signals lying in the time domain for DSP, we have the following definition of the hypergraph vertex domain.

Definition 5 (Hypergraph vertex domain). Signals are in the hypergraph vertex domain if they are analyzed based on the structure among the vertices of a hypergraph.

The hypergraph vertex domain is thus the counterpart of the time domain in HGSP.
Next, we discuss how signals shift over a given hypergraph. Recall that, in GSP, the signal shifting is defined by the product of the representing matrix F_M ∈ R^{N×N} and the signal vector s ∈ R^N, i.e., s′ = F_M s. Similarly, we define the hypergraph signal shifting based on the representing tensor F and the hypergraph signal s^[M−1].

Definition 6 (Hypergraph shifting). The basic shifting filter of hypergraph signals is defined as the direct contraction between the representing tensor F and the hypergraph signal over the last M − 1 orders, i.e.,

s^(1) = F s^[M−1],   (18)

where each element of the filter output is given by

s^(1)_i = Σ_{j_1,···,j_{M−1}=1}^{N} f_{i j_1 ··· j_{M−1}} s_{j_1} · · · s_{j_{M−1}}.   (19)

Since the hypergraph signal contracts with the representing tensor over M − 1 orders, the one-time filtered signal s^(1) is an N-length vector with the same dimension as the original signal. The block diagram of a hypergraph filter based on F is shown in Fig. 8.
Let us now consider the functionality of the hypergraph filter, as well as the physical insight of hypergraph shifting. In GSP, the functionality of the filter F_M is simply to shift the signals along the link directions. However, interactions inside a hyperedge are more complex, as it involves more than two nodes. In Eq. (19), we see that the filtered signal at v_i equals the summation of the shifted signal components in all hyperedges containing node v_i, where f_{i j_1 ··· j_{M−1}} is the weight of each involved hyperedge and {s_{j_1}, · · · , s_{j_{M−1}}} are the signals in the generalized hyperedge excluding s_i. Clearly, the hypergraph shifting multiplies together the signals in the same hyperedge as node v_i before delivering the shift to v_i. Taking the hypergraph in Fig. 5(a) as an example, node v_7 is included in two hyperedges, e_2 = {v_2, v_3, v_7} and e_3 = {v_5, v_6, v_7}. According to Eq. (19), the shifted signal at node v_7 is calculated as

s^(1)_7 = f_{732} s_3 s_2 + f_{723} s_2 s_3 + f_{756} s_5 s_6 + f_{765} s_6 s_5,   (20)

where f_{732} = f_{723} is the weight of the hyperedge e_2 and f_{756} = f_{765} is the weight of the hyperedge e_3 in the adjacency tensor F.
As the entry a_ji in the adjacency matrix of a normal graph indicates the link direction from node v_i to node v_j, the entry f_{i_1···i_M} in the adjacency tensor similarly indicates an ordering of the nodes in a hyperedge,

v_{i_M} → v_{i_{M−1}} → · · · → v_{i_1},   (21)

where v_{i_1} is the destination and v_{i_M} is the source. Thus, the shifting in Eq. (20) can be interpreted as shown in Fig. 9(a). Since there are two possible directions from nodes {v_2, v_3} to node v_7 in e_2, there are two components shifted to v_7, i.e., the first two terms in Eq. (20). Similarly, there are two components shifted through the hyperedge e_3, i.e., the last two terms in Eq. (20). To illustrate hypergraph shifting more explicitly, Fig. 9(b) shows a diagram of signal shifting to a certain node in an M-way hyperedge. From Fig. 9(b), we see that the graph shifting in GSP is a special case of hypergraph shifting with M = 2. Moreover, there are K = (M − 1)! possible directions for the shifting to one specific node in an M-way hyperedge.
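The shifting of Eq. (19) and the example of Eq. (20) can be reproduced with a small numerical sketch (illustrative signal values; nodes 0-indexed):

```python
import numpy as np
from itertools import permutations

# Hyperedges of the 3-uniform example (1-indexed {1,4,6}, {2,3,7}, {5,6,7}),
# stored 0-indexed under all index permutations with the uniform weight 1/2.
N = 7
F = np.zeros((N, N, N))
for edge in [(0, 3, 5), (1, 2, 6), (4, 5, 6)]:
    for idx in permutations(edge):
        F[idx] = 0.5

s = np.arange(1.0, 8.0)                     # illustrative signal s_1..s_7 = 1..7
# Contraction over the last two orders: s1_i = sum_{j,k} f_ijk s_j s_k
s_shift = np.einsum('ijk,j,k->i', F, s, s)

# Node v_7 (index 6) collects f_732 s_3 s_2 + f_723 s_2 s_3
#                           + f_756 s_5 s_6 + f_765 s_6 s_5
#                           = 2 * 0.5 * (2*3 + 5*6) = 36
print(s_shift[6])   # 36.0
```

As in the text, each node receives a product of the other signals in every hyperedge containing it, once per shifting direction.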

C. Hypergraph Spectrum Space
We now define the hypergraph Fourier space, i.e., the hypergraph spectrum space. In GSP, the graph Fourier space is defined as the eigenspace of the representing matrix [5]. Similarly, we define the Fourier space of HGSP based on the representing tensor F of a hypergraph, which characterizes the hypergraph structure and signal shifting. For an M-th order N-dimension tensor F, we can apply the orthogonal-CP decomposition [22] to write

F = Σ_{r=1}^{R} λ_r · f_r ∘ f_r ∘ · · · ∘ f_r  (M times),   (22)

with an orthonormal basis {f_r}. Generally, we have rank R ≤ N for a hypergraph. We will discuss how to construct the remaining f_i, R < i ≤ N, for the case of R < N later in Section III-F. Now, by plugging Eq. (22) into Eq. (18), the hypergraph shifting can be written with the N basis vectors f_i as

s^(1) = F s^[M−1]   (23a)
= (Σ_{r=1}^{N} λ_r f_r ∘ · · · ∘ f_r) s^[M−1]   (23b)
= Σ_{r=1}^{N} λ_r f_r <f_r, s>^{M−1}   (23c)
= [f_1 · · · f_N] · diag(λ_1, · · · , λ_N) · [<f_1, s>^{M−1} · · · <f_N, s>^{M−1}]^T,   (23d)

where <f_r, s> = f_r^T s is the inner product between f_r and s. From Eq. (23d), we see that the shifted signal in HGSP has a decomposed form similar to Eqs. (3) and (4) in GSP. The first two parts in Eq. (23d) work like V_M^{-1} Λ of the GSP eigen-decomposition, which can be interpreted as the inverse Fourier transform and the filter in the Fourier space. The third part can be understood as the hypergraph Fourier transform of the original signal. Hence, similarly as in GSP, we can define the hypergraph Fourier space and Fourier transform based on the orthogonal-CP decomposition of F.

Definition 7 (Hypergraph Fourier space and Fourier transform). The hypergraph Fourier space of a hypergraph with representing tensor F is the space spanned by the orthogonal-CP basis {f_1, · · · , f_N}. The hypergraph Fourier transform (HGFT) of a signal s is defined as

ŝ = F(s)   (24a)
= [<f_1, s>^{M−1}, · · · , <f_N, s>^{M−1}]^T.   (24b)
Compared to GSP, if M = 2, the HGFT has the same form as the traditional GFT. In addition, since the f_r's form an orthonormal basis, we have

f_p^T f_q = 1 if p = q, and f_p^T f_q = 0 otherwise.   (25)

According to [21], a vector x is an E-eigenvector of an M-th order tensor A if A x^{M−1} = λ x holds for some constant λ. Then, we obtain the following property of the hypergraph spectrum.
Property 2. The hypergraph spectrum pair (λ r , f r ) is an Eeigenpair of the representing tensor F.
Recall that the spectrum space of GSP is the eigenspace of the representing matrix F M . Property 2 shows that HGSP has a consistent definition in the spectrum space as that for GSP.
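Property 2 can be checked numerically on a synthetic tensor built directly in the orthogonal-CP form of Eq. (22); all values below are illustrative:

```python
import numpy as np

# Build a 3rd-order super-symmetric tensor F = sum_r lam_r f_r o f_r o f_r,
# then verify the E-eigenpair relation F f_r^{M-1} = lam_r f_r for each r.
rng = np.random.default_rng(2)
N = 4
V, _ = np.linalg.qr(rng.standard_normal((N, N)))   # orthonormal columns f_r
lam = np.array([2.0, 1.5, 1.0, 0.25])
F = np.einsum('r,ir,jr,kr->ijk', lam, V, V, V)

for r in range(N):
    f_r = V[:, r]
    lhs = np.einsum('ijk,j,k->i', F, f_r, f_r)   # F contracted with f_r twice
    assert np.allclose(lhs, lam[r] * f_r)
```

The check succeeds because contracting the CP form with an orthonormal f_r annihilates every term except the r-th one.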

D. Relationship between Hypergraph Signal and Original Signal
With the HGFT defined, let us discuss the relationship between the hypergraph signal and the original signal in the Fourier space to better understand the HGFT. From Eq. (24b), the hypergraph signal in the Fourier space is written as

ŝ = [<f_1, s>^{M−1}, · · · , <f_N, s>^{M−1}]^T,   (26)

which can be further decomposed as

ŝ = (V^T s) * (V^T s) * · · · * (V^T s)  (M − 1 Hadamard factors),   (27)

where * denotes the Hadamard product and V = [f_1 · · · f_N]. From Eq. (27), we see that the hypergraph signal in the hypergraph Fourier space is the (M − 1)-fold Hadamard product of a component consisting of the hypergraph Fourier basis and the original signal. More specifically, this component works as the original signal in the hypergraph Fourier space, which is defined as

s̃ = V^T s = [f_1^T s, · · · , f_N^T s]^T.   (28)

Recalling the definitions of the hypergraph signal and the vertex domain in Section III-B, each Fourier coefficient of the hypergraph signal is the (M − 1)-th power of the corresponding Fourier coefficient of the original signal, i.e., ŝ_r = (s̃_r)^{M−1}. We can thus establish a connection between the original signal and the hypergraph signal in the hypergraph Fourier domain through the HGFT and inverse HGFT (iHGFT), as shown in Fig. 10. Such a relationship leads to some interesting properties and makes the HGFT implementation more straightforward, as discussed further in Sections III-F and III-G, respectively.
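The decomposed shifting of Eq. (23d) and the Hadamard-product form of Eq. (27) can be verified on a synthetic tensor (an illustrative sketch, not the paper's code):

```python
import numpy as np

# Build a 3rd-order super-symmetric tensor in the orthogonal-CP form, then
# verify that vertex-domain shifting matches the spectral form
# V diag(lam) s_hat, with s_hat the elementwise (M-1)-th power of V^T s.
rng = np.random.default_rng(0)
N, M = 4, 3
V, _ = np.linalg.qr(rng.standard_normal((N, N)))   # orthonormal columns f_r
lam = np.array([3.0, 2.0, 1.0, 0.5])
F = np.einsum('r,ir,jr,kr->ijk', lam, V, V, V)

s = rng.standard_normal(N)
s_tilde = V.T @ s                 # original signal in the Fourier space
s_hat = s_tilde ** (M - 1)        # HGFT coefficients: <f_r, s>^{M-1}

shift_direct = np.einsum('ijk,j,k->i', F, s, s)    # contract F with s twice
shift_spectral = V @ (lam * s_hat)                 # inverse transform of filtered spectrum
assert np.allclose(shift_direct, shift_spectral)
```

The agreement of the two computations mirrors the classical "filter in the frequency domain" identity of GSP, now with the elementwise power linking s̃ and ŝ.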

E. Hypergraph Frequency
As we now have a better understanding of the hypergraph Fourier space and Fourier transform, we can discuss more about the hypergraph frequency and its order. In GSP, the graph frequency is defined with respect to the eigenvalues of the representing matrix F M and ordered by the total variation [5]. Similarly, in HGSP, we define the frequency relative to the coefficients λ i from the orthogonal-CP decomposition. We order them by the total variation of frequency components f i over the hypergraph. The total variation of a general signal component over a hypergraph is defined as follows.
Definition 8 (Total variation over hypergraph). Given a hypergraph H with N nodes and the normalized representing tensor F_norm = (1/λ_max) F, together with the original signal s, the total variation over the hypergraph is defined as the total difference between the nodes and their corresponding neighbors in the perspective of shifting, i.e.,

TV(s) = ||s − F_norm s^{[M−1]}||_1.   (29a)

We adopt the l_1-norm here only as an example of defining the total variation. Other norms may be more suitable depending on specific applications. Now, with the definition of total variation over hypergraphs, the frequencies in HGSP are ordered by the total variation of the corresponding frequency components f_r, i.e.,

TV(f_r) = ||f_r − f^{norm}_{r(1)}||_1,   (30)

where f^{norm}_{r(1)} is the output of one-time shifting of f_r over the normalized representing tensor.
From Eq. (29a), we see that the total variation describes how much the signal component changes from a node to its neighbors over the hypergraph shifting. Thus, we have the following definition of hypergraph frequency.
Definition 9 (Hypergraph frequency). Hypergraph frequency describes how oscillatory the signal component is with respect to the given hypergraph. A frequency component f r is associated with a higher frequency if the total variation of this frequency component is larger.
Note that the physical meaning of graph frequency was discussed for GSP in [2]. Generally, the graph frequency is highly related to the total variation of the corresponding frequency component. Similarly, the hypergraph frequency also relates to its corresponding total variation. We will discuss the interpretation of the hypergraph frequency and its relationships with DSP and GSP later in Section IV-A, to further solidify our hypergraph frequency definition.
Based on the definition of total variation, we describe one important property of TV(f r ) in the following theorem.

Theorem 1. Define a supporting matrix

P_s = (1/λ_max) Σ_{r=1}^{N} λ_r f_r f_r^T.   (31)

With the normalized representing tensor F_norm = (1/λ_max) F, the total variation of the hypergraph spectrum basis f_r is calculated as

TV(f_r) = ||f_r − f^{norm}_{r(1)}||_1   (32a)
        = ||(I − P_s) f_r||_1   (32b)
        = |1 − λ_r/λ_max| · ||f_r||_1.   (32c)

Proof: For hypergraph signals, the output of one-time shifting of f_r is calculated as

F f_r^{[M−1]} = Σ_{p=1}^{N} λ_p f_p (f_p^T f_r)^{M−1} = λ_r f_r.

Based on the normalized F_norm, we have f^{norm}_{r(1)} = (λ_r/λ_max) f_r. It is therefore easy to obtain Eq. (32c) from Eq. (32a). To obtain Eq. (32b), we have

P_s f_r = (1/λ_max) Σ_{p=1}^{N} λ_p f_p (f_p^T f_r) = (λ_r/λ_max) f_r.

It is clear that Eq. (32b) is the same as Eq. (32c). Since the λ's are real and nonnegative, we have TV(f_r) = (1 − λ_r/λ_max) ||f_r||_1. Obviously, TV(f_i) > TV(f_j) iff λ_i < λ_j. Theorem 1 shows that the supporting matrix P_s lets us apply the total variation more efficiently in real applications. Moreover, it provides the order of frequencies according to the coefficients λ_i, with the following property.
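The closed form in Theorem 1 can be checked with a toy spectrum (hypothetical coefficients, identity basis so that ||f_r||_1 = 1 and TV(f_r) = 1 − λ_r/λ_max):

```python
import numpy as np

# Hypothetical coefficients in descending order; identity basis so that
# ||f_r||_1 = 1 and Theorem 1 gives TV(f_r) = 1 - lam_r / lam_max.
lam = np.array([2.0, 1.5, 1.0, 0.5])
lam_max = lam.max()
N = len(lam)
basis = np.eye(N)                          # rows are the f_r

# Supporting matrix P_s = (1/lam_max) sum_r lam_r f_r f_r^T
P_s = (basis.T * lam) @ basis / lam_max

# Total variation via Eq. (32b)
tv = np.array([np.abs(f - P_s @ f).sum() for f in basis])

assert np.allclose(tv, 1.0 - lam / lam_max)   # Eq. (32c)
assert np.all(np.diff(tv) > 0)                # smaller lambda -> larger TV
```

The second assertion is Property 4 in action: as λ_r decreases, the total variation (and hence the frequency) increases.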

Property 4. A smaller λ is related to a higher frequency in the hypergraph Fourier space, where its corresponding spectrum basis is called a high frequency component.

F. Signals with Limited Spectrum Support
With the order of frequency, we define the bandlimited signals as follows.
Definition 10 (Bandlimited signal). Order the coefficients as λ = [λ_1, ⋯, λ_N] with λ_1 ≥ ⋯ ≥ λ_N ≥ 0, together with their corresponding f_r's. A hypergraph signal is K-bandlimited if its Fourier coefficients satisfy ŝ_i = 0 for all i > K. The smallest such K is defined as the bandwidth, and the corresponding frequency boundary is that of the K-th spectrum component.

Note that a larger λ_i corresponds to a lower frequency, as mentioned in Property 4. Thus, the frequencies are ordered from low to high in the definition above. Moreover, we use the index K instead of the coefficient value λ to define the bandwidth for the following reasons:
• Identical λ's in two different hypergraphs do not refer to the same frequency. Since each hypergraph has its own adjacency tensor and spectrum space, the comparison of multiple spectrum pairs (λ_i, f_i) is only meaningful within the same hypergraph. Moreover, there exists a normalization issue in the decomposition of different adjacency tensors. Thus, it is not meaningful to compare λ_k's across two different hypergraphs.
• Since the λ_k values are not continuous over k, different frequency cutoffs of λ may lead to the same bandlimited space. For example, suppose that λ_k = 0.8 and λ_{k+1} = 0.5. Then, the cutoffs λ = 0.6 and λ = 0.7 would lead to the same bandlimited space, which makes a λ-based bandwidth definition non-unique.
As we discussed in Section III-D, the hypergraph signal in the frequency domain is the Hadamard power of the original signal's spectrum. Then, we have the following property of bandwidth: the hypergraph signal s^{[M−1]} and the original signal s share the same spectrum support, since the Hadamard power in Eq. (27) preserves the positions of zero coefficients. This property allows us to analyze the spectrum support of the hypergraph signal by looking into the original signal with lower complexity. Recall that we can add basis vectors f_i with zero coefficients λ_i when R < N, as mentioned in Section III-C. The added basis should not affect the HGFT of signals in the Fourier space. According to the structure of bandlimited signals, the added f_i should meet the following conditions: (1) f_i ⊥ f_p for p ≠ i; (2) f_i^T s → 0; and (3) ||f_i|| = 1.
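A minimal sketch of checking the bandwidth of a signal under Definition 10 (the QR-generated basis is a hypothetical stand-in for a hypergraph Fourier basis ordered from low to high frequency):

```python
import numpy as np

# Hypothetical orthonormal Fourier basis, rows f_1, ..., f_N ordered
# from low to high frequency.
rng = np.random.default_rng(1)
N = 6
basis, _ = np.linalg.qr(rng.standard_normal((N, N)))
basis = basis.T

def bandwidth(s, basis, tol=1e-10):
    """Smallest K with s_hat_i = 0 for all i > K (Definition 10)."""
    s_hat = basis @ s
    nz = np.nonzero(np.abs(s_hat) > tol)[0]
    return 0 if nz.size == 0 else int(nz[-1]) + 1

# A signal supported only on the 1st and 3rd frequency components.
s = 0.7 * basis[0] + 0.2 * basis[2]
assert bandwidth(s, basis) == 3
```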

G. Implementation and Complexity
We now consider the implementation and complexity issues of HGFT. Similar to GFT, the process of HGFT consists of two steps: decomposition and execution. The decomposition is to calculate the hypergraph spectrum basis, and the execution transforms signals from the hypergraph vertex domain into the spectrum domain.
• The calculation of the spectrum basis by the orthogonal-CP decomposition is an important preparation step for HGFT. A straightforward algorithm decomposes the representing tensor F into the spectrum basis f_i's and coefficients λ_i's as in Eq. (22). Efficient tensor decomposition is an active topic in both mathematics and engineering, and a number of CP decomposition methods exist in the literature. In [52], [56], motivated by the spectral theorem for real symmetric matrices, orthogonal-CP decomposition algorithms for symmetric tensors are developed based on polynomial equations. In [22], Afshar et al. proposed a more general decomposition algorithm for spatio-temporal data. Other works, including [53]-[55], developed faster decomposition methods for signal processing and big data applications. The rapid development of tensor decomposition and the advancement of computational power will facilitate efficient derivation of the hypergraph spectrum.

IV. DISCUSSIONS AND INTERPRETATIONS
In this section, we focus on the insights and physical meaning of frequency to help interpret the hypergraph spectrum space. We also consider the relationships between HGSP and other existing works to better understand the HGSP framework.

A. Interpretation of Hypergraph Spectrum Space
We are interested in an intuitive interpretation of the hypergraph frequency and its relations to the DSP and GSP frequencies. We start with the frequency and the total variation in DSP. In DSP, the discrete Fourier transform of a sequence s_n is given by ŝ_k = Σ_{n=0}^{N−1} s_n e^{−j2πkn/N}, and the frequency is defined as ν_n = n/N, n = 0, 1, ⋯, N−1. From [35], we can summarize the following conclusions:
• ν_n with 1 ≤ n ≤ N/2 − 1 corresponds to a continuous-time signal frequency (n/N) f_s;
• ν_n with N/2 + 1 ≤ n ≤ N − 1 corresponds to a continuous-time signal frequency −(1 − n/N) f_s;
• ν_{N/2} corresponds to f_s/2;
• n = 0 corresponds to frequency 0.
Here, f_s is the critical sampling frequency. The highest frequency occurs at n = N/2. The total variation in DSP is defined as the differences among the signal samples over time [57], i.e.,

TV(s) = Σ_n |s_n − s_{n−1}| = ||s − C_N s||_1,

where C_N is the cyclic shifting matrix. When we perform the eigen-decomposition of C_N, we see that the eigenvalues are λ_n = e^{−j2πn/N} with eigenvectors f_n, 0 ≤ n ≤ N − 1. More specifically, the total variation of the frequency component f_n is calculated as

TV(f_n) = ||f_n − λ_n f_n||_1 = |1 − e^{−j2πn/N}| · ||f_n||_1,

which increases with n for n ≤ N/2 before decreasing with n for N/2 < n ≤ N − 1. Obviously, the total variations of the frequency components have a one-to-one correspondence to the frequencies in the order of their values: if the total variation of a frequency component is larger, the corresponding frequency with the same index n is higher. This also has a clear physical meaning, i.e., a higher frequency component changes faster over time, which implies a larger total variation. Thus, we can also use the total variation of a frequency component to characterize its frequency in DSP.
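The rise-then-fall pattern of TV(f_n) is a one-liner to verify: with λ_n = e^{−j2πn/N}, the factor |1 − λ_n| = 2|sin(πn/N)| peaks exactly at n = N/2.

```python
import numpy as np

# Total variation factor of the DSP frequency components: with
# eigenvalues lam_n = exp(-j 2 pi n / N) of the cyclic shift,
# TV(f_n) is proportional to |1 - lam_n| = 2 |sin(pi n / N)|.
N = 8
n = np.arange(N)
tv = np.abs(1 - np.exp(-2j * np.pi * n / N))

assert np.all(np.diff(tv[: N // 2 + 1]) > 0)   # increases up to n = N/2
assert np.all(np.diff(tv[N // 2:]) < 0)        # decreases afterwards
assert np.argmax(tv) == N // 2                 # highest frequency at N/2
```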
Let us now consider the total variation and frequency in GSP, where signals are analyzed in the graph vertex domain instead of the time domain. Just as the frequency in DSP describes the rate of signal changes over time, the frequency in GSP describes the rate of signal changes over the vertices [5]. Likewise, the total variation of the graph Fourier basis defined according to the adjacency matrix F_M can be used to characterize each frequency. Since GSP handles signals in the graph vertex domain, the total variation of GSP is defined as the differences between all the nodes and their neighbors, i.e.,

TV(s) = ||s − F_M^{norm} s||_1,

where F_M^{norm} = (1/|λ_max|) F_M. If the total variation of the frequency component f_{Mi} is larger, the signal changes between neighboring vertices are faster, which indicates a higher graph frequency. Note that, when the graph is undirected, i.e., the eigenvalues are real, the frequency decreases as the eigenvalue increases, similar to HGSP in Section III-E; otherwise, when the graph is directed, i.e., the eigenvalues may be complex, the frequency changes as shown in Fig. 11, which is consistent with the changing pattern of the DSP frequency [5].
We now turn to our HGSP framework. Like GSP, HGSP analyzes signals in the hypergraph vertex domain. Different from normal graphs, each hyperedge in HGSP connects more than two nodes. The neighbors of a vertex v_i include all the nodes in the hyperedges containing v_i. For example, if there exists a hyperedge e_1 = {v_1, v_2, v_3}, nodes v_2 and v_3 are both neighbors of node v_1. As mentioned in Section III-E, the total variation of HGSP is defined as the difference between the signal components and their respective shifted versions over the hypergraph:

TV(s) = ||s − F_norm s^{[M−1]}||_1,

where F_norm = (1/λ_max) F. Similar to DSP and GSP, the pairs (λ_i, f_i) in Eq. (22) characterize the hypergraph spectrum space. A spectrum component with a larger total variation represents a higher frequency component, which indicates faster changes over the hypergraph. Note that, as mentioned in Section III-E, the total variation is larger and the frequency is higher if the corresponding λ is smaller, because we usually consider undirected hypergraphs, whose λ's are real in the tensor decomposition. To illustrate this more clearly, we consider a hypergraph with 9 nodes, 5 hyperedges, and m.c.e = 3 as an example, shown in Fig. 12. As mentioned before, a smaller λ indicates a higher frequency in HGSP. Hence, we see that the signals change more on each vertex when the frequency is higher.

B. Connections to other Existing Works
We now discuss the relationships between the HGSP and other existing works.
1) Graph Signal Processing: One motivation for HGSP is to provide a more general framework for signal processing over high-dimensional graphs. Thus, GSP should be a special case of HGSP. We illustrate the GSP-HGSP relationship as follows.
• Graphical models: GSP is based on normal graphs [2], where each simple edge connects exactly two nodes; HGSP is based on hypergraphs, where each hyperedge can connect more than two nodes. Clearly, the normal graph is a special case of the hypergraph, for which the m.c.e equals two. More specifically, a normal graph is a 2-uniform hypergraph [58]. Hypergraphs provide a more general model for multi-way relationships, while normal graphs can only model two-way relationships. For example, a 3-uniform hypergraph is able to model the three-way interactions among users in a social network [59]. As the hypergraph is a more general model of high-dimensional interactions, HGSP is also more powerful for high-dimensional signals. • Algebraic models: HGSP relies on tensors while GSP relies on matrices, which are second-order tensors. Benefiting from the generality of tensors, HGSP is broadly applicable in high-dimensional data analysis. • Signals and signal shifting: In HGSP, we define the hypergraph signal as the (M−1)-fold tensor outer product of the original signal. In particular, the hypergraph signal equals the original signal if M = 2, i.e., if each hyperedge has exactly two nodes. As shown in Fig. 9(b) of Section III-C, graph shifting is also a special case of hypergraph shifting when M = 2. • Spectrum properties: In HGSP, the spectrum space is defined via the orthogonal-CP decomposition in terms of the basis and coefficients, which are also the E-eigenpairs of the representing tensor [60], as shown in Eq. (25). In GSP, the spectrum space is defined as the matrix eigenspace. Since tensor algebra is an extension of matrix algebra, the HGSP spectrum is also an extension of the GSP spectrum. For example, as discussed in Section III, GFT is the same as HGFT when M = 2.
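The M = 2 reduction is easy to see numerically. In this sketch (a hypothetical toy adjacency matrix), the eigendecomposition of a symmetric adjacency matrix plays the role of the orthogonal-CP decomposition, and the decomposed hypergraph shifting collapses to ordinary GSP shifting F_M s:

```python
import numpy as np

# Toy symmetric adjacency matrix of a 4-node normal graph; for M = 2 the
# representing tensor is just this matrix, and the eigendecomposition
# A = sum_r lam_r v_r v_r^T is the M = 2 instance of the CP form.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)

lam, V = np.linalg.eigh(A)             # columns of V are eigenvectors
s = np.array([1.0, 2.0, -1.0, 0.5])

# Hypergraph shifting with M = 2: <v_r, s>^(M-1) = <v_r, s>, so the
# decomposed form collapses to plain matrix-vector shifting A s.
s1_cp = V @ (lam * (V.T @ s))
assert np.allclose(s1_cp, A @ s)
```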
Overall, HGSP is a novel extension of GSP that is strictly more general. The purpose of developing the HGSP framework is to facilitate more interesting signal processing tasks that involve high-dimensional signal interactions.
2) Higher-Order Statistics: Higher-order statistics (HOS) have been effectively applied in signal processing [61], [62] to analyze the multi-way interactions of signal samples, with success in many applications, such as blind feature detection [63], decision [64], and signal classification [65]. In HOS, the kth-order cumulant of random variables x = [x_1, ⋯, x_k]^T is defined [66] based on the coefficients of v = [v_1, ⋯, v_k]^T in the Taylor series expansion of the cumulant-generating function. It is easy to see that HGSP and HOS are related in high-dimensional signal processing: both can be represented by tensors. For example, in the multi-channel problems of [67], the 3rd-order cumulant C = {C_{y_i,y_j,y_z}(t, t_1, t_2)} of zero-mean signals can be represented as a multilinear array, e.g.,

C_{y_i,y_j,y_z}(t, t_1, t_2) = E{y_i(t) y_j(t + t_1) y_z(t + t_2)},   (41)

which is essentially a third-order tensor. More specifically, if there are k samples, this third-order cumulant C can be represented as a k³-element vector, which is the flattened signal tensor, similar to the n-mode flattening of HGSP signals.
Although both HOS and HGSP are high-dimensional signal processing tools, they focus on complementary aspects of the signals. Specifically, HGSP aims to analyze signals over the high-dimensional vertex domain, while HOS focuses on the statistical domain. In addition, the forms of signal combination are also different, where HGSP signals are based on the hypergraph shifting defined as in Eq. (19), whereas HOS cumulants are based on the statistical average of shifted signal products.
3) Learning over Hypergraphs: Hypergraph learning is another tool for handling structured data and sometimes uses techniques similar to HGSP. For example, the authors of [68] proposed an alternative definition of hypergraph total variation and designed corresponding algorithms for classification and clustering problems. In addition, hypergraph learning has its own definition of the hypergraph spectrum space. For example, [36], [37] represented hypergraphs using a graph-like similarity matrix and defined a spectrum space as the eigenspace of this similarity matrix. Other works considered different aspects of hypergraphs, including the hypergraph Laplacian [69] and hypergraph lifting [70].
The HGSP framework exhibits features different from hypergraph learning: 1) HGSP defines a framework that generalizes the classical digital signal processing and traditional graph signal processing; 2) HGSP applies different definitions of hypergraph characteristics such as the total variation, spectrum space, and Laplacian; 3) HGSP cares more about the spectrum space while learning focuses more on data; 4) As HGSP is an extension of DSP and GSP, it is more suitable to handle detailed tasks such as compression, denoising, and detection. All these features make HGSP a different technical concept from hypergraph learning.
V. TOOLS FOR HYPERGRAPH SIGNAL PROCESSING

In this section, we introduce several useful tools built within the framework of HGSP.

A. Sampling Theory
Sampling is an important tool in data analysis, which selects a subset of individual data points to estimate the characteristics of the whole population [89]. Sampling plays an important role in applications such as compression [23] and storage [90]. Similar to sampling signals in time, the HGSP sampling theory can be developed to sample signals over the vertex domain. We now introduce the basics of HGSP sampling theory for lossless signal dimension reduction.
To reduce the size of a hypergraph signal s^{[M−1]}, there are two main approaches: 1) reduce the dimension of each order; or 2) reduce the number of orders. Since reducing the order breaks the structure of the hypergraph and cannot always guarantee perfect recovery, we adopt dimension reduction of each order. To change the dimension of a certain order, we can use the n-mode product. Since all orders of the hypergraph signal are equivalent, the n-mode product operators of each order are the same. Then, the sampling operation of the hypergraph signal is defined as follows: Definition 11 (Sampling and Interpolation). Suppose that Q is the dimension of each sampled order. The sampling operation is defined as

s_Q^{[M−1]} = s^{[M−1]} ×_1 U ×_2 U ⋯ ×_{M−1} U,

where the sampling operator is U ∈ R^{Q×N}, to be defined later, and s_Q^{[M−1]} is the sampled signal.

The interpolation operation is defined by

s'^{[M−1]} = s_Q^{[M−1]} ×_1 T ×_2 T ⋯ ×_{M−1} T,

where the interpolation operator is T ∈ R^{N×Q}, to be defined later.
As presented in Section III, the hypergraph signal and original signal are different forms of the same data. They may have similar properties in structures. To derive the sampling theory for perfect signal recovery efficiently, we first consider the sampling operations of the original signal.
Definition 12 (Sampling original signal). Suppose an original K-bandlimited signal s ∈ R^N is to be sampled into s_Q ∈ R^Q, where q = {q_1, ⋯, q_Q} denotes the sequence of sampled indices with q_i ∈ {1, 2, ⋯, N}. The sampling operator U ∈ R^{Q×N} is a linear mapping from R^N to R^Q, defined by

U_{ji} = δ[i − q_j],

and the interpolation operator T ∈ R^{N×Q} is a linear mapping from R^Q to R^N. Then, the sampling operation is defined by s_Q = U s, and the interpolation operation is defined by s' = T s_Q. Analyzing the structure of these sampling operations, we have the following result.

Theorem 2. The sampling and interpolation operators of the hypergraph signal reduce to the sampling and interpolation operators of the original signal.

Proof: Suppose q = {q_1, q_2, ⋯, q_Q} marks the positions of the non-zero entries U_{jq_j}. As a result, we have U_{ji} = δ[i − q_j], which is the same as the sampling operator for the original signal. For the interpolation operator, the proof is similar and hence omitted. Given Theorem 2, we only need to analyze the operations on the original signal in the sampling theory. Next, we discuss the conditions for perfect recovery. For the original signal, we have the following property.
Lemma 1. Suppose that s ∈ R^N is a K-bandlimited signal. Then, we have

s = F_{[K]}^T ŝ_{[K]},

where F_{[K]}^T = [f_1, ⋯, f_K] and ŝ_{[K]} ∈ R^K consists of the first K elements of the original signal in the frequency domain, i.e., ŝ.
Proof: Since s is K-bandlimited, ŝ_i = f_i^T s = 0 for i > K. Then, according to Eq. (28), we have

s = Σ_{i=1}^{N} ŝ_i f_i = Σ_{i=1}^{K} ŝ_i f_i = F_{[K]}^T ŝ_{[K]}.

This lemma implies that the first K frequency components carry all the information of the original signal. Since the hypergraph signal and the original signal share the same sampling operators, we can reach a similar conclusion on perfect recovery as [23], [24], given in the following theorem.

Theorem 3. A K-bandlimited signal s can be perfectly recovered from its Q ≥ K samples, i.e., s = T s_Q = T U s, if rank(U F_{[K]}^T) = K and T = F_{[K]}^T Z with Z U F_{[K]}^T = I_K.

Proof: To prove the theorem, we show that TU is a projection operator and that T spans the space of the first K eigenvectors. From Lemma 1 and s = T s_Q, we have ŝ_{[K]} = Z s_Q. As a result, rank(Z s_Q) = rank(ŝ_{[K]}) = K. Hence, we conclude that K ≤ Q.
Next, we show that TU is a projection by proving that TU · TU = TU. Since Q ≥ K and Z U F_{[K]}^T = I_K, we have

TU · TU = F_{[K]}^T Z U F_{[K]}^T Z U = F_{[K]}^T Z U = TU.

Hence, TU is a projection operator. For the spanning part, the proof is the same as that in [23]. Theorem 3 shows that perfect recovery is possible for a bandlimited hypergraph signal. We now examine some interesting properties of the sampled signal.
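A numerical sketch of the perfect-recovery condition (the basis and sampled vertex indices are hypothetical toy choices):

```python
import numpy as np

# Recover a K-bandlimited signal from Q = K samples with
# T = F_[K]^T Z, where Z = (U F_[K]^T)^{-1}.
rng = np.random.default_rng(2)
N, K = 6, 3
basis, _ = np.linalg.qr(rng.standard_normal((N, N)))
F_K = basis.T[:K]                      # first K Fourier basis rows, K x N

s = F_K.T @ rng.standard_normal(K)     # a K-bandlimited signal (Lemma 1)

q = [0, 2, 5]                          # sampled vertex indices
U = np.eye(N)[q]                       # sampling operator, Q x N
Z = np.linalg.inv(U @ F_K.T)           # so that Z U F_[K]^T = I_K
T = F_K.T @ Z                          # interpolation operator, N x Q

s_rec = T @ (U @ s)
assert np.allclose(s_rec, s)                # perfect recovery
assert np.allclose(T @ U @ T @ U, T @ U)    # TU is a projection
```

For a generic basis, any K sampled indices make U F_{[K]}^T invertible; in degenerate cases one must pick indices satisfying the rank condition of Theorem 3.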
From the previous discussion, we have ŝ_{[K]} = Z s_Q, which has a similar form to the HGFT, where Z can be treated as the Fourier transform operator. Suppose that Q = K and Z = [z_1, ⋯, z_K]^T. We have the following first-order difference property.

Theorem 4. Define a new hypergraph whose one-time shifting (supporting) matrix is P̃_s = (1/λ_max) Z^{−1} Σ_{[K]} Z, where the diagonal matrix Σ_{[K]} consists of the first K coefficients {λ_1, …, λ_K}. Then it holds that

s_Q − P̃_s s_Q = U (s − P_s s).   (54)

Proof: Since Z U F_{[K]}^T = I_K, we have U F_{[K]}^T = Z^{−1} and ŝ_{[K]} = Z s_Q. Since s is K-bandlimited, P_s s = (1/λ_max) F_{[K]}^T Σ_{[K]} ŝ_{[K]}, so

U P_s s = (1/λ_max) Z^{−1} Σ_{[K]} ŝ_{[K]} = (1/λ_max) Z^{−1} Σ_{[K]} Z s_Q = P̃_s s_Q.

Subtracting both sides from s_Q = U s completes the proof. Theorem 4 shows that the sampled signals form a new hypergraph that preserves the information of the one-time shifting filter over the original hypergraph. The left-hand side of Eq. (54) represents the difference between the sampled signal and its one-time shifted version in the new hypergraph. The right-hand side of Eq. (54) is the difference between a signal and its one-time shifted version in the original hypergraph, followed by the sampling operator. That is, the sampled result of the one-time shifting differences in the original hypergraph equals the one-time shifting differences in the new sampled hypergraph.

B. Filter Design
Filtering is an important tool in signal processing applications such as denoising, feature enhancement, smoothing, and classification. In GSP, basic filtering is defined as s^{(1)} = F_M s, where F_M is the representing matrix [2]. In HGSP, basic hypergraph filtering is defined in Section III-C as s^{(1)} = F s^{[M−1]}, designed according to tensor contraction. The HGSP filter is a multilinear mapping [71]. The high dimensionality of tensors provides more flexibility in designing HGSP filters.
1) Polynomial Filter based on the Representing Tensor: The polynomial filter is one basic form of HGSP filter, with which signals are shifted several times over the hypergraph. An example of a polynomial filter is given in Fig. 7 of Section III-B. A k-time shifting filter is defined as

s^{(k)} = F (s^{(k−1)})^{[M−1]}, with s^{(0)} = s.   (56)

More generally, a polynomial filter is designed as

s' = Σ_{k=0}^{a} α_k s^{(k)},   (57)

where the α_k's are the filter coefficients. Such HGSP filters are based on multilinear tensor contraction and can serve different signal processing tasks through specific choices of the parameters a and {α_k}.
In addition to the general polynomial filter based on hypergraph signals, we provide another specific form of polynomial filter based on the original signals. As mentioned in Section III-E, the supporting matrix P_s in Eq. (31) captures all the information of the frequency space. For example, the unnormalized supporting matrix P = λ_max P_s is calculated as

P = Σ_{r=1}^{N} λ_r f_r f_r^T.   (58)

Obviously, the hypergraph spectrum pair (λ_r, f_r) is an eigenpair of the supporting matrix P. Moreover, Theorem 1 shows that the total variation of a frequency component equals a function of P, i.e.,

TV(f_r) = ||(I − P/λ_max) f_r||_1.   (59)

From Eq. (59), P can be interpreted as a shifting matrix for the original signal. Accordingly, we can design a polynomial filter for the original signal based on the supporting matrix P, whose kth-order term is defined as

s^{(k)} = P^k s.   (60)

The a-th order polynomial filter is simply given as

H = Σ_{k=0}^{a} α_k P^k.   (61)

A polynomial filter over the original signal can be determined with specific choices of a and the α_k's.
Let us consider some interesting properties of the polynomial filter for the original signal. For the kth-order term, we have the following result.

Lemma 2. The kth-order term satisfies

P^k s = Σ_{r=1}^{N} λ_r^k ⟨f_r, s⟩ f_r.

Proof: Since the f_r's are orthonormal, P^k = Σ_{r=1}^{N} λ_r^k f_r f_r^T. Therefore, the kth-order term is given as P^k s = Σ_{r=1}^{N} λ_r^k (f_r^T s) f_r.

From Lemma 2, we obtain the following property of the polynomial filter for the original signal.
Theorem 5. Let h(·) be a polynomial function. For the polynomial filter H = h(P) applied to the original signal, the filtered signal satisfies

H s = h(P) s = Σ_{r=1}^{N} h(λ_r) ⟨f_r, s⟩ f_r.

This theorem serves as the invariance property of exponentials in HGSP, similar to those in GSP and DSP [2]. Eq. (57) and Eq. (60) provide more choices of HGSP polynomial filters for hypergraph signal processing and data analysis. We will give specific examples of practical applications in Section VI.
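Lemma 2 and Theorem 5 can be verified numerically with a hypothetical spectrum (QR-generated basis, toy coefficients): for any polynomial h, applying H = h(P) in the vertex domain matches scaling each Fourier coefficient by h(λ_r).

```python
import numpy as np

# Hypothetical orthonormal basis (rows f_r) and coefficients lam_r.
rng = np.random.default_rng(3)
N = 5
basis, _ = np.linalg.qr(rng.standard_normal((N, N)))
basis = basis.T
lam = np.array([2.0, 1.6, 1.2, 0.8, 0.4])

P = (basis.T * lam) @ basis            # P = sum_r lam_r f_r f_r^T
s = rng.standard_normal(N)

h = lambda x: 1.0 + 0.5 * x + 0.25 * x ** 2        # a polynomial h(.)
H = np.eye(N) + 0.5 * P + 0.25 * P @ P             # H = h(P)

lhs = H @ s                                        # vertex-domain filtering
rhs = basis.T @ (h(lam) * (basis @ s))             # spectral form (Thm. 5)
assert np.allclose(lhs, rhs)
```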
2) General Filter Design based on Optimization: In GSP, some filters are designed via optimization formulations [2], [72], [73]. Similarly, general HGSP filters can also be designed via optimization. Assume y is the observed signal before shifting and s = h(F, y) is the shifted signal produced by an HGSP filter h(·) designed for a specific application. Then, the filter design can be formulated as

min_s ||s − y||_2^2 + γ f(F, s),

where F is the representing tensor of the hypergraph and f(·) is a penalty function designed for the specific problem. For example, the total variation can be used as a penalty function to promote smoothness. Other alternative penalty functions include the label rank, Laplacian regularization, and spectrum-based penalties.
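An illustrative instance of this formulation, under a stated assumption: the l1 total variation is replaced by a quadratic smoothness surrogate ||(I − P_s)s||_2^2, which is not the paper's exact penalty but gives the problem a closed-form solution for demonstration.

```python
import numpy as np

# Optimization-based denoising sketch with a quadratic smoothness
# surrogate: min_s ||s - y||^2 + gamma ||(I - P_s) s||^2, whose closed
# form is s* = (I + gamma B^T B)^{-1} y with B = I - P_s.
rng = np.random.default_rng(4)
N = 5
basis, _ = np.linalg.qr(rng.standard_normal((N, N)))
basis = basis.T                        # hypothetical Fourier basis rows
lam = np.array([2.0, 1.5, 1.0, 0.5, 0.1])
P_s = (basis.T * lam) @ basis / lam.max()

y = rng.standard_normal(N)             # observed (noisy) signal
gamma = 2.0
B = np.eye(N) - P_s
s_star = np.linalg.solve(np.eye(N) + gamma * B.T @ B, y)

obj = lambda s: np.sum((s - y) ** 2) + gamma * np.sum((B @ s) ** 2)
assert obj(s_star) <= obj(y)           # the optimum beats the raw signal
```

Because the surrogate objective is strictly convex, the linear solve gives the unique minimizer; an l1 penalty would instead require an iterative solver.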
In Section VI, we shall provide some filter design examples.

VI. APPLICATION EXAMPLES
In this section, we consider several application examples for our newly proposed HGSP framework. These examples illustrate the practical use of HGSP in some traditional tasks, such as filter design and efficient data representation. We also consider problems in data analysis, such as classification and clustering.

A. Data Compression
Efficient representation of signals is important in data analysis and signal processing. Among many applications, data compression attracts significant interest for efficient storage and transmission [74]-[76]. Projecting signals onto a suitable orthonormal basis is a widely-used compression method [5]. Within the proposed HGSP framework, we propose a data compression method based on the hypergraph Fourier transform. We can represent N signals in the original domain with C frequency coefficients in the hypergraph spectrum domain. More specifically, with the help of the sampling theory in Section V, we can losslessly compress a K-bandlimited signal of N signal points with K spectrum coefficients.
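A minimal sketch of spectrum-based compression (the basis and signal are hypothetical stand-ins for a hypergraph Fourier basis and image data): keep the C largest-magnitude coefficients and reconstruct; with an orthonormal basis, the error can only shrink as C grows, and C = N is lossless.

```python
import numpy as np

# Keep the C leading Fourier coefficients and invert the transform.
rng = np.random.default_rng(5)
N = 16
basis, _ = np.linalg.qr(rng.standard_normal((N, N)))
basis = basis.T                        # hypothetical Fourier basis rows
s = rng.standard_normal(N)

def compress(s, basis, C):
    s_hat = basis @ s
    keep = np.argsort(np.abs(s_hat))[-C:]        # C leading coefficients
    s_hat_c = np.zeros_like(s_hat)
    s_hat_c[keep] = s_hat[keep]
    return basis.T @ s_hat_c                     # inverse transform

err = [np.linalg.norm(s - compress(s, basis, C)) for C in (2, 4, 8, 16)]
assert all(err[i + 1] <= err[i] for i in range(3))   # more coeffs, less error
assert np.isclose(err[-1], 0.0)                      # C = N is lossless
```

The compression ratio is CR = N/C; for a K-bandlimited signal, choosing C = K already achieves zero error.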
To test the performance of our HGSP compression and demonstrate that hypergraphs may represent structured signals better than normal graphs, we compare the results of image compression with those from the GSP-based compression method [5]. We test over seven small 16×16 icon images and three 256×256 photo images, shown in Fig. 13.
The HGSP-based image compression method is described as follows. Given an image, we first model it as a hypergraph with the Image Adaptive Neighborhood Hypergraph (IANH) model [26]. To reduce complexity, we pick three closest neighbors in each hyperedge to construct a third-order adjacency tensor. Next, we can calculate the Fourier basis of the adjacency tensor as well as the bandwidth K of the hypergraph signals. Finally, we can represent the original images using C spectrum coefficients with C = K. For a large image, we may first cut it into smaller image blocks before applying HGSP compression to improve speed.
For the GSP-based method in [5], we represent the images as graphs with 1) the 4-connected neighbor model [27], and 2) the distance-based model in which an edge exists only if the spatial distance is below α and the pixel distance is below β. The graph Fourier space and corresponding coefficients in the frequency domain are then calculated to represent the original image.
We use the compression ratio CR= N/C to measure the efficiency of different compression methods. A large CR implies higher compression efficiency. The result is summarized in Table I, from which we can see that our HGSP-based compression method achieves higher efficiency than the GSPbased compression methods.
In addition to the image datasets, we also test the efficiency of HGSP spectrum compression over the MovieLens dataset [83], where each movie has rating scores and tags from viewers. Here, we treat the scores of movies as signals and construct graph models based on the tag relationships. Similar to the game dataset shown in Fig. 1(b), two movies are connected in a normal graph if they have similar tags. For example, if two movies are labeled with 'love' by users, they are connected by an edge. To model the dataset as a hypergraph, we include movies in one hyperedge if they have similar tags. For convenience and complexity, we set m.c.e = 3. With the graph and hypergraph models, we compress the signals using the sampling method discussed earlier. For lossless compression, our HGSP method needs only 7.6% of the samples of the original signals to recover the original dataset by choosing suitable additional basis vectors (see Section III-F), whereas the GSP method requires 98.6% of the samples. We also test the error between the recovered and original signals for varying percentages of samples. As shown in Fig. 14, the recovery error naturally decreases with more samples. Note that our HGSP method achieves much better performance once it obtains a sufficient number of samples, while the GSP error drops slowly. This is because the first few key HGSP spectrum basis elements carry most of the original information, leading to a more efficient representation of structured datasets.
Overall, hypergraphs and HGSP lead to more efficient descriptions of structured data in many applications. With more suitable hypergraph models and further developed methods, the HGSP framework could become an important new tool in data compression.

B. Spectral Clustering
Clustering is widely used in a variety of applications, such as social network analysis, computer vision, and communications. Among many methods, spectral clustering is an efficient approach [33], [34]. By modeling the dataset as a normal graph before clustering the data spectrally, significant improvement is possible for structured data [91]. However, such standard spectral clustering methods only exploit pairwise interactions. For applications where the interactions involve more than two nodes, hypergraph spectral clustering is a more natural choice.
In hypergraph spectral clustering, one of the most important issues is how to define a suitable spectral space. In [36], [37], the authors introduced the hypergraph similarity spectrum for spectral clustering: before clustering, they first model the hypergraph structure as a graph-like similarity matrix and then define the hypergraph spectrum as the eigenspace of that similarity matrix. However, since modeling a hypergraph with a similarity matrix may lose some of the inherent information, a spectral space defined directly over the hypergraph, as introduced in our HGSP framework, is more desirable. With HGSP, since the hypergraph Fourier space from the adjacency tensor has a similar form to the spectral space from the adjacency matrix in GSP, we can develop a spectral clustering method based on the hypergraph Fourier space, as in Algorithm 1.

Algorithm 1 HGSP Fourier Spectral Clustering
1: Input: the hypergraph dataset and the number of clusters k.
2: Construct the representing tensor F of the hypergraph.
3: Compute the hypergraph spectrum pairs (λ_i, f_i) via the orthogonal-CP decomposition.
4: Select the leading Fourier basis vectors and construct a Fourier spectrum matrix S ∈ R^{N×E} with columns as the leading Fourier basis.
5: Cluster the rows of S into k clusters using k-means clustering.
6: Put node i in partition j if the i-th row is assigned to the j-th cluster.
7: Output: k partitions of the hypergraph dataset.
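The clustering step of Algorithm 1 can be sketched with a tiny numpy-only k-means. The spectrum matrix S below is a hypothetical toy whose rows already fall into two groups, standing in for rows of the leading Fourier basis:

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Minimal Lloyd's k-means with deterministic initialization."""
    centers = X[:k].copy()                      # init from first k rows
    for _ in range(iters):
        # Assign each row to its nearest center (squared distance).
        d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        # Update each center as the mean of its assigned rows.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels

# Toy spectrum matrix S: rows (one per node) separate into two groups.
S = np.array([[1.0, 0.0], [0.9, 0.1], [1.1, -0.1],    # nodes 0-2
              [0.0, 1.0], [0.1, 0.9], [-0.1, 1.1]])   # nodes 3-5
labels = kmeans(S, 2)

assert labels[0] == labels[1] == labels[2]
assert labels[3] == labels[4] == labels[5]
assert labels[0] != labels[3]
```

In practice, step 5 would use a library k-means with multiple random restarts; the deterministic initialization here is only for reproducibility of the sketch.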
To test the performance of the HGSP spectral clustering, we compare the achieved results with those from the hypergraph similarity method (HSC) in [37], using the zoo dataset [30]. To measure the performance, we compute the intra-cluster variance and the average Silhouette of nodes [38]. Since we expect the data points in the same cluster to be closer to each other, the performance is considered better if the intra-cluster variance is smaller. On the other hand, the Silhouette value is a measure of how similar an object is to its own cluster  versus other clusters. A higher Silhouette value means that the clustering configuration is more appropriate.
The comparative results are shown in Fig. 15. From the test results, we can see that our HGSP method generates a lower variance and a higher Silhouette value. More intuitively, we plot the clusters of animals in Fig. 16. Cluster 2 covers small animals like bugs and snakes. Cluster 3 covers carnivores, whereas cluster 7 groups herbivores. Cluster 4 covers birds and cluster 6 covers fish. Cluster 5 contains rodents such as mice. One interesting category is cluster 1: although dolphins, sea lions, and seals live in the sea, they are mammals and are thus clustered separately from cluster 6. From these results, we see that the HGSP spectral clustering method achieves better performance, and that our definition of the hypergraph spectrum may be more appropriate for spectral clustering in practice.

C. Classification
Classification problems are important in data analysis. Traditionally, such problems are tackled by learning methods [31]. Here, we propose an HGSP-based method to solve the {±1} classification problem, in which a hypergraph filter serves as the classifier.
The basic idea behind the classification filter design is label propagation (LP), whose main steps are to first construct a transmission matrix and then propagate the labels according to that matrix [32]. The labels converge after a sufficient number of shifting steps. Let W be the propagation matrix; the labels are then determined by the propagated distribution s* = W^k s, which has the form of a filtered graph signal. Recall from Section V-B that the supporting matrix P has been shown to capture the properties of hypergraph shifting and total variation. Here, we propose an HGSP classifier based on the supporting matrix P defined in Eq. (58), generating the filter matrix H = (I + α_1 P)(I + α_2 P) ··· (I + α_k P).
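Building the propagation filter H and reading off the signs is mechanical; the sketch below uses a toy adjacency matrix as a stand-in for the supporting matrix P of Eq. (58), so the numbers are purely illustrative:

```python
import numpy as np

def lp_hgsp_filter(P, alphas):
    """Build H = (I + a_1 P)(I + a_2 P) ... (I + a_k P)."""
    H = np.eye(P.shape[0])
    for a in alphas:
        H = H @ (np.eye(P.shape[0]) + a * P)
    return H

def classify(H, s):
    # s holds +1/-1 for labeled nodes and 0 for unlabeled ones;
    # the classifier outputs sign[Hs]
    return np.sign(H @ s)

# toy "supporting matrix": adjacency of two disjoint node pairs
P = np.array([[0., 1., 0., 0.],
              [1., 0., 0., 0.],
              [0., 0., 0., 1.],
              [0., 0., 1., 0.]])
s = np.array([1., 0., -1., 0.])          # nodes 0 and 2 carry labels
H = lp_hgsp_filter(P, alphas=[0.5, 0.5, 0.5])
pred = classify(H, s)
```

With this toy P, each unlabeled node inherits the sign of its labeled neighbor, which is exactly the label-propagation behavior the filter is meant to realize.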
Our HGSP classifier then simply outputs sign[Hs].

To test the performance of the hypergraph-based classifier, we implement it on the zoo dataset, determining whether an animal has hair based on its other features, formulated as a {±1} classification problem. From the 101 data points, we randomly pick different percentages as training data and leave the remainder as the test set, smoothing the resulting curves by averaging over 1000 randomly picked training sets. We compare the HGSP-based method with SVM and label-propagation GSP (LP-GSP) [2]. In the experiment, we model the dataset as a hypergraph or a graph based on the distances between data points, with the edge-existence threshold chosen to ensure that the graph has no isolated nodes. For the label propagation methods, we set k = 15. The results are shown in Fig. 17. We see that the label-propagation HGSP method (LP-HGSP) is moderately better than LP-GSP, and that both graph-based methods, LP-GSP and LP-HGSP, outperform SVM; HGSP and GSP show significant advantages on small datasets. Although the GSP and HGSP classifiers are both model-based, hypergraph-based classifiers usually perform better than normal-graph-based ones, since hypergraphs give a better description of the structured data in most applications.

D. Denoising
Signals collected in the real world often contain noise. Signal denoising is thus an important application of signal processing. Here, we design a hypergraph filter to perform signal denoising.
As mentioned in Section III, the smoothness of a hypergraph signal, which describes its variation, can be measured by the total variation. Assuming that the original signal is smooth, we formulate signal denoising as an optimization problem. Suppose that y = s + n is a noisy signal with noise n, and s' = h(F, y) is the denoised signal produced by the HGSP filter h(·). The denoising problem can then be formulated as

    min_{s'} ||y − s'||₂² + γ ||(I − P_s) s'||₂²,    (68)

where the second term is the weighted quadratic total variation of the filtered signal s'. The denoising problem of Eq. (68) aims to smooth the signal based on the original noisy data y: the first term keeps the denoised signal close to the noisy observation, whereas the second term smooths the recovered signal. Clearly, the optimal filter is s' = h(F, y) = [I + γ(I − P_s)^T(I − P_s)]^{−1} y, where P_s = …

The HGSP-based filter follows a similar idea to the GSP-based denoising filter [29]; however, the different definitions of total variation and signal shifting lead to different HGSP and GSP filter designs. To test the performance, we compare our method with the basic Wiener filter, the median filter, and the GSP-based filter [29] on the image datasets of Fig. 13, applying different types of noise. To quantify filter performance, we use the mean square error (MSE) between each true signal and the corresponding filtered signal. The results are given in Table II. We can see that, for each type of noise and with γ optimized for all methods, our HGSP-based filter outperforms the other filters.
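The closed-form solution above is a single linear solve. The sketch below is a minimal illustration, with a symmetric normalized cycle adjacency standing in for the (truncated) shift matrix P_s and an arbitrary toy signal y:

```python
import numpy as np

def hgsp_denoise(y, P_s, gamma):
    """Closed-form denoising filter of Eq. (68):
    s' = [I + gamma * (I - P_s)^T (I - P_s)]^{-1} y."""
    N = len(y)
    A = np.eye(N) - P_s
    # solve the normal equations rather than forming an explicit inverse
    return np.linalg.solve(np.eye(N) + gamma * A.T @ A, y)

# toy shift: symmetric normalized adjacency of a 4-node cycle
P_s = 0.5 * np.array([[0., 1., 0., 1.],
                      [1., 0., 1., 0.],
                      [0., 1., 0., 1.],
                      [1., 0., 1., 0.]])
y = np.array([1.0, -0.2, 0.9, 0.1])      # noisy observation
s_hat = hgsp_denoise(y, P_s, gamma=2.0)
```

By construction, the output satisfies the first-order optimality condition of Eq. (68), and its quadratic total variation ||(I − P_s)s'|| never exceeds that of the noisy input, which is the smoothing behavior the second term enforces.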

E. Other Potential Applications
In addition to the application algorithms discussed above, there could be many other potential applications of HGSP. In this subsection, we suggest several potentially suitable datasets and systems.
• IoT: As IoT techniques develop, system structures become increasingly complex, making traditional graph-based tools inefficient at handling high-dimensional interactions. The hypergraph-based HGSP, on the other hand, is powerful for high-dimensional analysis in IoT systems: for example, data intelligence over sensor networks, where hypergraph-based analysis has already attracted significant attention [92], and where HGSP could handle tasks like clustering, classification, and sampling.
• Social Network: Another promising application is the analysis of social network datasets. As discussed earlier, a hyperedge is an efficient representation of multi-way relationships in social networks [79], [80]; HGSP can thus be effective in analyzing multi-way node interactions.
• Natural Language Processing: Furthermore, natural language processing is an area that can benefit from HGSP. By modeling sentences and language with hypergraphs [81], [82], HGSP can serve as a tool for language classification and clustering tasks.

Overall, owing to its systematic and structural approach, HGSP is expected to become an important tool for handling high-dimensional signal processing tasks that are traditionally addressed by DSP- or GSP-based methods.

VII. CONCLUSIONS
In this work, we proposed a novel tensor-based framework of Hypergraph Signal Processing (HGSP) that generalizes traditional GSP to high-dimensional hypergraphs. Our work provided important definitions in HGSP, including hypergraph signals, hypergraph shifting, HGSP filters, frequency, and bandlimited signals. We presented basic HGSP concepts such as the sampling theory and filter design, and showed that hypergraphs can serve as efficient models for many complex datasets. We also illustrated multiple practical applications of HGSP in signal processing and data analysis, providing numerical results to validate the advantages and practicality of the proposed framework. These features make HGSP a powerful tool for future IoT applications.