Attribute Graph Clustering Based on Self-Supervised Spectral Embedding Network

Attribute graph clustering requires joint modeling of both graph structure and node properties, which is challenging. In recent years, graph neural networks have been utilized to mine deep information on attribute graphs through feature aggregation, learning node embeddings and then applying traditional methods to obtain clustering results, and they exhibit excellent clustering performance. However, these approaches often face two issues: the original graph structure and node features contain noise, and their quality dramatically affects the clustering results; and the two-step framework of first learning node embeddings and then clustering is not target-oriented and is prone to producing suboptimal results. We therefore propose FK-SENet, an attribute graph clustering method based on a self-supervised spectral embedding network. It utilizes Laplacian smoothing filters to smooth and denoise node features, and it optimizes the initial graph structure by leveraging shared-neighbor information, improving the quality of the input data and thereby enhancing clustering performance. Soft labels are generated from the node embeddings themselves to achieve self-supervision, and they jointly guide the clustering process together with a spectral clustering loss, iteratively optimizing the clustering results. The effectiveness of the model is demonstrated through extensive experiments and comparisons with baseline methods.


I. INTRODUCTION
Graph clustering based on attributes is a burgeoning area of research that has been widely discussed in recent times. Attribute graph clustering aims to segregate nodes linked with properties into separate, non-overlapping clusters. Typically, an attribute graph consists of nodes carrying attribute information and edges capturing the relationships between these nodes. The essence of attribute graph clustering is to cluster the vertices of the graph. Compared to traditional graph clustering methods that utilize only the graph structure, attribute graph clustering is prevalent in real-world scenarios, particularly where nodes possess rich content information, such as social networks [1] and community detection [2]. Moreover, owing to the valuable information that can be extracted from attribute graphs, diverse tasks in machine learning and data mining have received significant research attention, including node classification [3], link prediction [4], and more.
Attribute graph clustering employs both graph topology and node attributes, and existing methods can be roughly classified into two categories. (1) Approaches that do not use graph neural networks (GNNs), such as spectral clustering [5], random walks [6], matrix decomposition [7], and Bayesian models [8]; some of these methods [9], [10] design a trade-off distance measure between graph structure and node features. (2) Approaches employing graph neural networks: as GNNs have demonstrated remarkable achievements in semi-supervised attribute graph learning [3], numerous attribute graph clustering techniques have adopted graph convolutional or graph attentional encoders to acquire deep representations. Specifically, both the graph autoencoder (GAE) and the variational graph autoencoder (VGAE) [11] consist of two layers of graph convolution and employ a graph-structure reconstruction loss to acquire node representations; VGAE additionally uses the Kullback-Leibler divergence to measure the alignment between the learned representations and a Gaussian prior. The Marginalized Graph Autoencoder (MGAE) [12] introduces autoencoders into the graph domain for the first time, enabling unsupervised learning of node embeddings. The Adversarially Regularized Graph Encoder (ARGE) and Adversarially Regularized Variational Graph Encoder (ARVGE) [13] learn node embeddings through GAE and VGAE, respectively, and use a GAN to enhance the matching between the node embeddings and the prior distribution. Deep Graph Infomax (DGI) [14] utilizes graph convolutions to obtain graph patch representations and maximizes local mutual information, thereby obtaining node embeddings that grasp the global clustering structure. Adversarial Mutual Information Learning (AMIL) [15] applies an adversarial learning strategy to the representation mechanism while using mutual information to approximate the mapping mechanism of an encoder, thereby enabling the discriminator to function effectively. DAEGC [16], an unsupervised deep attentional embedding algorithm, combines graph clustering and graph embedding learning within a unified framework; the acquired graph embedding incorporates structural and content information and is tailored for clustering. SDCN [17] incorporates a transfer operator, known as the delivery operator, to convey the learned embeddings from the autoencoder to the corresponding GCN layer, and utilizes a dual self-supervised mechanism to synchronize these separate deep neural structures and guide the overall model update. Because existing GNN-based methods did not take the influence of clusters into account, the Graph Autoencoder based on Attention with Cluster (GAE-AC) [18] integrates cluster influence into the generation of graph embeddings, employing an attention mechanism to convey cluster-aggregated information and incorporating the entire process within the graph autoencoder. The Deep Graph Clustering with Multi-level Subspace Fusion method (DGCSF) [19] introduces a subspace module characterized by self-representation learning into the GAE; through self-representation learning, the model can assimilate correlated nodes within the same subspace as the target nodes, constructing a robust representation and devising a multi-level self-representation learning mechanism to unveil the inherent connections among nodes. Power-attributed graph embedding and clustering (PAGEC) [20] concurrently addresses the embedding and clustering objectives: it employs a novel enhanced proximity matrix to capture the data relationships between node connections and attributes, and devises a fresh matrix decomposition approach for simultaneous node representation and clustering. On the other hand, the Spectral Embedding Network (SENet) [24] optimizes the graph's topological structure by leveraging shared-neighbor information but overlooks improvements in the feature matrix; and while it utilizes a cluster-driven loss, it fails to consider the adverse impact of relying on a single loss on the clustering results.
To better capture the global clustering structure and achieve the desired clustering results, this paper employs feature filtering and an optimized graph structure to ensure high-quality inputs. Because different-hop neighbors often play different roles in node clustering, graph convolution is used to aggregate features from different-hop neighbors, effectively encoding neighborhood information at various scales, together with global clustering structural information, into the node embeddings. A spectral clustering loss and a self-supervised module are jointly optimized during training to improve clustering performance. Finally, the nodes' clustering outcomes are derived from the clustering assignment matrix. An overview of the model is shown in Figure 1, and the advantages of FK-SENet over GCN-based methods, which also constitute the primary contributions of this paper, are as follows.
a. Design a Laplacian smoothing filter H to eliminate noise from the high-frequency components of the attribute matrix X; once smoothing is completed, the smoothed matrix is used as the model input.
b. Self-supervision module: by generating soft labels from the node embeddings Z, probability distributions over the clusters are assigned to each node, better capturing the similarity and structural relationships between nodes. This helps to accurately characterize the clustering structure and avoid interference from local information.
c. Integrate model training with learning the clustering assignment matrix in a unified framework, enabling synchronous learning and optimization so that the two can interact and benefit from each other.

II. CORNERSTONE
This chapter builds upon prior research. Problem statement: consider an attributed graph G = (V, E, X), where V = {v_1, v_2, ..., v_n} is a collection of n nodes and E is a collection of edges. X = [x_1, x_2, ..., x_n]^T is the feature matrix, where each x_i denotes the real-valued feature vector of node v_i. The topological structure of G is conveyed through an adjacency matrix A = {a_ij} ∈ R^{n×n}, where a_ij = 1 if there exists an edge between nodes v_i and v_j, and a_ij = 0 otherwise. Attributed graph clustering aims to partition the nodes of graph G into k disjoint clusters.

A. OPTIMIZING GRAPH STRUCTURE
The graph topology is an essential component of attribute graphs, and its quality directly affects the learning of node embeddings. The native graph structure considers only one-hop neighbor information, which often includes noisy edges connecting different clusters while overlooking potentially related edges within clusters. Moreover, the binary adjacency matrix, representing neighbor relationships using 0s and 1s, provides a coarse representation that can have adverse effects when performing graph convolution. Therefore, it is necessary to improve the graph structure.
In essence, when two nodes share a significant overlap in neighbors, the likelihood that they belong to the same cluster increases. Consequently, this paper integrates shared-neighbor information with the initial graph structure to enhance the underlying graph representation. Let N(v_i) denote the set that includes node v_i itself and its neighbors. The ratio of common neighbors between nodes v_i and v_j is denoted B_ij; here, ∩ represents the intersection of two sets, and |·| represents the number of nodes in a set. While B_ij can quantify the strength of the connection between nodes, it may generate edges with extremely low weights, necessitating filtering. Expressly, for every node v_i, if its similarity B_ij with node v_j is less than the minimum similarity between v_i and its neighboring nodes, then B_ij is set to 0.
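As a concrete illustration, the sketch below implements this refinement step in NumPy. The exact definition of B_ij is not reproduced above, so the sketch assumes a Jaccard-style ratio |N(v_i) ∩ N(v_j)| / |N(v_i) ∪ N(v_j)|; the function name is ours.

```python
import numpy as np

def optimize_graph_structure(A):
    """Sketch of shared-neighbor graph refinement (assumes a Jaccard-style B_ij)."""
    n = A.shape[0]
    # N(v_i) contains v_i itself and its one-hop neighbors.
    N = ((A + np.eye(n)) > 0).astype(float)
    inter = N @ N.T                                   # |N(v_i) ∩ N(v_j)|
    sizes = N.sum(axis=1, keepdims=True)
    union = sizes + sizes.T - inter                   # |N(v_i) ∪ N(v_j)|
    B = inter / union
    # Filter weak edges: zero out B_ij when it falls below the minimum
    # similarity between v_i and its original one-hop neighbors.
    thresh = np.where(A > 0, B, np.inf).min(axis=1, keepdims=True)
    B[B < thresh] = 0.0
    return B
```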
Ultimately, the improved graph structure incorporates both one-hop and two-hop neighbor information, mitigating the adverse effects of the original graph structure and reinforcing the connections between nodes belonging to the same cluster.

B. SPECTRAL CLUSTERING LOSS
Spectral clustering [5] evolved from graph theory. Compared to the classical k-means algorithm, it is better suited to complex data distributions and thus finds wide application. The core idea is to treat data points as nodes in space and construct a graph by connecting them with edges: lower edge weights are assigned to points that are far apart, while higher edge weights are assigned to points that are close together. By partitioning the graph into different subgraphs, spectral clustering aims to minimize inter-subgraph edge weights and maximize intra-subgraph edge weights. Benefiting from this, the spectral clustering loss is introduced into GNN-based attribute graph clustering models [24]; in its standard relaxed form (Equation (8)) it can be written as

L_sc = Tr(Z^T (D − S) Z),  s.t.  Z^T Z = I,

where Z represents the node embeddings, S the similarity matrix, and D the degree matrix. The problem can be solved through eigendecomposition. By employing the spectral clustering loss to guide the training of node embeddings, the embeddings of the various layers can explicitly or implicitly incorporate the global clustering structural information, which in turn enhances clustering performance.
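For reference, here is a minimal PyTorch sketch of this loss under the relaxed trace form given above; the function name is ours.

```python
import torch

def spectral_clustering_loss(Z, S):
    """Relaxed spectral clustering objective Tr(Z^T (D - S) Z).

    Z: (n, k) node embeddings, assumed column-orthogonal;
    S: (n, n) symmetric similarity matrix.
    """
    D = torch.diag(S.sum(dim=1))   # degree matrix of the similarity graph
    L = D - S                      # graph Laplacian
    return torch.trace(Z.T @ L @ Z)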

C. CONVOLUTION DETAILS
The model is built using three layers of graph convolution, as shown in Figure 2. The smoothed features X̃ ∈ R^{n×d} and the optimized graph structure Â ∈ R^{n×n} are fed into the network. The first and second layers learn node embeddings Z^(1) and Z^(2), respectively:

Z^(1) = tanh(D̂^{-1} Â X̃ W_1),    Z^(2) = tanh(D̂^{-1} Â Z^(1) W_2),

where D̂ denotes the degree matrix of Â, W_1 ∈ R^{d×h_1} and W_2 ∈ R^{h_1×h_2} are trainable parameter matrices, and h_1 and h_2 represent the widths of the hidden layers. After two layers of graph convolution, Z^(1) and Z^(2) encompass information from first-order and second-order neighbors, respectively. In the final layer, the learned representations are designed to mimic the spectral clustering approach: Z^(2) is projected into a k-dimensional space,

F = D̂^{-1} Â Z^(2) W_3,

where W_3 ∈ R^{h_2×k} is a learnable weight matrix. To ensure that each column of the learned representations satisfies orthogonality, a Cholesky decomposition [25] is performed on F^T F, i.e., F^T F = Y Y^T, where Y ∈ R^{k×k} is a lower triangular matrix. In the end, we acquire the orthogonal representations

Z^(3) = F (Y^{-1})^T,

where the matrix (Y^{-1})^T ∈ R^{k×k} can be regarded as a set of model parameters. The spectral clustering loss of Equation (8) is applied to Z^(3).
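The following PyTorch sketch traces this three-layer forward pass, including the Cholesky-based orthogonalization; the function signature, and the assumption that the third layer uses the same normalized propagation as the first two, are ours.

```python
import torch

def encoder_forward(X, A_hat, W1, W2, W3):
    """Sketch of the three-layer encoder; A_hat is the optimized graph structure."""
    d_inv = torch.diag(1.0 / A_hat.sum(dim=1))    # D_hat^{-1}
    P = d_inv @ A_hat                             # random-walk normalized propagation
    Z1 = torch.tanh(P @ X @ W1)                   # one-hop neighborhood information
    Z2 = torch.tanh(P @ Z1 @ W2)                  # two-hop neighborhood information
    F = P @ Z2 @ W3                               # projection into k dimensions
    # Orthogonalize columns: F^T F = Y Y^T (Cholesky), then Z3 = F (Y^{-1})^T.
    Y = torch.linalg.cholesky(F.T @ F)            # lower-triangular Y
    Z3 = F @ torch.linalg.inv(Y).T
    return Z1, Z2, Z3
```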

D. CONSTRUCTION OF SIMILARITY MATRIX
In spectral clustering, the similarity matrix describes the degree of similarity between data points. Its formulation is usually determined by the particular application scenario, and its construction can significantly impact the clustering results, so learning a high-quality similarity matrix that accurately reveals node clusters is crucial. Based on the properties of the attribute graph, this paper considers both node features and the positional information of nodes within the topological structure. Since the node embeddings Z^(1) and Z^(2) have already captured neighborhood information within one and two hops, respectively, this paper establishes connections between nodes and their three-hop neighbors in order to extract more extensive information from the topological structure. Therefore, a third-order graph convolution is applied to the smoothed features.
Then, a linear kernel function is used to generate the similarity matrix.
To avoid excessive density in the similarity matrix, this paper employs a nearest-neighbor scheme that retains, for each node, at most the top p most similar entries, where p = round(n/k), n denotes the total number of vertices, and k the number of clusters; the remaining entries are set to 0. Finally, the symmetry of the similarity matrix is ensured via S = (S + S^T)/2.
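A minimal NumPy sketch of this construction, assuming the linear kernel is applied to the third-order smoothed representations H; the function name is ours.

```python
import numpy as np

def build_similarity(H, k):
    """Sketch: linear kernel plus top-p sparsification and symmetrization."""
    n = H.shape[0]
    S = H @ H.T                          # linear kernel
    p = round(n / k)                     # keep the p most similar entries per row
    drop = np.argsort(S, axis=1)[:, :-p] # indices of all but the p largest per row
    np.put_along_axis(S, drop, 0.0, axis=1)
    return (S + S.T) / 2                 # enforce symmetry
```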

III. INNOVATIONS
This chapter presents the contributions of our work.

A. SMOOTHING THE FEATURE MATRIX
The core premise of graph learning is that neighboring nodes in a graph are likely to be similar, leading to the expectation that node features should exhibit smoothness on the graph manifold. In this section, the concept of smoothness is first elucidated within the domain of graph signal processing, and the Laplacian smoothing filter is then introduced.
In the domain of graph signal processing [21], the eigenvalues and eigenvectors of the graph Laplacian matrix are analogous to the frequencies and Fourier bases of classical harmonic analysis. The graph Laplacian matrix is defined as L = D − A, where D is the degree matrix. It can be decomposed as L = ΦΛΦ^{-1}, where Λ = diag(λ_1, λ_2, ..., λ_n) contains the eigenvalues arranged in ascending order and Φ = [φ_1, φ_2, ..., φ_n] contains the corresponding eigenvectors. It is worth noting that the eigendecompositions of the random-walk normalized graph Laplacian L_rw = D^{-1}L, the symmetrically normalized graph Laplacian L_sym = D^{-1/2} L D^{-1/2}, and L itself are quite similar. The eigenvalues (λ_q)_{1≤q≤n} can be viewed as frequencies, and the eigenvectors (φ_q)_{1≤q≤n} can be seen as Fourier bases.
Graph signal definition: a graph signal is a mapping f: V → R from the set of nodes to the real numbers, which can be denoted in vector form as f = (f(v_1), f(v_2), ..., f(v_n))^T. Every graph signal f can be represented as a linear combination of the basis signals (φ_q)_{1≤q≤n}:

f = Σ_{q=1}^{n} m_q φ_q,

where m_q is the coefficient of φ_q; the absolute value |m_q| signifies the magnitude of the basis signal φ_q in the signal f. In a topological structure, a signal is considered smooth at a node if the node has feature representations similar to those of the nodes in its neighborhood. The degree of smoothness of the basis signal φ_q can be assessed by employing the Laplace-Beltrami operator Ω(·) [22]:

Ω(φ_q) = (1/2) Σ_{i,j} a_ij (φ_q(i) − φ_q(j))² = λ_q,

where φ_q(i) denotes the i-th element of the vector φ_q and a_ij represents an element of the adjacency matrix A. Equation (12) shows that basis signals with smaller eigenvalues (lower frequencies) exhibit greater smoothness. This implies that the designed filter should preserve low-frequency signals while filtering out high-frequency signals to obtain a smooth signal. Among various filters, the Laplacian smoothing filter [23] is frequently employed because of its efficient computation and compelling performance. The Laplacian smoothing filter is expressed as

H = I − θL,

where θ is a real value and H represents the filter; stacking t Laplacian smoothing filters yields the filtered features X̃ = H^t X.
In practical applications, a common technique is renormalization: we define Ã = I + A and then use the symmetrically normalized graph Laplacian of Ã.
D̃ and L̃ denote the degree matrix and Laplacian matrix of Ã, respectively. The filter then becomes (formula (14)):

H = I − θ L̃_sym,  where  L̃_sym = I − D̃^{-1/2} Ã D̃^{-1/2}.

The parameter θ is typically set to 0.5. It is worth noting that when θ = 1, H reduces to the GCN filter D̃^{-1/2} Ã D̃^{-1/2}.
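To make the filtering step concrete, here is a minimal NumPy sketch of the renormalized Laplacian smoothing filter described above; the function name and the default t = 2 are illustrative assumptions.

```python
import numpy as np

def smooth_features(X, A, t=2, theta=0.5):
    """Sketch: apply H = I - theta * L_sym of A_tilde, stacked t times, to X.

    A is the binary adjacency matrix; t and theta are hyperparameters.
    theta = 1 recovers the GCN filter D^{-1/2} A_tilde D^{-1/2}.
    """
    n = A.shape[0]
    A_tilde = A + np.eye(n)                        # renormalization: add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
    L_sym = np.eye(n) - d_inv_sqrt @ A_tilde @ d_inv_sqrt
    H = np.eye(n) - theta * L_sym                  # Laplacian smoothing filter
    X_smooth = X.copy()
    for _ in range(t):                             # stack t filters
        X_smooth = H @ X_smooth
    return X_smooth
```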

B. SELF-SUPERVISED MODULE
Figure 2 shows that the model utilizes the spectral clustering loss to obtain supervisory feedback on the quality of the learned embeddings. Graph clustering is typically an unsupervised task, and one major challenge is the lack of labeled information to guide the learning process. Moreover, relying on a single loss to guide model optimization is insufficient, and it becomes difficult to accurately evaluate how well the embeddings have been optimized. To tackle this problem, this paper proposes a self-supervised module, which computes an objective function based on the embeddings themselves and continually iterates and updates it during optimization. In this way, the quality of the embeddings can be effectively improved, leading to better clustering results. First, the clustering assignment matrix Q is calculated. To transform pairwise distances in the embedding space into a probability distribution (soft labels), the Student's t-distribution is frequently used to assess the similarity between node embeddings and cluster centers, which allows clusters of different scales to be handled [26]. The matrix Q is defined by Equation (17):

q_iu = (1 + ||z_i − μ_u||²)^{-1} / Σ_{u'} (1 + ||z_i − μ_{u'}||²)^{-1},

where z_i represents the embedding of node v_i, μ_u represents the embedding of cluster center u, and q_iu measures the similarity between them. Because of the heavy-tailed nature of the Student's t-distribution, dissimilar vertices are widely separated in the feature space. This property fosters a natural separation of clusters, effectively addressing the overcrowding problem during clustering [27].
Then, the target assignment matrix P is calculated. Since the entries q_iu ∈ Q represent the probabilities of nodes belonging to cluster centers, it is vital to ensure a favorable target assignment matrix P. The desired characteristics of P are as follows: (1) it further accentuates nodes assigned with higher certainty; (2) it enhances the clustering outcome; (3) it mitigates the adverse influence of large clusters on the latent representations. The matrix P is specified by Equation (18):

p_iu = (q_iu² / f_u) / Σ_{u'} (q_{iu'}² / f_{u'}),  with  f_u = Σ_i q_iu.
During the model's training phase, P plays the role of ground-truth labels and is updated each time Q is updated.
Considering the fluctuations in the self-supervised process, P is updated every five iterations in the implementation, alleviating the adverse effect of constantly changing targets on the learning of node embeddings. The loss function is formulated as in Equation (19):

L_c = KL(P ∥ Q) = Σ_i Σ_u p_iu log(p_iu / q_iu).

KL(·) denotes the Kullback-Leibler divergence, which evaluates the asymmetric distance between the distributions P and Q. By minimizing the KL divergence, the model effectively increases the distances between clusters while reducing the distances within clusters, promoting the learning of embeddings suitable for clustering.
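The three quantities above follow the standard DEC-style formulation; a compact PyTorch sketch is given below (the function names are ours).

```python
import torch
import torch.nn.functional as F

def soft_assignments(Z, mu):
    """Student's t soft labels Q from embeddings Z (n, d) and centers mu (k, d)."""
    q = 1.0 / (1.0 + torch.cdist(Z, mu) ** 2)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(Q):
    """Target P: square Q and normalize by per-cluster frequency (Equation (18))."""
    weight = Q ** 2 / Q.sum(dim=0)
    return weight / weight.sum(dim=1, keepdim=True)

def self_supervised_loss(Q, P):
    """KL(P || Q) with P treated as fixed targets (Equation (19))."""
    return F.kl_div(Q.log(), P, reduction='batchmean')
```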

C. TRAINING THE MODEL AND CLUSTERING
The model jointly optimizes node embeddings and clustering via the spectral clustering loss and the self-supervised module. The overall loss is defined in Equation (20):

L = L_sc + η L_c,

where η is the balancing parameter between the two loss functions. Before training the entire model, the spectral clustering loss of Equation (8) is minimized to obtain node embeddings Z and initial cluster centers μ. Here, Z = [Z^(1), Z^(2), Z^(3)] ∈ R^{n×(h_1+h_2+k)}; Z^(1) and Z^(2) capture one-hop and two-hop neighbor information, respectively, implicitly encoding the global cluster structure, while Z^(3) incorporates three-hop neighbor information and explicitly captures the global cluster structure. In the subsequent training phase, based on the joint loss, the Adam optimizer adjusts the node embeddings, cluster centers, and parameter matrices W_1, W_2, and W_3. During backpropagation, global cluster structure information is implicitly integrated into the latent node representations; because the output representations are driven by the joint loss rather than a single loss, they effectively encompass the global clustering structure, yielding enhanced clustering results. After training, the cluster results are obtained directly from Q, and the label prediction for node v_i follows Equation (21):

y_i = argmax_u q_iu,

where argmax returns the index of the maximum element in an array. Algorithm 1 summarizes the complete procedure of FK-SENet.

Algorithm 1 FK-SENet
Require: Feature matrix X, adjacency matrix A, cluster count k, target-distribution update frequency T, balancing parameter η
Ensure: Clustering results
1: Apply the Laplacian smoothing filter to the attribute matrix X using formula (14) to obtain X̃
2: Optimize the adjacency matrix A using formula (3) to obtain Â
3: Build the model using formulas (5) to (7)
4: Obtain the similarity matrix S using formula (10)
5: Minimize formula (8) to obtain node embeddings Z = [Z^(1), Z^(2), Z^(3)] and use Z to obtain the initial cluster centers μ
6: for i = 0 to epoch − 1 do
7:   Calculate Q using formula (17), Z, and μ
8:   if i % T == 0 then
9:     Calculate P using formula (18) and Q
10:  end if
11:  Calculate the loss L_c using formula (19)
12:  Minimize formula (20) using gradient descent to update the entire model
13: end for
14: Obtain the clustering results using formula (21) and Q
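Putting the pieces together, the following sketch condenses the joint training phase of Algorithm 1, reusing the sketch functions defined in earlier sections; the wrapper name and defaults are ours, and `model` is assumed to be an nn.Module implementing the three-layer encoder above.

```python
import torch

def train_fk_senet(model, X_smooth, A_hat, S, mu_init,
                   eta=1.0, epochs=50, T=5, lr=0.03):
    """Condensed sketch of Algorithm 1's joint training loop (names are ours)."""
    mu = torch.nn.Parameter(mu_init.clone())       # learnable cluster centers
    optimizer = torch.optim.Adam(list(model.parameters()) + [mu], lr=lr)
    for i in range(epochs):
        Z1, Z2, Z3 = model(X_smooth, A_hat)
        Z = torch.cat([Z1, Z2, Z3], dim=1)         # Z = [Z(1), Z(2), Z(3)]
        Q = soft_assignments(Z, mu)
        if i % T == 0:                             # refresh targets every T epochs
            P = target_distribution(Q).detach()
        loss = spectral_clustering_loss(Z3, S) + eta * self_supervised_loss(Q, P)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return Q.argmax(dim=1)                         # y_i = argmax_u q_iu
```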

D. COMPLEXITY ANALYSIS
Denoising the feature matrix has time complexity O(ndt) according to reference [28]; optimizing the adjacency matrix takes O(n²); according to reference [17], the self-supervised module takes O(nk + n log n); the graph convolution process takes O(nd(h_1 + n) + nh_1(h_2 + n) + nh_2(k + n)); and the spectral loss takes O(nk² + n²). The total time complexity is approximated as O(n²).

IV. EXPERIMENTS

A. DATASETS
The clustering quality of the FK-SENet model is assessed on four well-established datasets, as presented in Table 1. The datasets include Cora, Citeseer, and Pubmed [11], which are widely used for graph clustering in the context of research paper citations. The Cora dataset comprises 2,708 papers on machine learning, the Citeseer dataset includes 3,327 papers from the computer science literature, and the Pubmed dataset comprises 19,717 biomedical papers. In these datasets, each node denotes a research paper, and the edges signify the citation relationships between papers; each paper is also associated with a corresponding feature representation. The datasets categorize the papers into different groups, rendering them appropriate for research in clustering and classification. The BlogCatalog dataset [29] is a classic graph clustering dataset used for studying social networks. It comprises 5,196 blogs and their social network relationships: every vertex signifies a blog, and the edges connecting the nodes indicate the social connections between blogs. Each blog is associated with a set of labels describing its topics, such as ''music'' and ''movies,'' and with a feature vector extracted from the blog articles through text mining. The BlogCatalog dataset has found extensive use in social network analysis for research related to graph clustering and community detection, helping to explore social relationships among blogs and their topic distributions.

B. BASELINE METHODS
In the experiments, fourteen algorithms were compared with FK-SENet, including graph clustering methods that use only node attributes or only topological structure, as well as hybrid methods combining both. Brief introductions follow.
K-means [30]: a classic clustering method.
SC [5]: the spectral clustering algorithm represents the data as a graph, performs spectral decomposition to obtain node embeddings Z, and then applies k-means to derive the clustering results.
DeepWalk [6]: drawing inspiration from random walks, this approach generates node sequences by conducting random walks in the network and then uses the Skip-gram model to acquire node representations.
DNGR [31]: a deep graph representation model that acquires low-dimensional vector representations for every vertex by capturing and encoding graph structural details.
RMSC [32]: a method based on low-rank and sparse decomposition, which decomposes multi-view data into the sum of low-rank and sparse matrices and then performs spectral clustering on the low-rank matrix; this effectively removes noise and conflicts, improving the accuracy and robustness of clustering.
TADW [33]: utilizes rich text information combined with the network's topological structure to learn node representation vectors, enhancing the performance and quality of network representation learning.
GAE & VGAE [11]: the core idea is to model and reconstruct graph data using autoencoders and variational inference techniques, achieving dimensionality reduction and generation of graph data.
MGAE [12]: represents graph data as a combination of the adjacency matrix and feature matrix and uses autoencoders to transform it into low-dimensional vector representations; marginalization techniques applied during training reduce the dimensionality of the adjacency and feature matrices individually, lowering computational complexity and improving clustering performance.
ARVGE [13]: a variational graph autoencoder that formulates graph embedding as an optimization problem of graph autoencoders and introduces adversarial regularization to enhance the quality and robustness of the embeddings.
DAEGC [16]: utilizes deep learning and attention mechanisms to address clustering for attributed graph data by learning node embedding vectors and the importance of attributes.
SENet [24]: integrates graph embedding and neural network techniques to transform attribute graph data into a low-dimensional vector representation, which is further processed with k-means to derive the clustering results.
MTEL [34]: a multi-task embedding learning method that builds an adjacency matrix over the data of two tasks and clusters the target nodes by calculating the gap for each pair of nodes in the adjacency matrix.

C. EVALUATION METRICS AND PARAMETER CONFIGURATIONS
Evaluation Metrics: To assess the clustering efficacy of FK-SENet and the baseline methods, three measurement criteria [35] were employed: Accuracy (Acc), Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI). These metrics offer diverse perspectives on clustering performance. Acc quantifies the proportion of correctly predicted data points; NMI evaluates the effectiveness of clustering by comparing the similarity between actual labels and predicted labels; and ARI quantifies the agreement between the predicted clusters and the ground-truth partition, corrected for chance.
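For reproducibility, these three metrics can be computed as in the sketch below: NMI and ARI come directly from scikit-learn, while Acc uses the customary Hungarian matching between predicted and true cluster labels (the helper name is ours).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def clustering_accuracy(y_true, y_pred):
    """Acc: best one-to-one mapping of predicted clusters to true labels."""
    k = max(y_true.max(), y_pred.max()) + 1
    count = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1                      # co-occurrence of prediction p, label t
    row, col = linear_sum_assignment(-count)  # Hungarian matching (maximize)
    return count[row, col].sum() / len(y_true)

# nmi = normalized_mutual_info_score(y_true, y_pred)
# ari = adjusted_rand_score(y_true, y_pred)
```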
Parameter Configurations: FK-SENet was designed with two hidden layers, each containing 16 neurons; the learning rate was set to 0.03, and the model was trained for 50 epochs using the Adam optimizer [36]. For the other baseline methods, parameters were carefully chosen based on the original papers. For DeepWalk, the number of random walks was set to 10, the dimension of latent features to 128, and the path length to 80. For DNGR, two hidden layers were built, comprising 512 and 256 neurons, respectively. For RMSC, the regularization parameter was set to 0.005. TADW was configured with the dimension of the decomposition matrix set to 80 and the regularization parameter set to 0.2. Both GAE and VGAE were built using an encoder with 32 neurons in the hidden layer and 16 neurons in the representation layer, trained with the Adam optimizer at a learning rate of 0.01 for 200 iterations. For MGAE, the corruption-level parameter p was tuned within the range [0.1, 1], the number of layers was set to 3, and the parameter λ was set to 10^{-5}. For both ARGE and ARVGE, encoders were formulated with 32 neurons in the hidden layer and 16 neurons in the representation layer, and discriminators were constructed with hidden layers of 16 and 64 neurons, respectively; ARGE and ARVGE were trained on the Cora, Citeseer, and BlogCatalog datasets with a learning rate of 0.001 for 200 iterations, while on Pubmed the encoder was trained with a learning rate of 0.001 and the discriminator with a learning rate of 0.008, both for 2,000 iterations. For DGI, the model was trained with the Adam optimizer at an initial learning rate of 0.001. For DAEGC, the clustering coefficient γ was fixed at 10, and the encoder was designed with a latent layer of 256 neurons and an embedding layer of 16 neurons. For SENet, the balancing parameter was set to 1, the learning rate to 0.03, and the model was trained for 50 epochs. For MTEL, the network used 128 neurons, the number of principal components was set to 100, t to 8000, r to 3, and λ to 5e-2; the complete MTEL model was trained for 600 epochs with the Adam optimizer, a learning rate of 0.02, and weight decay of 5e-6.

E. EXPERIMENTAL RESULTS
Research findings on the four datasets are depicted in Table 3, with bold values denoting the best-performing outcomes. X, A, and A&X denote algorithms utilizing only node features, only structure, and both, respectively. It can be observed that models integrating both kinds of information consistently achieved better performance than methods relying on only one kind. For instance, in the experiments on the Cora dataset, except for the RMSC method, all approaches that harnessed both kinds of information demonstrated superior results compared to those employing only one. This indicates that both topological structure and node features contain valuable information for graph clustering tasks. From Table 3, it is also evident that FK-SENet significantly outperforms these classical models in terms of Acc, NMI, and ARI, owing to its more effective exploitation of both types of information. Taking DAEGC and SENet as reference points, FK-SENet not only smooths the node features but also enhances the quality of the network topology, and it employs a joint optimization model with both the spectral clustering loss and the self-supervised loss, resulting in outstanding clustering performance across various aspects.

F. VISUALIZATION AND ABLATION EXPERIMENTS
Visualization: To visually illustrate the learned node embeddings, this research utilized the t-SNE algorithm [37] to visualize the clustering outcomes on the Cora dataset in a two-dimensional space, as depicted in Figure 6. Each subplot corresponds to a different setting in the ablation experiments. From the visualizations, it can be observed that FK-SENet effectively clusters nodes within the same cluster. Furthermore, as the model is gradually improved, the overlapping regions decrease and nodes belonging to the same category cluster together, yielding significantly better clustering results than the other variants. Ablation Experiments: Ablation experiments were conducted to further analyze the importance of the Laplacian smoothing filter and the self-supervised module; excluding both modules degrades the model to SENet. Table 4 details the experimental settings, where ''×'' indicates FK-SENet without the corresponding sub-module and ''✓'' indicates FK-SENet with it. The table shows that the proposed model significantly surpasses the other three variants on all four datasets, highlighting the crucial role of the filter and self-supervised modules in enhancing clustering performance.

V. CONCLUSION
This paper introduces a self-supervised attribute graph clustering model based on the spectral embedding network. We incorporated the Laplacian smoothing filter, a concept from graph signal processing, to effectively remove high-frequency noise, and improved the graph's topological structure by exploiting shared-neighbor information. Additionally, we incorporated a self-supervised module that leverages the current node embeddings for self-supervised learning of their representations. Finally, we conducted joint optimization guided by both the self-supervised and spectral losses, effectively capturing the global clustering structure. Comprehensive trials on three citation networks and one social network, compared against various classical graph clustering algorithms, confirm the attribute graph clustering performance of FK-SENet. This also affirms the academic significance of this study, which partially fills research gaps in the current literature by denoising and optimizing the input data and introducing self-supervised learning. Our approach is not merely an attempt to improve existing methods but an academic exploration and innovation, offering new perspectives for future research in attribute graph clustering. Our research also holds broad application potential, extending beyond citation networks and social networks into other domains such as text clustering, image clustering, and link prediction; these are directions our team will diligently pursue in the future. We look forward to applying this methodology to real-world challenges, offering practical value in diverse application domains.

FIGURE 2. Model graph of the self-supervised spectral embedding network.

FIGURE 4. Fixing γ at the optimal value and η at the initial value, study of the Laplacian smoothing parameter θ.

FIGURE 5. With γ and θ fixed at their optimal values, investigation of the parameter η in the total loss function.

FIGURE 6. Visualization of the clustering results of FK-SENet and its variant models, using the Cora dataset as an example.

TABLE 2. The selection of hyperparameters.

TABLE 3. Effectiveness analysis of diverse clustering algorithms on four benchmark datasets.

TABLE 4. Ablation experiment.