DVAEGMM: Dual Variational Autoencoder With Gaussian Mixture Model for Anomaly Detection on Attributed Networks

A significant aspect of today’s digital information is attributed networks, which combine multiple node attributes with the underlying network topology to extract knowledge. Anomaly detection on attributed networks has recently drawn significant attention from researchers and is widely used in several high-impact areas. Most current approaches rely on shallow learning methods such as community analysis, ego-network analysis, or subspace selection. These approaches suffer from network sparsity and data nonlinearity problems, and they fail to capture the intricate relationships between the various information sources. Deep learning approaches such as graph autoencoders perform anomaly detection by obtaining node embeddings while dealing with the network nonlinearity and sparsity issues. However, they ignore the embedding distribution of the latent codes, which in many instances results in poor representations. In this paper, we propose a new framework called DVAEGMM to detect anomalies on attributed networks. First, our framework utilizes a dual variational autoencoder to capture the complex cross-modality relationships between node attributes and network structure; unlike vanilla autoencoders, it also considers the underlying data distribution and employs a generative adversarial network (GAN) for adversarial regularization. The adversarial mechanism pushes the encoder toward more accurate estimates of the latent feature distribution, so the decoders can reconstruct graphs that more closely resemble the original. The framework represents each input data point by a low-dimensional embedding and a reconstruction probability. Lastly, a separate estimation network based on the Gaussian Mixture Model approximates the density of the latent vectors, so that anomalies are detected by measuring sample energy. All components are trained jointly as an end-to-end framework.
DVAEGMM enables the simultaneous optimization of the parameters of the mixture model, the generative adversarial network, and the variational autoencoder. The joint optimization balances the reconstruction probability, the density approximation of the latent representation, and regularization. Extensive experiments on attributed networks show that DVAEGMM significantly outperforms existing methods, demonstrating the effectiveness of the proposed approach. The AUC scores of our proposed framework for the BlogCatalog, Flickr, Enron, and Amazon datasets are 0.89380, 0.87130, 0.72480, and 0.75102, respectively.


I. INTRODUCTION
Today, social networks are becoming an integral part of people's lives, allowing them to communicate and interact on a global level with those who share common values, views, and perspectives. People use social networking websites such as Twitter, Facebook, Myspace, Flickr, etc. to build professional and personal networks, gather valuable information, and exchange factual personal information with those around them [1], [2]. Anomalies on social networking sites pertain to odd and frequently illegal user behavior. Most mainstream methods for anomaly detection assume that the samples are independently and identically distributed. However, in many real situations, instances are frequently linked to one another, forming a complicated network [3]. In the past few years, the topic of anomaly detection in complicated attributed networks has grown in popularity as a research topic. Compared to plain networks, which only use topology information to identify anomalies, attributed networks additionally encode a wide variety of attribute characteristics for each node.
In the current world, attributed networks seem to be everywhere. Instead of observing only the interactions among nodes, attributed networks contain a large set of characteristics or features for each node [4], [5]. Attributed networks are widely utilized to describe many complex systems because of the affinity between nodal properties and network architecture. There has been a significant increase in ongoing research to detect anomalies in attributed networks, which is a very critical problem due to its profound impact on a wide range of real-world applications, including cyber-attack tracking in communications systems, social media spam detection, and fraud prevention, to name a few [6], [7], [8]. Attributed network anomaly detection is especially hard because both attributes and structure must be considered. Several methods for detecting anomalies in attributed networks have been presented recently. Many methods attempt to find abnormalities in an unsupervised manner, since obtaining ground-truth anomalies is prohibitively costly [9].
Some of them use only community-level structural information to conduct anomaly detection, or monitor the adequacy of linked subgraphs [10], [11], [12]. A few investigate how to find feature-level anomalies in a subspace obtained by selecting node features [13], [14]. Graph autoencoder-based methods [4], [15], [16] and residual analysis-based methods [17], [18] use network reconstruction or residual assessment to detect node irregularities, because they assume that anomalies cannot be estimated from other reference nodes. Although these innovative methods have had considerable success, they still have some flaws. On high-dimensional, complex datasets, some of them rely on shallow practices that cannot keep up with the numerous interactions between structure and attributes. A complicated issue in anomaly detection is combining the network's topology with node attributes. Established approaches to detect anomalies have relied heavily on structure-based (or community-based) methodologies [19], [20], [21]; as a result, they cannot be utilized for attributed network anomaly detection. Aside from this, the attribute-based model implies that highly complicated anomalies are present in subgroups depending on user attributes. However, the traditional attribute-based approaches take into account only the attributes of the network [22], [23], which leads to a lower detection rate. Furthermore, the description of anomalies varies across fields, indicating that there is no universally accepted definition of an anomaly [24], [25]. So, it is important to deal with the following challenges: 1) Data nonlinearity and network sparsity. The links and node characteristics are extremely nonlinear, and the network topology is extremely sparse in the real world [26].
2) Unlabeled anomalies in the datasets. The ineffectiveness of detecting anomalies through classification is exacerbated by the misclassification of abnormal and normal data. So, the methods to detect abnormalities are needed to find anomalies in attributed networks in an unsupervised way that is quick and easy [27].
3) Homophily-based network smoothing. It is possible to detect network anomalies by smoothing networks based on the homophily assumption. Unfortunately, these methods perform poorly at detecting anomalies because the results can be over-smoothed, making it hard to distinguish the majority of normal nodes from the abnormal ones.
4) The deterministic nature of autoencoders. The autoencoder represents the latent variables as a deterministic mapping, which is insufficient to deal with variation. 5) Heterogeneous input and problems in setting an appropriate and precise reconstruction error threshold. When the input variables are heterogeneous, it is challenging to compute anomaly scores using autoencoder-based anomaly detection; a weighted sum is necessary. The problem is that there is no universally objective approach for determining the proper weights, because they differ depending on the data. Furthermore, once the weights have been determined, setting the reconstruction error threshold is time-consuming.
To address these challenges, we present the Dual Variational Autoencoder with Gaussian Mixture, DVAEGMM, a new framework to detect anomalies on attributed networks. The primary objective of our framework is to enforce the learnt latent embedding to match a prior distribution while simultaneously minimizing the reconstruction errors of the topological structure and node attributes. The following are the key aspects of this paper: • Using a dual variational autoencoder to capture network sparsity and nonlinearity, DVAEGMM solves two problems at once: it captures cross-modality interactions between topological structure and node features, and it solves the problem of unlabeled anomalies.
• Our approach accomplishes joint learning on node features and network structure while adhering to anomaly detection requirements and eliminating homophily and over-smoothing issues.
• A Dual Variational Autoencoder based embedding framework is proposed that is based on probabilities instead of reconstruction errors; the probabilities are more systematic and objective than reconstruction errors and therefore do not need model-dependent thresholds. As a stochastic generative model, the VAE is also able to provide calibrated probabilities for dealing with the variability found in autoencoder-based models.
• We include an adversarial component in the dual variational graph autoencoder to ensure that the encoded data is distributed uniformly. This component identifies whether the data comes from a low-dimensional representation of the graph network or from the genuine distribution of samples. Using a discriminator, the encoder can learn a better representation of the graph by creating low-dimensional variables whose distribution is closer to the prior distribution.
• We leveraged the Gaussian Mixture Model (GMM) across the learned low-dimensional space to tackle the density analysis problem for inputs having complicated structures. Our model combines the power of dimensionality reduction with density analysis. End-to-end optimization of both the deep autoencoder and the mixture model parameters has been achieved.
• In a unified framework, the dual variational autoencoder learning, adversarial regularization learning, and gaussian mixture models are jointly optimized such that each can complement the other and ultimately result in better anomaly detection.
The rest of this work is structured in the following manner. An analysis of the relevant literature on attributed network anomaly detection is provided in Section 2. The problem of anomaly detection on attributed networks is clearly stated in Section 3. Section 4 describes the preliminaries. Section 5 presents the proposed DVAEGMM anomaly detection framework in detail. Section 6 presents empirical proof of DVAEGMM's effectiveness for detecting anomalies in real-world networks using several assessment measures. Finally, in Section 7, we come to a logical conclusion.

II. RELATED WORK
Traditional anomaly detection and attributed anomaly detection are discussed in relation to each other in this section.

A. TRADITIONAL ANOMALY DETECTION
It has recently been found that most of the classic methods of anomaly detection use unsupervised approaches to discover anomalies in cases where there is just a limited amount of labelled anomalous data and plenty of unlabeled data [28]. Conventional approaches to identifying anomalies are generally divided into clustering-based, reconstruction-based, and one-class classification-based methods. Data density is estimated using methods based on clustering [29], [30], [31]. Normal data is clustered to discover anomalies using a two-step process, starting with dimensionality reduction. Approaches based on reconstruction assume that anomalies cannot be adequately recreated from the latent representations, such as PCA-based algorithms [32], [33] and autoencoder-based methods [4], [15], [34], [35] that employ anomaly scores to detect them. For anomaly detection, one-class classification-based approaches [36], [37], [38], [39] differ from the previously described two categories in that they try to identify the boundary between normal and abnormal samples. In spite of their effectiveness in the typical anomaly detection setting, these algorithms fail to scale effectively to graph data, because topological correlations between sample points are crucial. As a result, the detection of anomalies in graph data remains an open-ended issue.

B. ADVERSARIAL MODELS
Our method's adversarial approach relies on GAN [40], in which a generator and a discriminator compete in a minimax game to optimize each other. It was GraphGAN [41] that used the adversarial approach for graph learning for the first time. By imposing the distribution of the real data as a prior distribution on existing network embedding algorithms, ANE [42] views embedding vectors as the generated result and employs GAN as an additional regularization term. By incorporating the adversarial process into the autoencoder, Makhzani et al. presented an adversarial autoencoder to learn the latent embedding [43]. But this approach is intended for basic data, not graph data. Many adversarial models have been successful in computer vision, but the graph-structured data cannot be handled by them directly.

C. ANOMALY DETECTION ON ATTRIBUTED NETWORKS
Auxiliary attribute data is common in real-world networks, hence attempts to identify anomalies in attributed networks have increased in recent years. For anomaly identification on attributed networks, four main categories may be outlined: community assessment, subspace identification, residual analysis, and deep learning techniques [44]. Community or ego-network anomaly detection approaches fall into the category of community analysis-based anomaly detection. CODA [10], for example, uses a cohesive predictive model to simultaneously detect communities and identify community abnormalities. AMEN [11] analyses each node's ego-network information and finds abnormal areas in attributed networks. In addition, there is a family of approaches that aim to identify anomalous nodes in a subspace of node characteristics [45], [46]. For example, GOutRank [45] uses subspace cluster analysis to identify anomalies in attributed networks. Prior to anomaly detection, ConSub [46] employs a selection method to choose subspaces. Residual analysis has also been a popular method for determining the abnormality of nodes in attributed networks, in addition to those already discussed. An anomaly is detected by RADAR [17] when the residual feature data and its compatibility with the network show behavior that differs significantly from the majority. ANOMALOUS [18] is a joint anomaly detection framework that uses CUR matrix decomposition with residual analysis to optimize attribute selection and anomaly detection. Despite this progress, these approaches are constrained by their inherently shallow mechanisms and are therefore unable to handle crucial attributed-network challenges such as network sparsity, data nonlinearity, and the complicated modality connections among different data sources.
A lot of work has gone into building deep neural networks to detect anomalies on attributed networks due to the growing interest in deep learning research. Network embedding methods, which map the nodes of a network to low-dimensional representations, are also getting a lot of attention, since low-dimensional representations can efficiently retain the topological structure [47], [48], [49]. Anomaly-aware embedding on attributed networks is now the subject of several studies that take into account both network embedding and deep learning [4], [15], [16], [19], [38], [50], [51], [52], [53], [54]. AnomalyDAE [4] captures the complex connections between a network's topology and node features, and both structural and attribute data are used to assess anomalies. SpecAE [16] uses a dedicated graph convolution encoder and decoder to learn each node's local representations; the energy of each node's latent representation under a Gaussian mixture model determines its suspiciousness rating. DeepAD [19], an innovative hybrid embedding method, takes advantage of the strong non-linearity in both attributes and network structure to detect anomalies using reconstruction errors. DUAL-SVDAE [38] is composed of a structure autoencoder and an attribute autoencoder to learn the embedding representation of each node, followed by a dual-hypersphere training algorithm for learning two hyperspheres of normal nodes. Using a GCN, the input network is reduced into low-dimensional embedding representations by DOMINANT [15], which then reconstructs the topological structure and nodal characteristics. Rather than reconstruction error, ResGCN [50] uses residual information from the input network to rank anomalies. The GCN captures network sparsity and nonlinearity, deep neural networks capture residual information, and residual-based attention lowers the impact of anomalous nodes.

III. NOTATIONS AND PROBLEM STATEMENT
In this section, we describe the common notations and concepts used in this paper. Table 1 summarizes the most significant notations.
Definition: An attributed network G = (V, ε, X) consists of a set of N nodes V, a set of edges ε, and a node attribute matrix X ∈ R^(N×M), where the r-th row of X (r = 1, 2, . . . , N) holds the M-dimensional attribute vector of the r-th node. The graph links are encoded by an adjacency matrix A ∈ R^(N×N) that stores only binary values (i.e., 0 or 1), where A_ij = 1 denotes a link between node i and node j. The attribute and structure latent embeddings of the nodes are represented as Z_A ∈ R^(N×L) and Z_S ∈ R^(N×L), respectively, where L is the dimension of the embedding space.
Problem Statement: For a given attributed network G with X and A as the node attributed matrix and adjacency matrix respectively, anomaly detection for an attributed network is to find and rank all the rare nodes according to how they differ markedly from most of the other reference nodes from the perspective of both the attribute information and topological structure.

IV. PRELIMINARIES

A. GRAPH CONVOLUTIONAL NETWORKS (GCN)
GCNs are convolutional neural networks designed to operate directly on graphs. The GCN, in particular, represents the topology and the interconnections among features and nodes through the node adjacency matrix A and the feature matrix X. It uses spectral convolution to apply the convolutional operation on graph data, generating the transformation

Z^(l+1) = f(Z^(l), A),

where Z^(l) and Z^(l+1) are the convolutional input and output of layer l, respectively, and W^(l) is the layer-specific trainable weight matrix. Each layer is expressed through the spectral convolution function

Z^(l+1) = ∅(D̂^(−1/2) Â D̂^(−1/2) Z^(l) W^(l)),

where Â = A + I, D̂ is the diagonal degree matrix of Â, and I is the identity matrix. ∅ is an activation function and, based on previous research, we chose ReLU(·) = max(0, ·) as the activation function [55]. Z^(0) is set to X ∈ R^(N×M) for the first layer.
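As an illustrative sketch (our own NumPy rendering, not code from the paper), the normalized propagation rule above can be written directly; the function name `gcn_layer` and the toy graph are invented for the example:

```python
import numpy as np

def gcn_layer(A, Z, W, activation=lambda x: np.maximum(0, x)):
    """One GCN step: Z' = ReLU(D^-1/2 (A + I) D^-1/2 Z W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops: A^ = A + I
    d = A_hat.sum(axis=1)                     # node degrees of A^
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D^-1/2
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return activation(A_norm @ Z @ W)

# toy 3-node graph: nodes 0 and 1 connected, node 2 isolated
A = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
X = np.eye(3)                                 # one-hot features, Z^(0) = X
W = np.ones((3, 2))                           # toy weight matrix
H = gcn_layer(A, X, W)
```

Note that the isolated node still receives its own features through the self-loop term of Â.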

B. AUTOENCODERS (AE)
Autoencoder-based anomaly detection approaches have recently gained a lot of attention due to their ability to extract extremely non-linear connections. An autoencoder is a type of deep neural network that uses unsupervised learning to learn low-dimensional embedding representations of data. It has demonstrated convincing learning results in different areas. An encoder and a decoder are the main components of an autoencoder. The node embeddings are obtained by the encoder using the attribute data and network structure as input. The decoder then uses these node embeddings as input to reconstruct the attribute data and also the network structure. Anomalies are then identified as inconsistencies between the input and the reconstructed network [56], [57]. Generally, two parts make up the neural network. One is the encoder function enc_w(.), and the other is the decoder function dec_u(.). It tries to learn a code from the input by going through a pair of encoding and decoding processes:
Z = enc_w(X), X̂ = dec_u(Z),

where X is the input data and X̂ is the reconstructed input. The main concept is to find enc_w(.) and dec_u(.) such that the difference between X and X̂ is as small as possible [58].
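To make the encoder/decoder pair and its use for anomaly scoring concrete, here is a minimal hypothetical sketch with hand-set linear maps; in practice W and U are learned by minimizing the reconstruction error:

```python
import numpy as np

def enc(X, W):   # encoder enc_w(.)
    return X @ W

def dec(Z, U):   # decoder dec_u(.)
    return Z @ U

# toy data: 2-D points on the line y = x, plus one off-manifold outlier
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [3.0, -3.0]])

# a hand-set 1-D bottleneck projecting onto the direction (1, 1)/sqrt(2)
W = np.array([[1.0], [1.0]]) / np.sqrt(2)
U = W.T

X_hat = dec(enc(X, W), U)
scores = np.linalg.norm(X - X_hat, axis=1)  # per-sample reconstruction error
```

Points on the learned manifold reconstruct perfectly, while the outlier's large reconstruction error singles it out as anomalous.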
Approaches that use autoencoders to extract highly non-linear connections for anomaly detection have recently attracted a lot of attention. In general, AE encoders provide deterministic outcomes, and training a function that explicitly maps the input makes coping with high-dimensional data challenging. Moreover, generating an embedding by integrating node semantic data and network topology is difficult, because the combined data has a significantly larger dimension than the network topology alone [59].

C. VARIATIONAL AUTOENCODERS (VAE)
To overcome this limitation, the Variational Autoencoder (VAE) was developed by incorporating a priori constraints into the embedding learning process. Rather than learning the latent variables as an explicit deterministic mapping, as in the AE, the VAE encoder infers a posterior distribution of continuous latent variables from a given input. So, it is preferable for handling complex and high-dimensional data, such as social networks [49], [60], [70]. The variational autoencoder (VAE) and its variants have seen huge success, particularly in the creation of realistic data. Its structure is quite similar to that of a standard autoencoder. VAE models also work as generative models, because they can produce new data from existing data. Initially, VAEs were intended for image analysis tasks like denoising [61], but research on these models has expanded to various other areas, including anomaly detection. Using VAEs in anomaly detection problems is natural, since the primary concept of this approach is associated with a lower-dimensional representation, which has already been used in many anomaly detection methods [15], [16].
When utilizing variational autoencoders, the main advantage is that they are probabilistic. By combining the generative model P(X|z) with an inference model Q(z|X), the representation learning issue can be solved as a variational inference problem while comprehending the latent representation of the data [62]. Each input data point is postulated to follow a Gaussian distribution. The encoder q_θ(z|x) encodes a Gaussian multivariate latent (hidden) variable z from the input x. The decoder p_∅(x|z) takes samples for each data input and reconstructs the input x as x′. The basic concept is to determine the likelihood that x′ was obtained through z. In the variational graph encoder, the variational lower bound is optimized as follows:

L = E_{q_θ(z|x)}[log p_∅(x|z)] − KL[q_θ(z|x) || p(z)],

where KL stands for the Kullback–Leibler divergence, and KL[q_θ(z|x) || p(z)] is the regularization term.
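The Gaussian posterior and the KL regularization term admit closed forms. The sketch below (our own illustration, assuming a standard-normal prior) shows the reparameterized sampling step and the closed-form KL term:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_sigma):
    """Sample z ~ N(mu, sigma^2) via z = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps

def kl_to_standard_normal(mu, log_sigma):
    """KL[ N(mu, sigma^2 I) || N(0, I) ] in closed form, summed over dims."""
    return 0.5 * np.sum(np.exp(2 * log_sigma) + mu**2 - 1 - 2 * log_sigma)

mu = np.zeros(4)
log_sigma = np.zeros(4)       # sigma = 1 -> posterior equals the prior
z = reparameterize(mu, log_sigma)
```

When the posterior matches the prior, the KL term vanishes; any deviation in the mean or variance makes it strictly positive.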

D. GENERATIVE ADVERSARIAL NETWORKS (GAN)
A generative adversarial network (GAN) is a commonly employed deep generative model. The fundamental principle of a GAN is to train a generator G and a discriminator D so that the generator learns to confuse the discriminator while the discriminator learns to differentiate between real and fake samples. Training improves both the discriminator's ability to separate real from fake data and the generator's ability to produce realistic data, eventually to the point where the discriminator can no longer tell them apart, because the generator produces data that resemble the genuine data in the training set. The objective function of the GAN is the following minimax game:

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))].
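As a small numeric check of the minimax objective (our own example, not the paper's code): at the theoretical equilibrium the discriminator outputs 1/2 everywhere, and the objective attains its well-known optimum value of −log 4:

```python
import numpy as np

def gan_value(D_real, D_fake):
    """Value of the minimax objective E[log D(x)] + E[log(1 - D(G(z)))]."""
    return np.mean(np.log(D_real)) + np.mean(np.log(1.0 - D_fake))

# at equilibrium, D cannot distinguish real from fake and outputs 0.5
D_real = np.full(8, 0.5)   # discriminator outputs on real samples
D_fake = np.full(8, 0.5)   # discriminator outputs on generated samples
v = gan_value(D_real, D_fake)
```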
E. GAUSSIAN MIXTURE MODEL (GMM)
In this model, we suppose that there exists a definite number of Gaussian distributions, each of which corresponds to a cluster. Consequently, a Gaussian Mixture Model is used to group together data points from a given distribution. These are probability models that distribute data points over different clusters using a soft clustering strategy. The main parameters of a Gaussian component are its mean, covariance, and mixing probability. The mean µ and covariance Σ represent the center and width of the component, respectively, while the mixing probability π specifies the weight of the Gaussian function. In general, the Gaussian density function can be expressed as

N(x | µ, Σ) = exp(−½ (x − µ)^T Σ^(−1) (x − µ)) / sqrt((2π)^D |Σ|),

where x specifies the data points and D represents the number of dimensions of each data point.
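The mixture density just described can be sketched as follows (a NumPy illustration with invented parameters; not the paper's implementation):

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """Multivariate normal density N(x | mu, Sigma) for a D-dim point x."""
    D = len(x)
    diff = x - mu
    norm = (2 * np.pi) ** (-D / 2) * np.linalg.det(Sigma) ** -0.5
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

def gmm_density(x, pis, mus, Sigmas):
    """Mixture density: sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(p * gaussian_density(x, m, S)
               for p, m, S in zip(pis, mus, Sigmas))

# two equally weighted unit-covariance components
pis = [0.5, 0.5]
mus = [np.zeros(2), np.array([4.0, 0.0])]
Sigmas = [np.eye(2), np.eye(2)]
p_center = gmm_density(np.zeros(2), pis, mus, Sigmas)   # near a component mean
p_far = gmm_density(np.array([10.0, 10.0]), pis, mus, Sigmas)  # far from both
```

Points near a component mean receive high density while points far from every component receive nearly zero, which is exactly the property the estimation network later exploits for anomaly scoring.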

V. PROPOSED FRAMEWORK
The DVAEGMM anomaly detection framework for attributed networks is described in this section, which combines a dual variational autoencoder with a Gaussian mixture model. Figure 1 is an illustration of the DVAEGMM pipeline. This framework is divided into four significant components: a structure reconstruction variational autoencoder, an attribute reconstruction variational autoencoder, adversarial model, and a Gaussian Mixture model.

A. STRUCTURE RECONSTRUCTION MODEL
To obtain a significant number of prominent high-level node features, the structure variational autoencoder first converts the observed node attributes X into a low-dimensional latent representation Z_S. The structure variational autoencoder uses a GCN encoder to learn the node embeddings and, subsequently, a GCN decoder to reconstruct the structure. For the encoding process, we use a two-layer GCN to produce the parameters µ and σ:

µ = GCN_µ(X, A), log σ = GCN_σ(X, A),

where µ and log σ are the matrices whose rows are µ_n and σ_n, the mean and standard deviation vectors of node v_n's embedding z_n. Following that, sampling from N(µ, σ) is used to determine the latent variables. As a result, the inference model is:

q(Z_S | X, A) = ∏_{i=1}^{N} q(z_i | X, A), with q(z_i | X, A) = N(z_i | µ_i, diag(σ_i²)),

where µ = Z^(2) represents the matrix of the mean vectors of the z_i. When reconstructing the network's structure, two graph convolutional layers are utilized. The embedding output from the encoder is fed to the decoder as input, and the GCN decoder is defined as:

Z_D = GCN_1(Z_S, A), Ā = GCN_2(Z_D, A),

where Z_S is the encoder's learned embedding, while Z_D and Ā (the reconstructed adjacency) are the decoder outputs of the first and second layers, respectively. The output dimension of the decoder equals the number of nodes.
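A compact sketch of the structure encoder (our own simplification; the weights here are random placeholders rather than trained parameters) combines the GCN propagation with the reparameterization step:

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(A):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

def structure_encoder(A, X, W0, W_mu, W_sigma):
    """Two-layer GCN encoder producing mu and log(sigma), then sampling Z_S."""
    A_n = normalize(A)
    H = np.maximum(0, A_n @ X @ W0)    # shared first GCN layer (ReLU)
    mu = A_n @ H @ W_mu                # second layer for the mean
    log_sigma = A_n @ H @ W_sigma      # second layer for the log-std
    # reparameterization: Z_S = mu + sigma * eps
    Z_S = mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)
    return Z_S, mu, log_sigma

N, M, L = 4, 3, 2                      # toy sizes: nodes, attrs, latent dims
A = np.ones((N, N)) - np.eye(N)        # toy complete graph
X = rng.standard_normal((N, M))
Z_S, mu, log_sigma = structure_encoder(
    A, X,
    rng.standard_normal((M, 8)),       # W0
    rng.standard_normal((8, L)),       # W_mu
    rng.standard_normal((8, L)))       # W_sigma
```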

B. ATTRIBUTE RECONSTRUCTION MODEL
Normal nodes' latent embeddings are learned by only using the attribute matrix as input. Two non-linear feature transform layers are employed in the encoder of the attribute variational autoencoder to learn a non-linear feature mapping of the node attributes, rather than relying on the structure information, as is the case with the structure autoencoder. In this example, the observed attribute data is mapped to the latent embedding Z A , using two non-linear feature transform layers.
Z^(1)_A = ∅(X W^(1)_A + b^(1)), Z_A = ∅(Z^(1)_A W^(2)_A + b^(2)),

where W^(1)_A and W^(2)_A are trainable weights and b^(1), b^(2) are the biases of the two layers.
Finally, the node structure embeddings Z_S and node attribute embeddings Z_A are fed as input to a simple inner-product decoder, which reconstructs X̂ as follows:

X̂ = sigmoid(Z_S Z_A^T).

Following the attribute encoder, a feature fusion module is also built to fuse the learnt node embeddings Z_S from the structure space and Z_A from the attribute space into a fused embedding Z_F, which is taken as input by the GMM to capture the relationship between structure and attributes. The fusion procedure works as follows:

Z_F = Z_S ⊕ Z_A,

where ⊕ is the element-wise plus operator of two matrices, which adds the corresponding elements at the same position of the two matrices.
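A minimal sketch of the fusion and the inner-product reconstruction (our illustration; the sigmoid inner product is our reading of the text, and the toy embeddings are invented):

```python
import numpy as np

def fuse(Z_S, Z_A):
    """Element-wise addition of structure and attribute embeddings."""
    return Z_S + Z_A

def inner_product_decoder(Z_S, Z_A):
    """Reconstruct X^ from the two embeddings via a sigmoid inner product."""
    return 1.0 / (1.0 + np.exp(-(Z_S @ Z_A.T)))

Z_S = np.array([[1.0, 0.0], [0.0, 1.0]])   # toy structure embeddings
Z_A = np.array([[0.5, 0.5], [0.5, 0.5]])   # toy attribute embeddings
Z_F = fuse(Z_S, Z_A)                        # fused input for the GMM
X_hat = inner_product_decoder(Z_S, Z_A)
```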

C. ADVERSARIAL MODEL
The core concept of our approach is to use an adversarial training model to induce the latent representation Z_F to match a prior distribution. There are two major components in our adversarial model: the dual variational autoencoders, which serve as the generator of the adversarial network, and a discriminator. The generator tries to fool the discriminator by producing fake data (latent variables generated from the input data by the dual variational autoencoders). The discriminator's goal is to determine whether the samples come from real data or are artificially generated. Data drawn from the prior distribution p_z is considered positive by the discriminator, while data from the latent variable z is considered negative, and its cost function is defined as follows:

L_D = −E_{z′∼p_z}[log D(z′)] − E_z[log(1 − D(z))].

D. ESTIMATION NETWORK
The estimation network uses a GMM to perform density estimation on the low-dimensional representations of the input data. In the training phase, the GMM parameters, the mixture probabilities ϕ, means µ, and covariances Σ of the mixture components, are calculated without alternative techniques such as EM; the estimation network directly evaluates the likelihood/energy of samples. The estimation network predicts the mixture-component membership of each instance using a multi-layer network. Given the low-dimensional representations z and the number of mixture components P, membership is predicted as follows:

O = MLN(z; α_m), β̂ = softmax(O),

where O is the network output parameterized by α_m, and β̂ represents a P-dimensional vector predicting the soft mixture-component membership. The GMM model parameters can then be estimated from a sample set of size N with their membership predictions, ∀ 1 ≤ p ≤ P:
φ̂_p = (1/N) Σ_{i=1}^{N} β̂_{ip}, µ̂_p = Σ_{i=1}^{N} β̂_{ip} z_i / Σ_{i=1}^{N} β̂_{ip}, Σ̂_p = Σ_{i=1}^{N} β̂_{ip} (z_i − µ̂_p)(z_i − µ̂_p)^T / Σ_{i=1}^{N} β̂_{ip},

where β̂_i is the membership prediction for z_i, and φ̂_p, µ̂_p, and Σ̂_p are the mixture probability, mean, and covariance of component p, respectively. The sample energy can be calculated from the estimated parameters as

EN(z) = −log( Σ_{p=1}^{P} φ̂_p exp(−½ (z − µ̂_p)^T Σ̂_p^(−1) (z − µ̂_p)) / sqrt(|2π Σ̂_p|) ),

where |·| is the matrix determinant.
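The membership prediction, parameter estimation, and sample-energy computation can be sketched as follows (our NumPy illustration; the learned network output is replaced by random logits, and a small jitter term guards against singular covariances):

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gmm_parameters(Z, beta):
    """Estimate phi_p, mu_p, Sigma_p from embeddings Z and memberships beta."""
    N = Z.shape[0]
    phi = beta.sum(axis=0) / N                         # mixture probabilities
    mu = (beta.T @ Z) / beta.sum(axis=0)[:, None]      # component means
    Sigmas = []
    for p in range(beta.shape[1]):
        diff = Z - mu[p]
        Sigmas.append((beta[:, p, None] * diff).T @ diff / beta[:, p].sum())
    return phi, mu, Sigmas

def sample_energy(z, phi, mu, Sigmas, eps=1e-6):
    """EN(z) = -log sum_p phi_p N(z | mu_p, Sigma_p)."""
    D = len(z)
    total = 0.0
    for p in range(len(phi)):
        S = Sigmas[p] + eps * np.eye(D)   # jitter against singularity
        diff = z - mu[p]
        dens = np.exp(-0.5 * diff @ np.linalg.inv(S) @ diff) / \
               np.sqrt((2 * np.pi) ** D * np.linalg.det(S))
        total += phi[p] * dens
    return -np.log(total + 1e-12)

rng = np.random.default_rng(0)
Z = rng.standard_normal((50, 2))                # latent representations
beta = softmax(rng.standard_normal((50, 3)))    # stand-in memberships
phi, mu, Sigmas = gmm_parameters(Z, beta)
e_in = sample_energy(np.zeros(2), phi, mu, Sigmas)       # near the data
e_out = sample_energy(np.array([50.0, 50.0]), phi, mu, Sigmas)  # far away
```

High-energy samples lie in low-density regions and are therefore flagged as anomalies.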

E. OBJECTIVE FUNCTION
The DVAEGMM objective function O(W) for a dataset of N samples includes the following components: • The first one is the loss function, which describes the dual variational autoencoder reconstruction error from both the structure and attribute perspectives; θ is the parameter that regulates the balance between structure and attribute reconstruction.
• Second and third are KL divergence for structure and attribute variational autoencoders, respectively.
• The fourth component is used to jointly train the encoders of both variational autoencoders and the discriminator via a minimax game such that they optimize each other.
• The fifth component, EN(Z_i), represents the sample energy of the latent representation Z_i under the GMM estimation and models the probability of observing the input samples. We increase the likelihood of non-anomalous samples by reducing the sample energy, and we identify the samples with the top-K highest energies as anomalous.
• Sixth is covariance penalization, which penalizes the small values in the diagonal elements of the covariance matrix to solve the singularity problem in GMM.
After optimizing the objective function, our proposed approach can be used to detect abnormalities in attributed networks. The estimated energy in Eq. (25) is then used to evaluate the anomaly level of each node in our testing data. Nodes with higher rankings are more likely to be rated as anomalies. Our proposed approach is described in Algorithm 1.
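The final ranking step of the approach (sorting nodes by sample energy and flagging the top-K) can be sketched as follows (illustrative names and energy values, not outputs of the actual model):

```python
import numpy as np

def rank_by_energy(node_ids, energies, k):
    """Sort nodes by sample energy (descending) and flag the top-k as anomalies."""
    order = np.argsort(energies)[::-1]          # highest energy first
    ranked = [node_ids[i] for i in order]
    return ranked, set(ranked[:k])

nodes = ["n0", "n1", "n2", "n3"]
energies = np.array([0.3, 4.1, 0.7, 2.2])       # toy per-node sample energies
ranked, anomalies = rank_by_energy(nodes, energies, k=2)
```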

VI. EXPERIMENTS
The performance of DVAEGMM on various datasets is discussed in this section. The two most important evaluation tasks are anomaly detection performance analysis and model parameter sensitivity analysis. The four datasets are first described in detail. After that, DVAEGMM is compared to the other baseline techniques, and the anomaly detection accuracy is reported, together with a comparison and analysis of the experimental results. Finally, we examine the sensitivity of the experimental parameters.

A. DATASETS
In this paper, we perform experiments on the following real-world attributed datasets, with and without ground-truth anomaly labels, in order to test the performance of our proposed approach. All networks have been extensively utilized in earlier research. Table 2 summarizes the detailed statistics of each dataset.

1) DATASETS WITHOUT GROUND-TRUTH ANOMALOUS LABELS
BlogCatalog [15] is a website where bloggers can follow one another, creating a social network. The blogger's features describe the user and the blog, and this attribute information composes the node attributes.
Flickr, like Instagram [63], is a photo-sharing website. Users form a social network, similar to BlogCatalog, by connecting with each other. Their node attributes are defined by tags, which reflect a user's interests.

2) DATASET WITH GROUND-TRUTH ANOMALOUS LABELS
Enron [64] is an electronic mail communication system where edges denote the transfer of e-mails among individuals. Every node has 20 attributes that specify email content, such as the average content length and the number of people who receive mail. Spammers are considered anomalies, and this dataset is already widely used to detect anomalies.
Amazon is a co-purchase network [45]. Each node has 28 attributes that describe various aspects of online commodities, such as price and rating. Nodes that carry the label ''amazonfail'' are regarded as anomalous.
For the networks with ground-truth anomaly labels, we use the given labels directly to evaluate our approach. For the unlabeled datasets, we must manually inject anomalies into the attributed networks for the evaluation task. A clique is a common aberrant substructure in which a small set of nodes is much more closely tied to each other than normal. Accordingly, after specifying the clique size r, a total of r nodes is randomly picked from the network and fully connected, and all r nodes forming the clique are labeled as anomalies. This procedure is repeated until c cliques have been generated, so r × c is the total number of structural anomalies. An attribute perturbation method proposed by [66] is then used to inject anomalies from the attribute perspective. To ensure that the attributed network contains an equal number of anomalies from the structural and attribute perspectives, r × c nodes are chosen at random as attribute perturbation targets. For each designated node n_i, another t nodes are picked randomly from the network, and the Euclidean distance between n_i and each of the t nodes is computed. The node with the maximum distance is selected as n_j, and the attributes X_i of node n_i are replaced with the attributes X_j of node n_j; node n_i is then regarded as an attribute anomaly. In our experiments, we set r = 15 and c to 10 and 15 for BlogCatalog and Flickr, respectively, the same as [15] and [50].
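The injection procedure above can be sketched in a few lines of numpy; this is our own illustrative implementation of the described steps, not the authors' code, and the helper name is hypothetical:

```python
import numpy as np

def inject_anomalies(adj, X, r, c, t, rng=None):
    """Inject structural (clique) and attribute (perturbation) anomalies.

    adj: (n, n) 0/1 adjacency matrix; X: (n, d) attribute matrix.
    Structural: build c cliques of r randomly chosen nodes each.
    Attribute:  for r*c random targets, copy attributes from the farthest
                of t randomly sampled candidate nodes.
    """
    rng = np.random.default_rng(rng)
    n = adj.shape[0]
    structural = []
    for _ in range(c):
        nodes = rng.choice(n, size=r, replace=False)
        for a in nodes:                 # connect all chosen nodes pairwise
            for b in nodes:
                if a != b:
                    adj[a, b] = 1
        structural.extend(nodes.tolist())
    targets = rng.choice(n, size=r * c, replace=False)
    for i in targets:
        cand = rng.choice(n, size=t, replace=False)
        dists = np.linalg.norm(X[cand] - X[i], axis=1)
        j = cand[np.argmax(dists)]      # farthest candidate
        X[i] = X[j]                     # perturb the target's attributes
    return adj, X, structural, targets.tolist()

adj = np.zeros((50, 50), dtype=int)
X = np.random.default_rng(0).normal(size=(50, 4))
adj, X, structural, attribute = inject_anomalies(adj, X, r=3, c=2, t=5, rng=1)
print(len(structural), len(attribute))  # 6 6
```

With r = 15 and c = 10, as used for BlogCatalog, this yields 150 structural and 150 attribute anomalies.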

B. EVALUATION METRICS
1) ROC-AUC
The ROC curve plots the true positive rate (an anomaly is identified as an anomaly) versus the false positive rate (a normal node is identified as anomalous), based on the ground truth and the detection outcomes. The AUC value represents the likelihood that a randomly picked anomalous node is scored higher than a randomly picked normal node. The approach is of high quality if the AUC value is close to one.
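This pairwise-ranking interpretation of AUC can be computed directly; the scores below are made-up anomaly scores for illustration, not results from the paper:

```python
import numpy as np

def auc_score(labels, scores):
    """AUC as the probability that a randomly chosen anomaly receives a
    higher score than a randomly chosen normal node (ties count as 0.5)."""
    labels, scores = np.asarray(labels), np.asarray(scores)
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]          # all anomaly/normal pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

y = [0, 0, 1, 0, 1, 1]                 # 1 = ground-truth anomaly
s = [0.1, 0.4, 0.9, 0.2, 0.8, 0.3]     # higher = more anomalous
print(auc_score(y, s))  # 8 of 9 pairs ranked correctly -> 0.888...
```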

2) PRECISION@K
To quantify the percentage of true anomalies discovered by a particular detection scheme among its K highest-ranked nodes, we use Precision@K, which ranks the nodes according to their anomaly scores:
Precision@K = |TRAnobyMethod ∩ RankAno| / |RankAno| (27)
where TRAnobyMethod denotes the true anomalies detected by the method, and RankAno denotes the nodes in the top-K ranking.

3) RECALL@K
This evaluation metric measures the percentage of true anomalies discovered by a particular detection approach out of the total number of ground-truth anomalies.
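Both metrics share the same top-K counting step and differ only in the denominator. A minimal sketch (variable names are illustrative):

```python
import numpy as np

def precision_recall_at_k(labels, scores, k):
    """Precision@K and Recall@K for an anomaly ranking.

    Precision@K divides the true anomalies found in the top K by K;
    Recall@K divides them by the total number of ground-truth anomalies.
    """
    labels, scores = np.asarray(labels), np.asarray(scores)
    top_k = np.argsort(-scores)[:k]        # K highest-scored nodes
    hits = labels[top_k].sum()             # true anomalies in the top K
    return hits / k, hits / labels.sum()

labels = np.array([1, 0, 1, 0, 0, 1])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4])
p, r = precision_recall_at_k(labels, scores, k=3)
print(p, r)  # 2 of the top 3 are anomalies; 2 of 3 anomalies recovered
```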

C. BASELINES
DVAEGMM is compared with the following techniques to demonstrate its ability to detect anomalies:
• LOF [23] measures how isolated an object is with respect to its neighborhood and locates anomalies at the contextual level. This method only considers node attributes.
• RADAR [17] is an unsupervised approach for finding anomalies in attributed networks. The residuals of attribute values and their similarity to network data are used to characterize anomalies whose behavior is very different from the majority's [70], [71]. This helps to identify anomalous behavior.
• DOMINANT [15] is a state-of-the-art unsupervised deep learning approach for anomaly detection. It uses a graph convolutional autoencoder to jointly reconstruct the adjacency and attribute matrices and assesses the irregularity of each node by the weighted sum of its reconstruction error terms.
• DUAL-SVDAE [38] is composed of a structure autoencoder and an attribute autoencoder, which acquire the node's embedding in structure as well as in feature space, respectively. Then, from the structure and attribute viewpoints, a dual-hypersphere learning is imposed to learn two hyperspheres of normal nodes.
• ResGCN [50] ranks anomalies by residual information derived from the input network instead of by reconstruction errors. It uses a GCN to handle network sparsity and nonlinearity, a deep neural network to collect residual information, and a residual-based attention mechanism to limit the negative impact of anomalous nodes.
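As an illustration of the simplest baseline above, LOF can be run on the node-attribute matrix alone with scikit-learn; the toy data below is our own, not the paper's setup:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))        # 100 nodes, 8 attributes
X[0] += 10.0                         # make one node clearly anomalous

lof = LocalOutlierFactor(n_neighbors=20)
lof.fit_predict(X)                   # -1 marks predicted outliers
scores = -lof.negative_outlier_factor_   # higher = more anomalous
print(int(np.argmax(scores)))        # node 0 ranks as the top anomaly
```

Because LOF sees only the attribute matrix, any anomaly that is structural (e.g. a clique member with ordinary attributes) is invisible to it, which is consistent with its weaker AUC reported below.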

D. EXPERIMENTAL DESIGN
In the experiments, we implemented DVAEGMM in Python and trained it for 100 epochs on all datasets. For optimization, the Adam algorithm with a learning rate of 0.002 is used; although frameworks such as TensorFlow and PyTorch recommend a default learning rate of 0.001, we found the best results at 0.002. The embedding dimension is fixed at 64 for all datasets. Moreover, in all DVAEGMM instances, we set the parameters (θ, γ1, γ2) to (0.6, 0.1, 0.005), where θ controls the trade-off between structure and attribute reconstruction, and γ1 and γ2 are meta-parameters. For the baseline techniques, we use the publicly accessible implementations from the source publications and fix the hyperparameters to the values recommended in the papers that presented the methods. Table 3 summarizes the values of the different parameters.
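To make the roles of θ, γ1, and γ2 concrete, the sketch below shows one plausible way such a weighted objective could be combined, using the reported values; the exact loss form in the paper may differ, so treat this as a hypothetical illustration only:

```python
def joint_objective(struct_rec, attr_rec, energy, cov_pen,
                    theta=0.6, gamma1=0.1, gamma2=0.005):
    """Hypothetical weighted combination of the loss terms.

    theta balances structure vs. attribute reconstruction; gamma1 weights
    the GMM sample energy; gamma2 weights the covariance penalty.
    """
    reconstruction = theta * struct_rec + (1 - theta) * attr_rec
    return reconstruction + gamma1 * energy + gamma2 * cov_pen

# 0.6*1.0 + 0.4*2.0 + 0.1*3.0 + 0.005*4.0 = 1.72
print(joint_objective(1.0, 2.0, 3.0, 4.0))
```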

E. EXPERIMENTAL RESULTS
The performance of DVAEGMM is compared with a number of baselines with respect to ROC-AUC. The AUC results of all methods on each dataset are given in Table 4. In addition, the ROC curves of all methods on the BlogCatalog, Flickr, Enron, and Amazon datasets are shown in Figure 2, Figure 4, Figure 5, and Figure 6, respectively. The ROC curves demonstrate that our proposed framework outperforms the other baseline anomaly detection methods. On the BlogCatalog dataset, DVAEGMM increases AUC by at least 3.182% over the second-best model, ResGCN, and by 40.23% over the worst model, LOF. Figure 3 shows the AUC performance comparison. The AUC results and the interpretation of the ROC curves are explained as follows:
• Across all datasets, DVAEGMM outperforms all other baseline techniques, demonstrating its ability to combine the power of the variational autoencoder and the GMM for anomaly detection.
• In all three datasets, the AUC values of LOF are lower than the other four approaches that consider structural and attribute information to detect anomalies because LOF only evaluates attribute data.
• The residual-analysis-based method, RADAR, outperforms the traditional method, LOF. However, because of their shallow mechanisms for dealing with network sparsity, data nonlinearity, and complicated modality interactions, these approaches remain constrained.
• DOMINANT combines structural and attribute information for node embedding, but autoencoder-based methods that rely on reconstruction errors alone cannot adequately measure abnormality.
• Dual-SVDAE outperforms LOF, RADAR, and DOMINANT owing to its dual-hypersphere learning mechanism.
• ResGCN outperforms all the other baseline methods except our DVAEGMM due to its attention-based deep residual modeling approach.
• The performance of our model is slightly worse on the Enron and Amazon datasets than on the others. We speculate that this is most likely caused by the low dimensionality of these datasets.
Tables 5 and 6 provide the experimental results for Precision@K and Recall@K, respectively. Figure 7 and Figure 8 show the Precision@100 and Recall@100 performance comparisons, respectively. On the BlogCatalog dataset, DVAEGMM increases Precision@100 by at least 6.7% over the second-best model, ResGCN, and by 70.7% over the worst model, LOF. Similarly, it increases Recall@100 by 1.6% compared with ResGCN and by 24.2% compared with LOF on the same dataset. From these evaluation data, we derive the following conclusions:
• With the exception of Precision@200 on BlogCatalog and Recall@200 on Flickr, the proposed DVAEGMM framework surpasses the existing baseline approaches on all three attributed networks, indicating the efficacy of our approach.
• Due to DVAEGMM's superiority in Precision@K and Recall@K compared to other methods, we believe our model can achieve higher accuracy and locate more real anomalies within a ranking list with a limited length.

F. PARAMETER SENSITIVITY
In this section, anomaly detection performance is examined with respect to the sensitivity of the embedding dimension D and the balance parameter θ. The studies were performed on the BlogCatalog dataset. Figure 9 shows the trend of AUC under different embedding-layer dimensions. We observe that a higher-dimensional embedding, such as 64 or 128 dimensions, provides good performance, since it can encode additional information. However, a dimension that is too low or too high degrades performance, owing to poor modeling capacity or overfitting, respectively. For anomaly detection, the interactions between the network structure and node attributes on the attributed network are clearly critical, as considering only attribute reconstruction (θ = 0) or only structure reconstruction (θ = 1) results in low efficiency. Figure 10 shows the trend of AUC under different values of θ, indicating that a suitable balance factor can effectively improve performance.

G. ABLATION STUDY
In this section, we explore the effects of node attributes, network structure, adversarial training, and density estimation on anomaly detection with DVAEGMM. The contribution of each module is examined separately. The ablation settings are as follows:
• Without GMM (WOGMM): We drop the GMM module from DVAEGMM and substitute two reconstruction losses, one for the network structure and one for the node attributes, to supervise model training. Reconstruction error is then used as the anomaly score to detect anomalies.
• Without Structural VAE (WOSVAE): Only the attribute VAE is used for training, as the structural VAE module is eliminated; anomalies are then detected by approximating the density with the GMM module.
• Without Attribute VAE (WOAVAE): Only the structural VAE is used for training, as the attribute VAE module is eliminated; anomalies are then detected by approximating the density with the GMM module.
• Without GAN (WOGAN): The adversarial training component is removed; the dual variational autoencoders are used for training, and anomalies are detected with the GMM module.
• Anomaly Detection with GAN (ADGAN) instead of GMM: We drop the GMM module and, following the AnoGAN approach [68], compute the anomaly score as a linear combination of the reconstruction error and the discriminator error.
Table 7 shows the results of the ablation study on all datasets. We find that DVAEGMM achieves the best results. The efficiency of DVAEGMM is demonstrated by the poorer results of WOGMM, WOSVAE, WOAVAE, and ADGAN. In addition, the model's performance degrades when we rely solely on structure or attribute features. One potential reason is that considering only attribute or only structure information compromises the integrity of the attributed network data. Therefore, anomaly detection on an attributed network requires both structure information and attribute information.

VII. CONCLUSION
This research proposes a Dual Variational Autoencoder with Gaussian Mixture Model (DVAEGMM) framework to address anomaly detection in attributed networks, effectively overcoming the shortcomings of previously proposed methods. The dual variational autoencoders address the complicated cross-modality relationships between network structure and node attributes while incorporating the prospective distribution of the data, thus handling the sparsity and nonlinearity of networks. A GAN provides adversarial regularization to the dual variational autoencoders. The Gaussian Mixture Model (GMM) is then applied for density estimation over the learned low-dimensional space of input data with complex structures, and the sample energy is used to identify anomalies. Two datasets without ground-truth anomalies, BlogCatalog and Flickr, and two datasets with ground-truth anomalies, Enron and Amazon, were used for the comparison. The experimental results show that DVAEGMM is a viable alternative to previously proposed approaches; on AUC for the BlogCatalog dataset, it outperforms LOF by 40.23%, RADAR by 14.83%, DOMINANT by 11.25%, Dual-SVDAE by 8.97%, and ResGCN by 2.68%. The efficiency of each DVAEGMM component is demonstrated via ablation analysis. Our proposed model, however, needs to be tested in real-world large-scale operational network scenarios before it can be deployed. Its performance is slightly worse on the Enron and Amazon datasets than on the others, most likely because of their low dimensionality. In the future, we will refine the proposed framework to perform efficiently on low-dimensional datasets. We also plan to explore DVAEGMM extensions for dynamic or time-series networks.
The detection of anomalies in more complicated networks and graphs, such as heterogeneous graphs, spatial-temporal graphs, and dynamic graphs, will be one of our research priorities.

MOHAMMAD KAMRUL HASAN (Senior Member, IEEE) received the Doctor of Philosophy (Ph.D.) degree in electrical and communication engineering from the Faculty of Engineering, International Islamic University, Malaysia, in 2016. He is currently working with the Center for Cyber Security, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), as a Senior Lecturer. He specializes in cutting-edge information-centric networks, computer networks, data communication and security, mobile networks and privacy protection, cyber-physical systems, the Industrial IoT, transparent AI, and electric vehicle networks. He has published more than 150 indexed papers in ranked journals and conference proceedings. He is a member of the Institution of Engineering and Technology (MIET 1100572830) and a member of the Internet Society (198312). He is also a Certified Professional Technologist (P.Tech./Ts.), Malaysia. He served as the Chair of the IEEE Student Branch from 2014 to 2016 and has actively participated in many events, workshops, and trainings for IEEE and IEEE humanitarian programs in Malaysia. He serves as an editorial member of many prestigious high-impact journals, including IEEE, IET, Elsevier, Frontiers, and MDPI journals, and as a general chair, co-chair, and speaker for conferences and workshops, for the sake of building, sharing, and disseminating knowledge in society and academia. He has also volunteered for underprivileged children for the welfare of society.