Applications of Generative Adversarial Networks in Anomaly Detection: A Systematic Literature Review

Anomaly detection has become an indispensable tool for modern society, applied in a wide range of applications, from detecting fraudulent transactions to malignant brain tumors. Over time, many anomaly detection techniques have been introduced. However, in general, they all suffer from the same problem: lack of data that represents anomalous behaviour. As anomalous behaviour is usually costly (or dangerous) for a system, it is difficult to gather enough data that represents such behaviour. This, in turn, makes it difficult to develop and evaluate anomaly detection techniques. Recently, generative adversarial networks (GANs) have attracted much attention in anomaly detection research, due to their unique ability to generate new data. In this paper, we present a systematic review of the literature in this area, covering 128 papers. The goal of this review paper is to analyze the relation between anomaly detection techniques and types of GANs, to identify the most common application domains for GAN-assisted and GAN-based anomaly detection, and to assemble information on datasets and performance metrics used to assess them. Our study helps researchers and practitioners to find the most suitable GAN-assisted anomaly detection technique for their application. In addition, we present a research roadmap for future studies in this area. In summary, GANs are used in anomaly detection to address the problem of insufficient amount of data for the anomalous behaviour, either through data augmentation or representation learning. The most commonly used GAN architectures are DCGANs, standard GANs, and cGANs. The primary application domains include medicine, surveillance and intrusion detection.


Introduction
In modern society, many systems depend on and generate enormous amounts of data. This data is important for many decision-making processes related to these systems. Normally, systems operate under the expected conditions. However, in rare cases, anomalies may occur. Such anomalies can have a disastrous impact on the system itself or on its environment. Therefore, to lower the impact, it is important to be able to detect such anomalies as early as possible. For example, cancer is an anomaly in human tissue. Breast cancer is the second leading cause of cancer death in women [1]. According to a recent study by the American Cancer Society [2], breast cancer alone accounts for 30% of female cancers. Early detection and treatment of breast cancer would greatly increase the chance of survival [3]. Similarly, with an increasing need to ensure public safety in crowded areas, the development of real-time video surveillance systems becomes unavoidable. It is critical to seamlessly monitor the crowd to immediately detect anomalous (or abnormal) movements to help prevent theft [4], vandalism [5], and terrorist attacks [6].
The process of finding the anomalous behaviour of a system is referred to as anomaly detection. The primary objective of anomaly detection is to differentiate between the expected and unexpected behaviour of a system. Considering the importance of anomaly detection, it has received widespread attention in research. Despite the progress in this research area, there is still an important open challenge: the acquisition of data about anomalies that can be used to test anomaly detection techniques.
A recent trend in anomaly detection is the use of generative adversarial networks (GANs). Proposed by Ian Goodfellow et al. [7] in 2014, GANs are a type of unsupervised generative model which gained much attention from the research community. A well-trained GAN can generate realistic-looking data by sampling from a learned data distribution. A GAN consists of a generator and a discriminator model. These two models are pitted against each other in a two-player zero-sum game, iteratively improving their capabilities to generate and discriminate data.
The ability of GANs to generate data makes them attractive for anomaly detection research from two perspectives. First, they can potentially help generate the hard-to-acquire anomalous data points. Second, they can be used to learn the distribution of the data for the normal operating condition of a system and act as an anomaly or outlier detector.
In this paper, we conduct a systematic literature review of the applications of GANs for anomaly detection. We address the following research questions (RQs):
• RQ1: What is the role of GANs in anomaly detection? We identified two roles that GANs play in anomaly detection: data augmentation and representation learning. Despite the remarkable ability of GANs to generate realistic-looking data, most of the reviewed papers use them for representation learning rather than data augmentation. The reason for this inclination is that the improvements in anomaly detection accuracy reported after data augmentation are not substantial. When GANs are used for data augmentation in anomaly detection, we refer to it as GAN-assisted anomaly detection. The other role of GANs in anomaly detection is representation learning. In this case, the examined papers use the data from the normal class for training a GAN to learn the distribution of the normal data. A score function assigns a score to each new data instance, and anomalous data in the test stage is identified by comparing this score to a specific threshold. We refer to these techniques as GAN-based anomaly detection.
• RQ2: What are the application domains of anomaly detection with GANs? The primary application areas where GANs are used for anomaly detection are medicine (19%), surveillance (15%) and intrusion detection (13%).
• RQ3: Which GAN architecture is used most often in anomaly detection systems? We identified 21 architectures of GANs that are used for anomaly detection. Among these architectures, deep convolutional GANs (DCGANs) (32%), standard GANs (23%), and conditional GANs (16%) are the most commonly used.
• RQ4: Which type of data instance and datasets are most commonly used for anomaly detection with GANs? 50% of the proposed GAN-based anomaly detection techniques use image datasets for anomaly detection purposes. Before being fed to the anomaly detection algorithms, the data are usually preprocessed. The most common preprocessing methods are resizing (23%), normalization (19%), and cropping (13%).
• RQ5: Which metrics are used to evaluate the performance of GANs in generating data and anomaly detection? Only 21% of the studied papers evaluated the GAN's performance in generating synthetic data, either in data augmentation or representation learning. The structural similarity index (SSIM) (26%) and the peak signal-to-noise ratio (PSNR) (26%) are the most commonly used metrics. Visual inspection to evaluate the quality of the generated data was reported in 5% of the studied papers. To evaluate the performance of GANs in anomaly detection applications, 53% of the primary studies used the area under the receiver operating characteristic curve (AUROC).
• RQ6: Which anomaly detection techniques are used along with GANs? GAN-based anomaly detection is mostly done in a semi-supervised manner. DCGANs and standard GANs are the most popular architectures in semi-supervised anomaly detection using GANs. In anomaly detection based on supervised learning, GANs are used to augment the dataset for the anomalous class. However, the studied papers report only minor improvements in the performance of anomaly detection techniques after augmenting the dataset with GANs. Only a few primary studies focused on purely unsupervised anomaly detection based on GANs, most using the standard version of GANs. Similar to semi-supervised techniques, unsupervised GAN-based anomaly detection techniques are mostly compared with autoencoder-based approaches.
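As a concrete reading of the AUROC mentioned under RQ5, the following minimal Python sketch computes it as the probability that a randomly chosen anomalous instance receives a higher anomaly score than a randomly chosen normal instance, with ties counted as one half. The function name `auroc`, the label convention (1 for anomalous, 0 for normal), and the example scores are illustrative assumptions and are not taken from any of the reviewed studies.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: anomaly scores, higher means "more anomalous" (assumed convention).
    labels: 1 for anomalous, 0 for normal (assumed convention).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # anomalous instance ranked above normal one
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for two anomalous and two normal instances.
example = auroc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

The quadratic pairwise loop is chosen for readability; a rank-based formulation gives the same value and scales better to large test sets.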
The findings presented in this survey will help researchers and practitioners to find the most suitable GAN-based anomaly detection techniques for their applications. The rest of this paper is organized as follows. Section 2 provides a brief introduction to GANs. Section 3 describes the methodology used for conducting this systematic literature review. Section 4 presents the results of the review. Section 5 discusses the open challenges and provides directions for future research. Section 6 identifies the threats to validity of the review, and Section 7 concludes the paper.

Generative Adversarial Networks
In 2014, Goodfellow et al. [7] introduced a framework for estimating generative models based on an adversarial process. This framework consists of two deep neural network-based models: a generative model G and a discriminator model D.
Model G learns the training data distribution and uses it to generate new samples. Model D determines whether a sample comes from the training data or was generated by the generative model. The power of GANs comes from the adversarial process, in which the two models are competing against each other to improve their accuracy in the designated task.
The diagram in Figure 1 shows the building blocks of a GAN [8]:
• The Real Data (X), or the training dataset, contains the instances that the generator G should learn to generate, usually in the form of a batch.
• Random Noise Vector (z) is the raw input to the generator. It is a vector of random numbers which the generator uses to generate fake examples.
• The Generator model (G) is trained to learn the distribution of the input data. This model uses the input (z) to generate fake examples (G(z)) that are indistinguishable from the real data.
• The Discriminator model (D) tries to distinguish the data that is generated by the generator from the real data. The inputs to this model are the real data (X) and the generated data (G(z)). The output of this model is a binary decision for each data instance, i.e., real/fake.
• Iterative Training: The GAN is trained using the classification error of the discriminator. The error is used to tune the parameters (weights and biases) of the discriminator, and then the parameters of the generator. Backpropagation [9] is commonly used as the training algorithm. This iterative training consists of two loops:
  - An inner loop where the discriminator's parameters are tuned to maximize the classification accuracy of predicting correct labels for real data and generated data.
  - An outer loop where the generator's parameters are tuned to generate data that has a minimal chance of being distinguished from the real data by the discriminator.
The adversarial training of the generator and the discriminator model is a zero-sum game problem: when one model gets better, the other one gets worse in equal proportions [8]. For all zero-sum games, there is a point where neither of the players can improve their situation. This point is referred to as the Nash equilibrium. The goal of a GAN is to reach this equilibrium, as then the fake data produced by the generator model is indistinguishable from the real data by the discriminator model. The output of the discriminator is then a random guess on whether the input data is real or fake.
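The two training loops and the equilibrium described above are commonly summarized by the minimax value function of the original GAN formulation [7]. The text of this review does not state it explicitly, so the symbols V, p_data (the distribution of the real data X), p_z (the noise prior), and p_g (the distribution of generated samples) are introduced here only for illustration:

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \bigl[ \log D(x) \bigr]
  + \mathbb{E}_{z \sim p_{z}(z)} \bigl[ \log \bigl( 1 - D(G(z)) \bigr) \bigr]
```

The inner maximization corresponds to tuning the discriminator and the outer minimization to tuning the generator. Under this formulation, the equilibrium is reached when p_g = p_data, at which point the optimal discriminator outputs D(x) = 1/2 for every input, i.e., a random guess.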

Methodology
The planning, conducting, and reporting of this systematic literature review (SLR) were based on the guidelines proposed by Kitchenham [10]. The planning stage of the SLR includes three steps: identification of the need for the systematic review, development of the review protocol, and evaluation of the protocol [10]. In the conducting stage, based on the review protocol that was developed during the planning stage, we search for and select the primary studies, extract data from the primary studies, and synthesize the data. The set of primary studies contains all individual studies that contribute to the SLR [10]. In the last stage, we conclude the systematic review by reporting the collected data and findings. Figure 2 summarizes the required steps for each stage of the review. In the following, each step is explained in more detail.

Figure 2: The steps of our systematic literature review, based on Kitchenham's guidelines [10]. Planning stage: Step 1, identify the need for a systematic review; Step 2, develop the review protocol; Step 3, evaluate the review protocol. Conducting stage: Step 4, search for primary studies; Step 5, select the primary studies; Step 6, extract data from primary studies; Step 7, synthesize the data. Reporting stage: Step 8, report the data.

The Need for a Systematic Review
Recently, GANs have become a hot research topic in many application domains. One of these domains is anomaly detection. The ability of GANs to generate realistic-looking data and to perform representation learning makes them attractive for anomaly detection research. GANs are typically trained in an unsupervised manner to learn the distribution of the data. However, they are highly flexible and can be used in a semi-supervised fashion as well (e.g., [11]).
In addition, GANs are implicit density models which do not require any explicit hypothesis on the distribution of the data [12]. Considering all these advantages, GANs can be leveraged to address some existing problems in anomaly detection, such as the lack of a sufficient amount of data for anomalous behaviour of the system. Therefore, a study summarizing existing research on applications of GANs in anomaly detection would be of high value to the research community.
When we started this systematic literature review, we identified only one survey paper [13] reviewing applications of GANs in anomaly detection. However, this paper only covered 11 papers on anomaly detection with GANs. In addition, it did not follow a systematic approach to conducting the review. This confirmed the need for a systematic literature review on applications of GANs in anomaly detection that covers a substantially larger number of papers. To reduce researcher bias [10], we followed a systematic approach to designing, executing and reporting our findings.

Developing the Review Protocol
To reduce the possibility of researcher bias, a review protocol is required that specifies the method for conducting the systematic review [10]. This protocol includes the definition of the following elements: 1) research questions, 2) search strategy, 3) study selection criteria (including study quality assessment), 4) data extraction strategy, and 5) synthesis of the extracted data.

Our Research Questions
In this systematic literature review, we address the following research questions (RQs):
1. RQ1: What is the role of GANs in anomaly detection? (Section 4.1) Motivation: It is important to learn how GANs are used in anomaly detection. One intuitive way is to generate anomalous data to address the problem of the imbalanced dataset. Still, there might be more opportunities. Moreover, we investigate which alternative, non-GAN approaches are used to fulfil the identified roles.
2. RQ2: What are the application domains of anomaly detection with GANs? (Section 4.2) Motivation: The use of GANs in anomaly detection may be more common in certain domains. Here, we look into which domains and which types of GANs work together well.
3. RQ3: Which GAN architecture is used most often in anomaly detection systems? (Section 4.3) Motivation: There exist many architectures of GANs. Each one attempts to handle a specific type of data or to address an existing problem in the previous architectures. Some architectures may be better suited for anomaly detection than others. Therefore, we look into which architectures of GANs are commonly used.
4. RQ4: Which type of data instance and datasets are most commonly used for anomaly detection with GANs? (Section 4.4) Motivation: Identifying which datasets are used to evaluate anomaly detection with GANs in certain domains can reveal the "standard benchmarks" in specific domains and which domains require benchmarks in general.
5. RQ5: Which metrics are used to evaluate the performance of GANs in generating data and anomaly detection? (Section 4.5) Motivation: Evaluating GANs in anomaly detection systems is not a straightforward task, as their goal is to create realistic-looking data that is different enough from known anomalies, yet still representative of real anomalies. Therefore, one cannot just compare the generated data with the real data. We study which approaches are commonly used to evaluate the quality of the generated data, and support practitioners in deciding which metrics to use for evaluating data in specific anomaly detection problems.
6. RQ6: Which anomaly detection techniques are used along with GANs? (Section 4.6) Motivation: GANs are often used together with more traditional anomaly detection techniques, especially when they are used in a supervised manner. In this question, we identify the anomaly detection techniques that are based on or assisted by GANs.

Search Strategy
To find relevant papers for this systematic review, we searched the IEEE Xplore, ACM Digital Library, Science Direct and Scopus digital libraries. The focus of this study is the application of GANs in anomaly detection. Therefore, we combined the keywords related to anomaly detection with keywords and abbreviations for generative adversarial networks. To find closely relevant papers for this study, we searched the title and the abstract of the papers for the following query: ("anomaly" OR "anomalies" OR "anomalous" OR "outlier" OR "abnormal") AND ("generative adversarial network" OR "generative adversarial networks" OR "GAN" OR "GANs"). The list of primary studies was collected on 3 June 2020.
We conducted a pilot study to ensure that the well-known primary studies were included in the query results. During this study, we searched for the matched papers and their shared references on Google Scholar to ensure that the most cited papers were covered by the query. After several iterations of improvements of the query, we were confident that it returned the important and well-known studies.
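To make the Boolean structure of the search query concrete, the following minimal Python sketch applies the same two keyword groups to a title and an abstract. It is only an approximation: each digital library implements its own matching rules (for example stemming or field-specific syntax), and the whole-word, case-insensitive matching used here, as well as the names `matches_query`, `ANOMALY_TERMS`, and `GAN_TERMS`, are assumptions made for illustration.

```python
import re

# The two keyword groups of the query, joined by AND; terms within a group by OR.
ANOMALY_TERMS = ["anomaly", "anomalies", "anomalous", "outlier", "abnormal"]
GAN_TERMS = [
    "generative adversarial network",
    "generative adversarial networks",
    "GAN",
    "GANs",
]


def _contains_any(text, terms):
    """True if any term occurs in text as a whole word or phrase (assumed rule)."""
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False


def matches_query(title, abstract):
    """Apply (anomaly group) AND (GAN group) to the title and abstract."""
    text = f"{title} {abstract}"
    return _contains_any(text, ANOMALY_TERMS) and _contains_any(text, GAN_TERMS)
```

With whole-word matching, a title such as "Image synthesis with GANs" is rejected because no term of the first group occurs, and "organ" does not count as a match for "GAN".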

Study Selection Criteria
We defined the following criteria for the inclusion of a paper in our study. The last two criteria in the list are included to assess the quality of the study.
• The paper must be in the specified digital libraries.
• The primary study should focus on anomaly detection while leveraging GANs.
• The developed methods should be evaluated on at least one real dataset, not only on simulated data, to ensure practical relevance of the study.
• The primary study should be available online to ensure accessibility.
• The article should be written in the English language.
All types of papers, including journal, conference, workshop, and symposium papers, are considered in this review. The procedure for searching (described in Section 3.2.2) and selecting the primary studies is shown in Figure 3. The final list of papers used for data extraction and synthesis consists of 128 primary studies. Not every selected primary study provides answers to all six research questions.

Figure 3: The procedure for searching, selecting, and extracting data from the primary studies for conducting the SLR.

Data Extraction Strategy
To facilitate the data extraction, we devised a data extraction form to collect the required information for each RQ from the primary studies. This form (shown in Table 1) was refined through several iterations with randomly selected papers on the subject. This refinement was accomplished by comparing the data extraction forms of the first two authors and addressing the potential ambiguities in the data extraction form.

Data Synthesis
During the data synthesis step, we aggregate the collected data from the data extraction forms to answer the research questions. Putting all this data together gives invaluable information concerning the current best practices and architectures for anomaly detection with GANs.

Evaluation of the Review Protocol
The protocol is a critical part of the SLR. It was evaluated by the last two authors and, after several iterations, the final version of the protocol was approved and used throughout the conducting stage of the SLR.

Table 1: Data extraction form.
• RQ2: What are the application domains of anomaly detection with GANs? The application domain(s).
• RQ3: Which GAN architecture is used most often in anomaly detection systems? The GAN architecture(s) used in the study.
• RQ4: Which type of data instance and datasets are most commonly used for anomaly detection with GANs? The type of input data to the GAN (main input; e.g., image, text, etc.). The preprocessing technique used on the input data. Datasets that are used for the study. Usage of the dataset (e.g., addressing the unbalanced dataset, training on normal/abnormal, etc.).
• RQ5: Which metrics are used to evaluate the performance of GANs in generating data and anomaly detection? The type of performance metrics used for evaluating the performance of the GAN.
• RQ6: Which anomaly detection techniques are used along with GANs?

Conducting the SLR
The conducting stage of the SLR includes the following four steps: searching for primary studies, selecting the primary studies, extracting data from the primary studies, and synthesizing the data.
We identified 362 papers that matched our search query (see Section 3.2.2): 49 papers in Science Direct, 24 in ACM Digital Library, 104 in IEEE Xplore, and 185 in Scopus. We organized the papers for further analysis using Mendeley as a reference manager.
We filtered out irrelevant and duplicated papers according to the study selection criteria introduced in Section 3.2.3. Figure 3 shows the procedure for selecting the primary studies. We filtered the papers in two steps. In the first step, the first two authors independently read the abstract and the title of the primary studies and decided if they were related to anomaly detection with GANs. Cohen's kappa coefficient [14] for this binary classification (relevant vs. irrelevant) was 0.83, which shows a satisfactory agreement between the researchers. There were 32 papers from the initial list of papers on which the first two authors disagreed. For those papers, the third author was asked to make the final decision regarding inclusion or exclusion. After the first phase of filtering papers, we ended up with 145 papers. The second filtering step was performed while reading the full text: the first two authors decided to include or exclude the paper in the data extraction step. In this phase, the first two authors made the same decision regarding the excluded papers and excluded 17 papers. Finally, 128 primary studies were included and analysed in this SLR.
Based on the data extraction strategy introduced in Section 3.2.4, we examined these 128 primary studies to collect the data that contributes to addressing the RQs of this SLR. The primary studies were randomly divided into 10 batches. For each batch, the first two authors extracted the data from the primary studies and filled in the data extraction forms. After extracting the data from each batch, the data extraction forms were randomly distributed between the first two authors, and then the disagreements were identified and discussed in a meeting. If they failed to reach a consensus, one of the last two authors made the final decision. After extracting data from all primary studies and addressing the discrepancies in the data extraction forms, we created a spreadsheet for each data extraction form. We summarize and report the extracted data in the following section.
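For readers unfamiliar with the agreement statistic used in the first filtering step, the following minimal Python sketch shows how Cohen's kappa [14] can be computed from two reviewers' relevant/irrelevant decisions. The function name `cohens_kappa` and the ten example decisions are hypothetical; they are not the screening data of this review and are not meant to reproduce the reported value of 0.83.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)

    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement, from each rater's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical screening decisions for ten papers (R = relevant, I = irrelevant).
rater_1 = ["R", "R", "R", "R", "I", "I", "I", "I", "I", "I"]
rater_2 = ["R", "R", "R", "I", "I", "I", "I", "I", "I", "R"]
kappa = cohens_kappa(rater_1, rater_2)
```

In this toy example, the raters agree on 8 of 10 papers (observed agreement 0.8), while the agreement expected by chance is 0.52, so kappa is (0.8 - 0.52) / (1 - 0.52), roughly 0.58.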

Results
This systematic literature review covers 128 primary studies that describe applications of GANs in anomaly detection. As shown in Figure 4a, these primary studies were published between 2017 and early 2020. The number of studies per year is increasing, suggesting that interest in this research area is growing rapidly. Figure 4b shows that the majority of the reviewed papers (63%) appeared in conference proceedings, 29% of the papers were published in journals, 4% in workshops, and 4% in symposia.
The main publication venues include IEEE Access, IEEE International Symposium on Biomedical Imaging (ISBI), IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), and IEEE Conference on Computer Vision and Pattern Recognition. We found 108 different venues, which shows that the primary studies are not concentrated in a single specific journal or conference.

RQ1: What is the role of GANs in anomaly detection?
We identified four types of GAN applications in anomaly detection: (1) generating abnormal data instances, (2) generating normal and abnormal instances, (3) learning the normal behaviour of a system, (4) learning both normal and abnormal behaviour of a system. Applications 1 and 2 can be classified as data augmentation with GANs and 3 and 4 as representation learning with GANs.
Generative models, such as GANs, are mainly designed for data augmentation, i.e., to generate new data and use it to augment the existing data. They can also be used for representation learning, i.e., to learn representations of the data to support information extraction for use when building classifiers or other predictors [15]. In this case, the generator and the discriminator of a GAN can be used to learn the distribution of a specific class of data, i.e., normal or abnormal data. In turn, the learned distribution can be used to identify nonconforming or irregular data. Table 2 shows the two main roles of GANs in anomaly detection, along with the types of data used in each role. Most of the primary studies opted to use GANs to learn the representation of the data rather than augmenting the datasets. Moreover, this representation learning is mainly performed on normal data. The rationale behind this preference is that, due to the data imbalance, it is usually easier to learn a model of normality rather than abnormality. In addition, by learning only the normal data distribution, the need for data from the abnormal condition of the target system is eliminated.

Representation Learning With Generative Adversarial Networks
The main goal of GANs is to learn a generative model that produces realistic-looking data by sampling from the learned distribution. This generative power of GANs was highlighted by Goodfellow et al. [7] and Radford et al. [144]. Representation learning with GANs for anomaly detection exploits the ability of GANs to learn the distribution of a specific class of data. Several anomaly detection techniques have been proposed that use this representation learning ability of GANs (shown in Table 3). We explain the concept of anomaly detection using representation learning through the example of a well-known GAN-based anomaly detection technique, AnoGAN. All other anomaly detection techniques that rely on representation learning through a GAN are, to some extent, variations of the AnoGAN technique.
Schlegl et al. [104] introduced the first GAN-based anomaly detection technique, called AnoGAN, which takes advantage of the representation learning ability of GANs. AnoGAN employs the DCGAN architecture to learn the distribution of normal anatomical variability. The idea comes from the concept of a smooth transition in the latent space of the data, i.e., that sampling from two close points in the latent space should lead to similar data in the data space [145]. Schlegl et al. hypothesize that the latent space of the GAN represents the distribution of the training data. Therefore, one can learn the representation of the normal data by training GANs only on normal data. From an anomaly detection view, learning the representation of the normal data is useful, as one can decide for new (potentially anomalous) data points how likely it is that they are part of that normal data. During the training of a GAN, the generator learns the mapping from latent space to data space, G: z → x (i.e., the representation of the data). However, the inverse mapping, which is necessary to decide whether a data point is anomalous, is not straightforward to obtain [104]. To address this problem, Schlegl et al. proposed an additional step after training the GAN on normal data. For an image x, they proposed to find a point z in the latent space that corresponds to an image G(z), which is the most similar to the image x on the learned manifold χ. Schlegl et al. proposed an iterative process to find the most similar image G(z_Γ) to x using a residual loss and a discrimination loss. The similarity of images x and G(z) depends on how closely x follows the distribution of the data learned by the generator (p_g). After identifying the most similar image, AnoGAN computes an anomaly score that is related to the similarity of x and G(z). Finally, based on a threshold for the anomaly score, AnoGAN decides whether x is an anomaly.
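The scoring step described above can be illustrated with a deliberately small Python sketch. It is not the implementation of Schlegl et al.: the `generator` and `features` functions are hypothetical stand-ins for a trained DCGAN generator and an intermediate discriminator layer, the latent space is one-dimensional, the latent search is a coarse grid search rather than the gradient-based iterative process, and the weight `lam = 0.1` and the threshold are assumed values.

```python
def generator(z):
    """Hypothetical stand-in for a trained generator: scalar latent -> 2-D point."""
    return (z, z * z)


def features(x):
    """Hypothetical stand-in for an intermediate discriminator feature layer."""
    return (x[0] + x[1], x[0] - x[1])


def l1(a, b):
    """L1 distance between two equally long tuples."""
    return sum(abs(p - q) for p, q in zip(a, b))


def anomaly_score(x, lam=0.1, steps=2001):
    """Search the latent interval [-1, 1] for the z whose G(z) is most similar to x.

    The score is the smallest weighted sum of the residual loss (distance
    between x and G(z)) and the discrimination loss (distance between the
    discriminator features of x and G(z)).
    """
    best = None
    for i in range(steps):
        z = -1.0 + 2.0 * i / (steps - 1)
        g = generator(z)
        residual = l1(x, g)
        discrimination = l1(features(x), features(g))
        loss = (1.0 - lam) * residual + lam * discrimination
        if best is None or loss < best:
            best = loss
    return best


def is_anomaly(x, threshold=0.2):
    """Flag x as anomalous when its score exceeds the (assumed) threshold."""
    return anomaly_score(x) > threshold
```

In this toy setting, a point that lies on the generator's manifold, such as (0.5, 0.25), can be reproduced exactly and receives a score of zero, whereas a point far from the manifold, such as (0.0, 1.0), cannot be matched by any latent value and is flagged as anomalous.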

Data Augmentation with Generative Adversarial Networks
Machine learning techniques, especially deep learning methods [146], require a massive amount of data to perform well in their designated task [147]. Data augmentation, also known as oversampling, is carried out to compensate for an insufficient amount of data in the dataset to prevent model overfitting. It can also be used to address the problem of data imbalance, which occurs when the sizes of the classes in a dataset differ considerably. For instance, in a binary classification task, the class with fewer samples is called the minority class, and the other class is called the majority class. The corresponding training process would be biased towards the majority class, hence a classifier trained using this dataset would have a better accuracy for this class [148]. To address the imbalanced dataset problem, one can either randomly remove samples from the majority class to balance the class size (undersampling), or augment the minority class by adding artificially generated instances (oversampling) using suitable techniques.

Table 4 (GAN architectures used for data augmentation, with the corresponding primary studies): [architecture name not recoverable] [110, 111, 123, 139, 131, 133, 134]; Standard GAN [112, 113, 114, 119, 120, 128, 130]; Cycle-GAN [117, 124, 137]; WGAN [126].
The problem of the imbalanced dataset is more critical in anomaly detection since it is hard and expensive to collect data on anomalous behaviour of the system under study. Often, there are very few or no examples of anomalous data available. In this situation, GANs can help by generating more samples for the anomalous class.
Table 2 summarizes the main roles of GANs in anomaly detection and the type of data used for that purpose. Data augmentation with GANs is mostly used to generate data that represents anomalous behaviour of the system. There was no primary study augmenting only the normal condition of the system under study for anomaly detection. This is due to the fact that there is usually an abundance of data for the normal condition. However, some studies augmented both normal and abnormal data, e.g., using CycleGANs, by learning the transformation from abnormal to normal and from normal to abnormal to generate new data. After augmentation, the dataset is ready to be used for anomaly detection, usually performed by a classifier (as discussed in Section 4.6). Most primary studies that use GANs for data augmentation report a slight improvement in classification accuracy compared either to traditional augmentation techniques or to training without data augmentation. For instance, Madani et al. [135] report that, using data augmentation with GANs, the test accuracy for cardiovascular abnormality detection improved from 81.93% to 84.19%. In comparison, using traditional augmentation methods, they achieved only 83.12% test accuracy. Such an improvement can be meaningful when dealing with large amounts of data, especially in medical applications. However, some studies report that GANs did not meet expectations in improving the classification accuracy after augmenting the data, e.g., [141]. The list of different GANs used for data augmentation is shown in Table 4.
In the examined primary studies, we identified several traditional techniques for addressing the problem of imbalanced and insufficient amounts of data. The effects of adopting these techniques are compared to the GANs in terms of improving the classification accuracy. For example, random undersampling was evaluated in two primary studies [110, 126], where samples were randomly removed from the majority class. Using this approach, some important and critical data may be lost that could otherwise be beneficial for learning a robust decision boundary [149]. Random oversampling was investigated in four primary studies [110, 111, 53, 126]. In this case, some samples from the minority class are copied to increase its size. However, this approach is likely to cause overfitting [126]. All these studies confirmed the superiority of GANs in data augmentation compared to random over/undersampling.
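The two random resampling baselines discussed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, both classes are given as plain lists, the majority class is assumed to be at least as large as the minority class, and the target is an exactly balanced dataset.

```python
import random


def random_undersample(majority, minority, rng):
    """Randomly discard majority samples until both classes have equal size."""
    kept = rng.sample(majority, len(minority))  # sampling without replacement
    return kept, list(minority)


def random_oversample(majority, minority, rng):
    """Randomly duplicate minority samples until both classes have equal size."""
    n_extra = len(majority) - len(minority)
    copies = [rng.choice(minority) for _ in range(n_extra)]  # exact duplicates
    return list(majority), list(minority) + copies


# Hypothetical toy data: 100 normal instances and 5 anomalous ones.
rng = random.Random(0)
normal = list(range(100))
anomalous = [-1, -2, -3, -4, -5]
under_major, under_minor = random_undersample(normal, anomalous, rng)
over_major, over_minor = random_oversample(normal, anomalous, rng)
```

The sketch makes both drawbacks visible: undersampling discards 95 of the 100 normal instances, and oversampling adds no new information because every added instance is an exact copy of an existing one.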
Chawla et al. [150] proposed the synthetic minority oversampling technique (SMOTE) to improve on random oversampling by synthesizing new samples in the neighbourhood of the minority class samples. This is accomplished by interpolating between neighbouring minority class instances. SMOTE and its variants (e.g., borderline-SMOTE [151]) were compared with GANs in several studies [110,111,117,53,126,132]. The ADAptive SYNthetic (ADASYN) sampling approach for imbalanced datasets [152] was compared with GANs in two studies [53,126]. ADASYN uses a weighted distribution over the minority class instances based on their difficulty level, i.e., more synthetic samples are generated for instances that are harder to learn. In addition, several other traditional techniques for data augmentation, such as adding Gaussian noise to the dataset [118], Gaussian smoothing, unsharp masking, minimum filtering [138], and affine transforms [141], were compared to GANs. Most of these studies show that data augmentation with GANs results in training datasets that improve anomaly detection.
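The interpolation idea behind SMOTE can be sketched as follows. This is a simplified illustration, not the reference implementation of Chawla et al. [150]: the function name `smote_like`, the brute-force neighbour search, and the toy points are assumptions made for the example.

```python
import numpy as np

def smote_like(X_min, n_new, k, rng):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    # Pairwise Euclidean distances between all minority samples.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]    # k nearest minority neighbours
    base = rng.integers(0, len(X_min), size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                 # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

rng = np.random.default_rng(0)
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
synthetic = smote_like(X_min, n_new=10, k=3, rng=rng)
```

Each synthetic point lies on the line segment between a minority sample and one of its neighbours, so new samples never leave the region already spanned by the minority class. ADASYN differs mainly in how often each base sample is chosen.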

RQ2: What are the application domains of anomaly detection with GANs?
Table 5 shows the different domains where GANs were applied in the primary studies. The table reveals that the largest group of primary studies (24 papers) performs anomaly detection in medical applications, followed by surveillance and intrusion detection with 19 and 17 papers, respectively.
Medical Anomaly Detection. Anomaly detection in medicine deals with analyzing patients' health conditions using medical records and images [153]. Specific applications include retinal optical coherence tomography (OCT) anomaly detection [17,26,29,104], seizure detection [18], cardiovascular disease detection [30], lung nodule detection [42], abnormal chest X-ray identification [59,138,97,135], polyp detection [123,80], metastatic bone tumor detection [78], lesion detection [137,101], laparoscopy anomaly detection [85], breast cancer detection [143,132], MRI quality control [98], diabetic retinopathy detection [133], brain tumor detection [134] and hemorrhage detection [105]. One of the challenges in this domain is the difficulty of obtaining expert labels for medical data, such as clinical images, since annotation is an exhausting and time-consuming task.
Surveillance Anomaly Detection. To improve public safety, surveillance cameras are widely used in public places such as streets, stores, and banks. The goal of video surveillance is to identify suspicious activity, unusual traffic patterns, or accidents by automatically analyzing the behaviour of the surveillance target. In video surveillance anomaly detection, this can be accomplished by identifying out-of-the-ordinary behaviours that differ from the dominant (normal) behaviours in the scene [153]. Automated video surveillance can reduce the dependence on human workers and reduce the risk of late detection of anomalous behaviour. Most primary studies in this application domain leverage GANs for video anomaly detection to find irregularities in crowds. However, traffic anomaly detection [24] and threat object recognition with X-ray imaging [91] have also been studied.
Intrusion Detection. Intrusion detection systems are defined as software and/or hardware components that monitor and analyze events in computer systems to identify signs of intrusion [154]. Any malicious intrusion or attack on network vulnerabilities, computers or information systems may result in a serious predicament and violate the confidentiality, integrity and availability of the systems [155]. The examined primary studies are mainly focused on network intrusion detection [110,117,119,121,106,128,129,83,87,131]. Other applications of GANs in intrusion detection include smartphone lock pattern intrusion detection [112], presentation attack detection [82], phishing detection [114], cognitive radio intrusion detection [76], cyber-physical system intrusion detection [93], and IoT security [19,46].
General Approaches. Some primary studies do not focus on a single application domain. Instead, they evaluate the proposed approaches in different application domains (shown as Various in Table 5). For example, three primary studies [38,50,51] investigate their proposed GAN-based anomaly detection for intrusion detection and image recognition. Two studies [79,107] apply GAN-based anomaly detection in image recognition and video surveillance applications. Other primary studies evaluate their anomaly detection approach in intrusion detection, medical and image recognition domains [62,71]. Khoshnevisian et al. [52] investigate the application of their proposed GAN-based anomaly detection in medicine and on trajectory anomaly detection. Hyuk et al. [111] evaluate their proposed technique for image recognition in addition to medical and trajectory anomaly detection. Wang et al. [67] study the application of GANs in fraud and intrusion detection, and Liu et al. [120] evaluate their approach in medicine, image recognition, aviation, human activity, spam identification, and waveform anomaly detection.
System Health Anomaly Detection. System health monitoring is a way to identify anomalous behaviour in large (often industrial) systems. In industrial processes, the anomalous behaviour can represent, for example, wear or damage to the industrial equipment after continuous use. It is critical that such degradations in a system's performance are detected before they escalate and cause loss of revenue or endanger human life. Examples of industrial applications of system health anomaly detection with GANs include industrial process anomaly detection [16,28], electrical insulator anomaly detection [116], rolling bearing anomaly detection [53], steam turbine anomaly detection [58], magnetic flux leakage detection [139], fused magnesium furnace anomaly detection [125], railway turnout anomaly detection [92] and communication system anomaly detection [100].
Image Recognition. Image anomaly detection refers to finding images with abnormal patterns that do not comply with other images in the same set. Most primary studies in this application domain use public image datasets, such as MNIST or CIFAR-10, to prove the concept of their proposed anomaly detection techniques. However, Moussa et al. [136] evaluate the application of GANs for object recognition in images, such as finding an airplane in the picture. Bergmann et al. [66] propose a dataset of high-resolution color images of different object and texture categories suitable for anomaly detection. They evaluate several anomaly detection techniques, including GANs, on their dataset. The proposed dataset aims to provide more challenging images than the commonly used datasets mentioned above.
Manufacturing Anomaly Detection. This anomaly detection application refers to the quality inspection of manufactured products to identify defective products. These defects reveal themselves as irregularities on metal or wood surfaces, electronic parts, and so on. For example, visual surface defect detection is studied in four primary studies [33,44,124,69] and industrial quality inspection is investigated in three studies [40,77,99].
Anomaly Detection in Autonomous Systems. Autonomy is defined as self-governance or freedom from external influences [156]. An autonomous system is a system that can perceive the environment, make decisions based on the sensed information, and then react to internal/external changes using actuators. However, a fault may occur in each of these steps. For example, in an autonomous robot, faults can occur in sensors, in software, or in an actuator after physical damage. This domain includes driving anomaly detection [23,118,84] to assist the driver or to identify abnormalities in the driver's behaviour. Autonomous surveillance with moving agents is addressed in three primary studies [31,72], where an autonomous moving agent, such as a patrol robot, scans the environment to find abnormal activity. Two primary studies focused on controller anomaly detection [56,73], to identify abnormal decision making by a controller in a closed-loop control system. Sun et al. [60] study autonomous vehicle anomaly detection.
Power/Energy Anomaly Detection. This application domain is concerned with identifying abnormalities in power/energy consumption and power/energy infrastructure. Examples include catenary support component anomaly detection [32,35,49], power plant anomaly detection [48,127] and power consumption anomaly detection [102].
Fraud Detection. Fraud is defined as exploiting one's occupation for personal enrichment by willful misuse or misapplication of an employer's resources or assets without authorization [157]. Fraud detection refers to uncovering these illegal activities. Examples of applications in this domain include click advertisement fraud [113], stock market manipulation [45], credit card fraud [126], health care insurance provider fraud [130], and satellite image forgery [103].
Other Domains of Anomaly Detection with GANs. There are several additional application domains for anomaly detection using GANs that are less common: trajectory anomaly detection [115,61], human mobility anomaly detection [115], climate change [142], text anomaly detection [68], and software systems anomaly detection [94,34].

RQ3: Which GAN architecture is used most often in anomaly detection systems?
Many types of GANs have been proposed to tackle the deficiencies of the first type of GAN proposed by Goodfellow et al. [7] or to handle specific tasks. In most cases, they modify the GAN architecture or the cost function of the generator and discriminator. According to the GAN Zoo GitHub repository, more than 500 types of GAN were identified from 2014 to 2018.
We identified 21 different types of GANs used for anomaly detection purposes (see Table 6 for a list of primary studies using each of these architectures). DCGANs, standard GANs, and cGANs are the most commonly used GAN architectures. These were among the first proposed GAN architectures, and there are many newer ones that are not (yet) used for anomaly detection purposes. The correspondence between the identified GAN architectures and their application domains is shown in Table 7. DCGANs, standard GANs, and BiGANs have been used in various application domains, indicating their flexibility. A variety of GAN architectures have also been used for applications in medicine, intrusion detection, and system health. However, some application domains, such as text anomaly detection and fraud detection, are not well researched regarding GAN architectures.
Since the anomaly detection techniques examined in this review are either based on or assisted by GANs, any deficiency in the networks used for anomaly detection directly impacts the performance of the corresponding anomaly detection techniques. Therefore, improvements in anomaly detection techniques using GANs are strongly correlated with advances in GAN architectures and training strategies. Many studies in the literature describe the challenges of existing GANs and the available solutions [158,159,160,12]. The most crucial problem with GANs is mode collapse. When this happens, the generator of a GAN always generates samples from a highly concentrated distribution (partial collapse) [12], or simply a single sample (complete collapse) [161,162]. Therefore, the generated data lacks the expected diversity. Several treatments have been proposed to lessen the effect of mode collapse during GAN training, such as WGAN [163] and Unrolled GAN [164]. Another challenge in training GANs is the instability of the training process and its failure to converge to a Nash equilibrium. Methods proposed to address this problem include the Two Time-Scale Update Rule (TTUR) [165], WGAN [163] and feature matching [164].
The architectures of the top five GAN variants are shown in Figure 5. In the following subsections, we elaborate on the most used architectures and on what makes them suitable for anomaly detection. In addition, we discuss if and how they deal with the challenges mentioned above.

Standard Generative Adversarial Networks (GAN)
In the first GAN architecture, proposed by Goodfellow et al. [7], the generator and discriminator models are defined by fully connected multilayer perceptrons (MLPs). For the generator model, to learn the distribution of the generator p_g over data x, a prior on the input noise variable, p_z(z), must be defined. The mapping from the noise to the data space is represented as G(z; θ_g), where G is a differentiable function represented by a multilayer perceptron with parameters θ_g.
The standard GAN optimizes the Jensen-Shannon (JS) divergence to learn the distribution of the data. Consequently, it suffers from an unstable, weak training signal when the discriminator is approaching a local optimum, known as the vanishing gradient problem [183]. This can also lead to mode collapse. Another problem of the standard GAN is that it does not provide any inference model to directly capture the inverse mapping. Hence, further training is needed to attain this inference model, adding to the computational cost of the GAN training. Moreover, as the standard GAN uses MLPs in the generator and discriminator models, it is not suitable for high-dimensional data such as images. This is because MLPs are fully connected networks that require optimization of many parameters. Therefore, more efficient GAN architectures (such as DCGANs) are preferred for images and other high-dimensional data.
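For reference, the adversarial game between the two networks is usually written as the minimax objective of Goodfellow et al. [7], shown below. The symbols p_data (the distribution of the real data) and D (the discriminator output, i.e., the estimated probability that a sample is real) follow the notation of that paper and are assumptions with respect to the surrounding text.

```latex
\[
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \big[ \log D(x) \big]
  + \mathbb{E}_{z \sim p_{z}(z)} \big[ \log \big( 1 - D(G(z)) \big) \big]
\]
```

For a fixed generator and an optimal discriminator, this objective equals the JS divergence between p_data and p_g up to a scaling factor and an additive constant, which is the sense in which the standard GAN optimizes the JS divergence.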

Conditional Generative Adversarial Networks (cGAN)
Mirza et al. proposed the conditional GAN [166] as an extension to the standard GAN that can control what type of data is generated. For example, a condition can be specified to generate only data of a certain class or type. The conditional model is obtained by conditioning both the generator and the discriminator on some additional information y, fed through additional input layers. There is no limitation on the type of this information; for example, it can contain class labels or data from different sources [166]. Conditional data generation is advantageous for anomaly detection since cGANs can better generate data from different sources, i.e. multimodal data generation, and can be used for multimodal anomaly detection.
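In terms of the objective, conditioning changes only the inputs of the two networks. A sketch of the conditional objective, following the formulation in [166], is given below; p_data, p_z and D use the usual GAN notation and are assumptions with respect to the surrounding text.

```latex
\[
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \big[ \log D(x \mid y) \big]
  + \mathbb{E}_{z \sim p_{z}(z)} \big[ \log \big( 1 - D(G(z \mid y)) \big) \big]
\]
```

In a data augmentation setting, y could for instance be the label of the anomalous class, so that the trained generator is asked only for samples of that class.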

Deep Convolutional Generative Adversarial Networks (DCGAN)
Striving to bridge the gap between the success of Convolutional Neural Networks (CNNs) in supervised learning and in unsupervised learning, Radford et al. [144] introduced DCGANs, which integrate convolutional neural networks into the standard GAN. DCGANs provide a better network topology for more stable GAN training. The optimization and training processes are the same as for the standard GANs. However, Radford et al. proposed several improvements to the CNNs and standard GANs. These modifications are: (1) using all-convolutional nets [184] in the generator and discriminator, (2) removing fully connected layers on top of the convolutional layers, and (3) using batch normalization [185]. These changes result in a better model and training stability with deeper gradient flow through the network, which helps to prevent mode collapse.
DCGANs were originally designed for image processing since they employ CNNs. The CNNs allow DCGANs to learn a hierarchy of representations from object parts to scenes in both the generator and discriminator, which makes DCGANs well suited for image anomaly detection.

Bi-directional Generative Adversarial Networks (BiGAN)
The bi-directional GAN [167] adds an encoder that learns the mapping of data x to the latent representation z (inference), which makes it well suited for anomaly detection. BiGANs do not make any assumptions about the nature or structure of data. As a result, they provide a general, robust approach for unsupervised representation learning capable of capturing semantic attributes of the data [167]. Donahue et al. empirically show that, despite their generality, BiGANs are competitive with the state-of-the-art approaches to self-supervised and weakly supervised feature learning. Compared with the standard GAN, the inference mechanism of BiGANs, i.e. feature learning, makes them suitable for anomaly detection since it can be used immediately to generate anomaly scores.
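One simple way to turn the inference mapping into an anomaly score is to measure how well a sample is reconstructed after being encoded and generated again. The sketch below illustrates only this idea and is not the scoring function of any specific primary study: `encode` and `generate` are hypothetical linear stand-ins for a trained encoder and generator, and practical methods often add a discriminator-based term.

```python
import numpy as np

# Hypothetical stand-ins for a trained BiGAN encoder and generator.
direction = np.array([1.0, 1.0]) / np.sqrt(2.0)

def encode(x):
    """Toy encoder E: project 2-D samples onto a 1-D latent code."""
    return x @ direction

def generate(z):
    """Toy generator G: map a 1-D latent code back to data space."""
    return np.outer(z, direction)

def anomaly_score(x):
    """Reconstruction error ||x - G(E(x))||_2 for each sample."""
    return np.linalg.norm(x - generate(encode(x)), axis=1)

samples = np.array([[2.0, 2.0],    # lies on the learned "normal" direction
                    [2.0, -2.0]])  # orthogonal to it
scores = anomaly_score(samples)
```

The first sample is reconstructed exactly and receives a score of zero, whereas the second cannot be represented by the toy model and receives a large score. With a standard GAN, obtaining the latent code would require a separate optimization or an additional inference model.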

Wasserstein Generative Adversarial Networks (WGAN)
In an attempt to alleviate the problem of mode collapse and the difficulty standard GANs have in converging to a Nash equilibrium, Arjovsky et al. [163] suggested using the Earth-Mover (EM) distance, also known as the Wasserstein-1 distance, instead of the JS divergence used in the standard GAN. Unlike DCGAN, which changes the network topology, WGAN attempts to enhance the stability of GANs by modifying the adversarial cost function. Arjovsky et al. show that this distance provides gradients that are more useful for updating the generator than the JS divergence [163]. Although WGAN handles the problem of mode collapse better than standard GANs and DCGANs, the weight clipping used in its discriminator makes it difficult to converge. Gulrajani et al. [168] proposed an improved version, WGAN-GP, which introduces a gradient penalty to the discriminator model of WGAN instead of weight clipping. This results in better convergence, training speed, and sample quality by forcing the discriminator to learn relatively smoother decision boundaries [160]. This improved version of WGAN is already used by several studies for anomaly detection (see Table 6).
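The two quantities discussed here can be stated compactly. The first expression is the Wasserstein-1 (Earth-Mover) distance between the real distribution p_r and the generator distribution p_g as used in [163]; the second is the discriminator (critic) loss with gradient penalty from [168]. The notation is taken from those papers and is an assumption with respect to the surrounding text: Π(p_r, p_g) is the set of joint distributions with marginals p_r and p_g, λ is the penalty coefficient, and the points x̂ are sampled uniformly along straight lines between pairs of real and generated samples.

```latex
\[
W(p_r, p_g) = \inf_{\gamma \in \Pi(p_r, p_g)}
  \mathbb{E}_{(x, y) \sim \gamma} \big[ \lVert x - y \rVert \big]
\]
\[
L_D = \mathbb{E}_{\tilde{x} \sim p_g} \big[ D(\tilde{x}) \big]
  - \mathbb{E}_{x \sim p_r} \big[ D(x) \big]
  + \lambda \, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}
    \Big[ \big( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \big)^2 \Big]
\]
```

The penalty term softly enforces the 1-Lipschitz constraint that the original WGAN imposes by clipping the discriminator weights.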
4.4 RQ4: Which type of data instance and datasets are most commonly used for anomaly detection with GANs?
We identified six types of input data used for anomaly detection with GANs. As shown in Table 8, image is by far the most common type, appearing in 50% of the examined papers. The two most common application domains for anomaly detection with GANs are related to images: medicine and surveillance. Tabular data is second (26%), followed by video, time series, text, and frequency data.
Data preprocessing is a key element that determines the success or failure of many deep learning models [110,112,41].
We identified 22 types of data preprocessing techniques, summarized in Table 9. Owing to images being the most common data type, resizing, normalization, and cropping are the most prevalent preprocessing techniques. These techniques make data more uniform by changing its range and scale. A normalized dataset also speeds up learning.
Preprocessing is commonly applied to image, tabular and time series data, but rarely to other types of data. It is also worth noting that some studies [117,76] first transform tabular or frequency data to images before applying other types of preprocessing techniques. Tables 10-a and 10-b show the datasets used in the primary studies as well as their associated application domains. A "custom dataset" is a dataset that was either constructed by the study authors or that contains proprietary data not released to the public domain. These tables show that the majority of the utilized datasets are custom, while UCSD pedestrian [186], MNIST [187], and CIFAR-10 [188] are the most commonly used publicly available datasets.
The UCSD anomaly detection dataset was acquired by a stationary camera that captures pedestrian walkways. It includes two subsets: Ped1 with 34 training and 36 testing video sequences, and Ped2 with 16 training and 12 testing video sequences. In the normal setting, the video in this dataset contains only pedestrians. Abnormal events occur when either non-pedestrian entities are in the walkway, or there are anomalous pedestrian motion patterns, such as people walking across a walkway or in the grass that surrounds it. This dataset is challenging due to the low-resolution images, different types of moving objects, and the presence of one or more anomalies in the scene.
The MNIST and CIFAR-10 datasets each appear in 7% of the examined studies. The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. The CIFAR-10 dataset is a collection of 60,000 colour images arranged in 10 object classes of equal size. When MNIST and CIFAR-10 are used in anomaly detection studies, one class is treated as abnormal and removed from the training set, while the remaining classes are treated as normal.
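This one-class-out protocol can be sketched with label arrays alone. The function name `one_class_out` and the toy labels below are hypothetical; with MNIST or CIFAR-10 the arrays would hold the class labels of the official training and test splits.

```python
import numpy as np

def one_class_out(y_train, y_test, anomalous_class):
    """Select normal-only training rows and mark anomalous test rows."""
    train_idx = np.flatnonzero(y_train != anomalous_class)  # rows kept for training
    test_is_anomaly = y_test == anomalous_class             # ground truth for evaluation
    return train_idx, test_is_anomaly

y_train = np.array([0, 1, 2, 1, 0, 2, 2])
y_test = np.array([2, 0, 1, 2])
train_idx, test_is_anomaly = one_class_out(y_train, y_test, anomalous_class=2)
```

Repeating the procedure with each class in turn as the anomalous one gives one experiment per class.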
The UMN crowd dataset [189] is used in 4% of the examined papers. It contains normal and abnormal crowd behaviour captured in indoor and outdoor scenes at the University of Minnesota. The dataset contains 11 videos with a total of 7,736 frames that were captured under several scenarios at three different indoor and outdoor scenes.
From the perspective of application domain, we found that studies on fraud detection use the Credit Card Fraud Detection dataset, the real world credit (RWC) dataset, TalkingData AdTracking, a UCI dataset and a custom dataset. For surveillance anomaly detection purposes, 74% of the studies use the CUHK Avenue, ShanghaiTech, UCSD, and UMN datasets. Among all datasets identified in the primary studies, twelve are used for intrusion detection, seven for manufacturing anomaly detection, fifteen for medical anomaly detection, and twelve for image anomaly detection. While most datasets are not used across all domains, eighteen are used in multiple domains, as highlighted in Table 10-a and Table 10-b. For instance, KDD-cup99 10% is used in both image recognition [50] and intrusion detection [51], while MNIST is used in image recognition [115,44,47,51,63,86,96,108] and manufacturing anomaly detection [71].
4.5 RQ5: Which metrics are used to evaluate the performance of GANs in generating data and anomaly detection?
We found that 27 out of 128 primary studies evaluated the quality of the generated samples using 9 different performance evaluation metrics. Most studies evaluated data quality quantitatively, while six papers used visual inspection to evaluate the quality of the generated samples [17,40,60,69,139,135]. During the inspections, the generated samples were examined by application domain experts, or simply by the authors of the individual studies. Quantitative evaluation was performed using eight performance metrics, most commonly the structural similarity index measure (SSIM) and peak signal-to-noise ratio (PSNR), each of which was used in 26% of the studies that evaluated performance.
SSIM, adopted by seven papers [26,55,56,63,65,98,99], quantifies the relative perceptual similarity between two images. This metric ranges from -1 to 1, with 1 indicating a perfect pixel match between the original and generated samples, -1 corresponding to inverted images, and 0 marking no similarity [190]. Seven papers [21,22,24,41,44,137,85] used PSNR as a metric to measure the quality of the generated images. This metric evaluates the similarity of two samples through the ratio between the squared maximum possible pixel value and the mean squared error between the original and generated images, expressed on a logarithmic (decibel) scale. A higher value of the PSNR indicates that the generated sample is closer to the original. The Fréchet inception distance (FID), adopted by two papers [124,82], is a widely used method for evaluating the diversity and similarity of generated images [165]. By computing feature vectors for a collection of real and generated images and comparing their statistics, FID measures the distance between the real and generated distributions.
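As an illustration of the simplest of these metrics, PSNR is conventionally computed as 10·log10(MAX² / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error between the two images. The following minimal sketch assumes 8-bit images (MAX = 255); the function name `psnr` and the toy images are hypothetical.

```python
import numpy as np

def psnr(original, generated, max_value=255.0):
    """Peak signal-to-noise ratio in decibels between two same-shaped images."""
    diff = original.astype(np.float64) - generated.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

original = np.full((4, 4), 100, dtype=np.uint8)
generated = np.full((4, 4), 110, dtype=np.uint8)  # every pixel off by 10
value = psnr(original, generated)                 # MSE = 100, about 28.13 dB
```

Casting to floating point before subtracting avoids the wrap-around that unsigned 8-bit arithmetic would otherwise produce.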
We also studied whether performance metrics were used with specific input data types. As shown in Table 11, we observed that most performance metrics have been applied to image and video data, while only two papers utilized metrics for time series data [53,60]. It is also worth mentioning that frames are usually extracted from the video before being fed into the GAN. Therefore, the performance metrics are essentially used only to evaluate image data.
We also examined the relationship between the performance metrics and application domains. Table 12 shows that most metrics were adopted in surveillance anomaly detection to evaluate the generated samples, while most other domains, such as fraud detection, power/energy anomaly detection, and software systems, did not evaluate the quality of the generated samples at all. In addition, SSIM, PSNR, FID, and visual inspection were used in various domains, while other metrics were only applied to one specific domain.

Techniques                      References
Support vector machines         [110,113,114,118,119,136,125,126,127,129,109]
Neural network-based methods    [110,113,117,121,124,125,106,139,129,91,140,141,143,134,109]
Nearest neighbors               [114,119,136,121,109]
Naive Bayes                     [110,119,121,129]
Ensemble methods                [110,113,114,119,136,121,129,131,109]
Linear model                    [113,114,119,121,121,126]
Decision trees                  [110,114,119,136,121,109]

semi-supervised, and unsupervised anomaly detection. In addition, as there is a wide variety of anomaly detection techniques, we only considered the techniques used in more than three primary studies.

Supervised Anomaly Detection
Anomaly detection techniques trained in a supervised fashion assume that labelled data for the normal and anomaly classes is available. A training dataset is used to train the model for predicting the class labels, and then the predictive model is evaluated on an unseen test dataset. The dependence of supervised anomaly detection techniques on data labels makes them more vulnerable to the problem of an imbalanced dataset. As a result, these anomaly detection techniques are biased toward the majority class.
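A small numerical illustration shows why this bias is hard to notice from accuracy alone. The class ratio below (1% anomalies) is an assumed toy value, and the "classifier" is a degenerate one that always predicts the normal class.

```python
import numpy as np

y_true = np.array([0] * 990 + [1] * 10)  # toy labels: 1% anomalies (assumed ratio)
y_pred = np.zeros_like(y_true)           # degenerate classifier: always "normal"

accuracy = float(np.mean(y_pred == y_true))
recall = float(np.sum((y_pred == 1) & (y_true == 1)) / np.sum(y_true == 1))
```

The classifier reaches 99% accuracy while detecting none of the anomalies (recall of zero), which is why rebalancing the training data is attractive in the supervised setting.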
In 26.3% of the investigated primary studies, GANs are applied to address the problem of the imbalanced dataset for anomaly detection by augmenting it with the data generated by GANs. The list of primary studies that use GANs for data augmentation is shown in Table 2. The list of papers that used GANs along with supervised anomaly detection techniques, i.e. GAN-assisted approaches, is shown in Table 13.

Semi-Supervised Anomaly Detection
Anomaly detection techniques trained in a semi-supervised manner assume that labelled data is available only for the normal class. The main benefit of semi-supervised anomaly detection techniques is that they do not require data for the anomalous class. In the reviewed primary studies, GANs are mostly used in a semi-supervised manner. A GAN is trained to learn the distribution of the normal class, and a deviation from this distribution is then identified using an anomaly scoring technique. The list of GAN-based semi-supervised anomaly detection techniques is shown in Table 14. Among all GAN-based techniques, AnoGAN [104] is the one most often used as a baseline for comparison with newly proposed methods. There are several other techniques that can be used for anomaly detection in a semi-supervised manner, as listed in Table 15. The table shows that several papers used Mixture of Dynamic Texture (MDT) [191] as an anomaly detection technique in crowded scenes. Mehran et al. [189] use a generative probabilistic model called Social Force for semi-supervised anomaly detection. Two primary studies investigate sparse dictionary learning-based anomaly detection techniques, namely detection at 150 FPS [192] and sparse reconstruction [193]. In the primary studies, the performance of different flavours of autoencoders (AEs) was compared to the performance of GANs in anomaly detection, including standard AEs [194], Variational AEs (VAEs), convolutional AEs (CVAEs) [195], Denoising AEs (DAEs) [196], and Adversarial AEs (AAEs) [174]. Moreover, some of the primary studies compared the proposed anomaly detection techniques with a Long Short Term Memory (LSTM) based approach [197].
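The semi-supervised workflow described at the start of this subsection, in which anomaly scores are calibrated on normal data only, can be sketched as follows. The quantile-based threshold is one common choice and an assumption of this example rather than a rule taken from the primary studies; the function names and score values are hypothetical.

```python
import numpy as np

def fit_threshold(normal_scores, quantile=0.95):
    """Pick a score threshold from held-out normal samples only."""
    return float(np.quantile(normal_scores, quantile))

def flag_anomalies(scores, threshold):
    """Mark samples whose anomaly score exceeds the threshold."""
    return scores > threshold

normal_scores = np.linspace(0.0, 1.0, 101)   # toy scores of normal validation data
threshold = fit_threshold(normal_scores)     # 95th percentile of normal scores
flags = flag_anomalies(np.array([0.50, 0.99]), threshold)
```

With this choice, roughly 5% of normal samples are expected to be flagged, so the quantile directly trades false alarms against missed anomalies.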

Unsupervised Anomaly Detection
These types of anomaly detection techniques do not require a labelled dataset. They rely on the central assumption that normal instances are far more frequent than anomalies in the test data [153]. If this assumption is not valid, the anomaly detection will suffer significantly from false alarms. Assuming that the unlabeled dataset contains very few anomalous instances and the model is robust against these few anomalies, we can adapt a semi-supervised anomaly detection technique to work in an unsupervised manner by training the model on a portion of the unlabeled dataset. A list of GAN-based unsupervised anomaly detection techniques is presented in Table 16. In addition, Table 17 presents a list of unsupervised anomaly detection techniques that have been considered for anomaly detection and compared to GANs in the literature.
From Table 17, we can observe that one-class classifiers [198] have been of great interest from an unsupervised anomaly detection perspective. Isolation forest [199] is another unsupervised technique competing with GANs in this area. Several variants of principal component analysis [200] have also often been compared to GANs in terms of performance. Several linear models have also been applied, namely REAPER [201] and Low Rank Representation [202]. Other techniques compared to GANs for anomaly detection include Gaussian mixture models [203], R-graph [204], local outlier factor [205], deep structured energy-based models [206], and deep autoencoding Gaussian mixture models [207].

Future Research Directions
Generative adversarial network-based anomaly detection is in its early stage of development with many research opportunities. However, most of these opportunities lie in the field of GANs itself. In this section, we present possible directions for future work on applying GANs in anomaly detection.
Future direction 1: Speeding up the GAN training process. Training GANs is a computationally demanding task. As reported in almost all primary studies, it takes a long time and powerful GPUs to train GANs to the point of satisfactory performance. Consequently, future studies need to explore GAN architectures that are lightweight and efficient in terms of resource consumption [22,28,127,89,83]. For instance, the effects of selecting GAN hyperparameters on the anomaly detection performance have not been studied in the literature. There is also a need to consider the use of emerging GAN optimization and training methods, e.g. [161,164], for better training stability and faster convergence.
Future direction 2: Accounting for changing behaviour of a system. In most industrial anomaly detection applications, the behaviour of the target system varies over time. Therefore, it is crucial to examine the temporal behaviour of the system to find anomalies. RQ4 showed that only 7% of the primary studies used GANs for anomaly detection in time series data. Therefore, more studies are required on anomaly detection using GANs for time series data to make them suitable for industrial applications, especially for multivariate time series data. In many industrial applications, data is collected online. Huang et al. [110] suggest taking advantage of this data via online training of GAN-based anomaly detection techniques. This approach might be adaptable for more real-time anomaly detection tasks.
Future direction 3: Improving support for multimodal, discrete and noisy data. Another open challenge of using GANs for anomaly detection is the lack of studies on multimodal anomaly detection using GANs. In real-world cases, data often comes from multiple sources with different types. For instance, Qiu et al. [23] propose a GAN-based driving anomaly detection technique using physiological and CAN-bus data. Qiu et al. suggest incorporating other information such as results obtained from vision-based object detection systems applied to the road. Many other GAN-based anomaly detection approaches could benefit from using multimodal data. In addition, GANs were initially created to generate continuous data. As a result, they have limited ability to deal with discrete data, as it hinders the backpropagation process [68]. Ben Fadhel and Nyarko [68] point out this problem and study GAN architectures suitable for discrete data. Despite the promising results they report, this study is the only example of anomaly detection for discrete data in our review. Finally, Lei et al. [22] used optical flow as foreground shape information for video anomaly detection. Lei et al. point out that when the optical flow is inaccurate, it will affect the robustness of their proposed GAN-based anomaly detection technique. Thus, a potential direction for future research is the study of the effects of measurement inaccuracies and noise in the data on the performance of GAN-based anomaly detection techniques and the development of solutions to alleviate their impact.
Future direction 4: Searching for better anomaly scoring methods. As mentioned earlier, GAN-based anomaly detection techniques require an anomaly scoring method to distinguish between normal and abnormal samples. Selecting the metrics on which anomaly scores are based is still a challenging task. Further investigation is needed to improve the scoring methods and to identify the best fit for each application domain or for a specific case [28,63,99].
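As an illustration of what such a scoring method can look like, the following minimal Python sketch combines a reconstruction residual with a discriminator feature distance, in the spirit of AnoGAN-style scores. The function name `anomaly_score`, the weight `lam`, and the toy arrays are assumptions made for illustration; they are not taken from any specific primary study.

```python
import numpy as np

def anomaly_score(x, x_rec, f_x, f_rec, lam=0.1):
    """Weighted sum of a residual term and a discriminator-feature term.

    x, x_rec   : a sample and its GAN reconstruction (same shape)
    f_x, f_rec : hypothetical discriminator features of x and x_rec
    lam        : assumed trade-off weight in [0, 1]
    """
    residual = np.sum(np.abs(x - x_rec))        # how poorly the generator reproduces x
    feature_gap = np.sum(np.abs(f_x - f_rec))   # mismatch in discriminator feature space
    return (1.0 - lam) * residual + lam * feature_gap

# Toy example: a well-reconstructed sample scores lower than a poorly reconstructed one.
x = np.array([1.0, 2.0, 3.0])
f = np.array([0.5, 0.5])
score_normal = anomaly_score(x, x + 0.01, f, f + 0.01)
score_abnormal = anomaly_score(x, x + 1.0, f, f + 1.0)
```

In practice, a threshold on the score (chosen, for example, on validation data) separates normal from abnormal samples. Which distance and which weighting suit a given domain is the open question raised above.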
Future direction 5: Improving the performance evaluation of GANs. It is essential to evaluate the performance of GANs in generating data before using them for anomaly detection, either in a GAN-assisted or GAN-based setup. By doing so, one can ensure that the GAN has learned the distribution of the data correctly. The results of RQ5 revealed that most primary studies do not evaluate the quality of their GAN-generated data. Additionally, almost all studies that do assess the quality of generated data use image data. For other types of data, such as tabular, text and time series, there is no performance indicator for the quality of the generated data. Therefore, additional research is needed to identify the most suitable metrics for assessing the performance of GANs for each data type.
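For image data, one widely used indicator is the Fréchet Inception Distance, which compares Gaussian fits of real and generated feature vectors. The sketch below computes the underlying Fréchet distance with NumPy on arbitrary feature vectors. The function name `frechet_distance` and the random toy data are assumptions, and whether such a distance is meaningful for tabular, text, or time series data is part of what remains to be studied.

```python
import numpy as np

def frechet_distance(real, fake):
    """Squared Fréchet distance between Gaussian fits of two sample sets.

    real, fake : arrays of shape (n_samples, n_features), n_features >= 2
    """
    mu_r, mu_f = real.mean(axis=0), fake.mean(axis=0)
    cov_r = np.cov(real, rowvar=False)
    cov_f = np.cov(fake, rowvar=False)
    # Tr((cov_r cov_f)^(1/2)) equals the sum of square roots of the eigenvalues
    # of cov_r @ cov_f, which are real and non-negative for PSD covariances.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    trace_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    mean_term = np.sum((mu_r - mu_f) ** 2)
    return mean_term + np.trace(cov_r) + np.trace(cov_f) - 2.0 * trace_sqrt

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
same = frechet_distance(real, real)            # identical sets: distance near 0
shifted = frechet_distance(real, real + 2.0)   # mean shift of 2 in each of 4 dims: near 16
```

A lower distance suggests that the generated samples match the first two moments of the real data more closely; it does not by itself guarantee that the full distribution has been learned.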
Future direction 6: Employing improved GAN architectures for anomaly detection. We observed in RQ3 that the 'older' GAN architectures are by far the most popular in anomaly detection studies. However, many improved GAN architectures were proposed recently, which could improve anomaly detection as well. For example, it is desirable to generate high-resolution images with GANs, but this is a challenging task. High-resolution images make it easier for the discriminator to tell the generated images apart from the training samples [171]. Also, high-resolution images require more memory, which leads to smaller minibatches and may compromise training stability [172]. Several primary studies highlighted the need to generate high-resolution images for better anomaly detection, e.g., [55,67,89]. Future studies may examine the effect of using improved GAN architectures, e.g., to improve the resolution of images using SRGAN [208], ESRGAN [209], or BigGAN [210], on the performance of anomaly detection techniques.

Threats to Validity
One threat to the validity of our review is that of missing papers. The source of this threat is the selection of search keywords and search engines. To mitigate this threat, we iteratively added keywords to our search query until no relevant new papers were found. The list of papers was finalized on the 3rd of June, 2020, and no papers published afterwards were added. It is possible that there are new GAN-based anomaly detection techniques that address some of the issues or challenges identified in this review. Also, the data synthesis of the RQs was divided between the first two authors. To reduce bias in the data synthesis step, the first two authors met regularly to address disagreements. If a disagreement could not be resolved, one of the last two authors made the final decision.

Conclusion
This systematic literature review presents an extensive study on the applications of GANs for anomaly detection, covering 128 primary studies. We define and answer several RQs to capture the current best practices and available techniques to employ GANs for anomaly detection purposes. We also identify the existing challenges and provide six future research directions in this area.
The results reveal that GANs are used for GAN-assisted (data augmentation) and GAN-based (representation learning) anomaly detection. Both approaches address the problem of insufficient data on the anomalous behaviour of the system. In a GAN-assisted approach, the goal is to augment the minority class using the generative ability of GANs. In GAN-based anomaly detection, the goal is to use the representation learning ability of GANs, eliminating the need for minority class data. The most commonly used GAN architectures in the primary studies are DCGANs, standard GANs, and cGANs. GANs are applied for anomaly detection in a wide range of application domains. The primary areas in which GANs are used for anomaly detection are medicine, surveillance and intrusion detection. However, their application in many other domains, such as anomaly detection in sensor networks, smart grids, and cloud computing, shows that GANs are a suitable solution for anomaly detection.
We identified six important directions for future research. Some of these directions are related to fundamental GAN research. For example, our study reveals that only 21% of the primary studies evaluated the quality of the data that was generated with GANs. Hence, an important direction for future research is to investigate how the performance of GANs can be evaluated, as better performing GANs will also result in better performing anomaly detection approaches. Another important future research direction is speeding up the training process of GANs. In addition, we identified several important future research directions for anomaly detection researchers. In particular, GAN-assisted anomaly detection approaches should improve their support for multimodal, discrete and noisy data, and account for changing behaviour of a system. Finally, researchers should investigate how recent improvements to GAN architectures can help improve their role in anomaly detection.

Figure 1: The building blocks of GANs. The classification error is used to update the parameters of the discriminator and generator models (shown by dashed lines).

Figure 4: Distribution of the primary studies according to the publication types and years.
Architecture of the Bi-directional GAN.

Figure 5: The architectures of the most commonly used generative adversarial networks: (a) the standard GAN, where G and D denote the generator and discriminator; note that the architecture of the Wasserstein GAN is the same; (b) conditional GAN; the only difference is the addition of auxiliary data, shown as c; (c) deep convolutional GAN (DCGAN), where Deconv denotes a deconvolutional layer, Conv a convolutional layer, and FC a fully connected layer; (d) BiGAN, with E denoting the encoder.
differentiable function represented by a multilayer perceptron with parameters θ_g. For the discriminator model, another multilayer perceptron D(x; θ_d) is defined. Its single scalar output represents the probability that x comes from the data rather than p_g. The training goal for the discriminator model D is to maximize the probability of assigning the correct label to both training examples and samples from the generator model G. Simultaneously, G is trained to minimize log(1 − D(G(z))). D and G are pitted against each other in a two-player minimax game.
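The minimax game described here can be made concrete with a short numerical sketch. Assuming the standard GAN value function V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))], the Python snippet below evaluates V and the generator's loss term from discriminator outputs. The function names and toy probabilities are illustrative assumptions.

```python
import numpy as np

def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real : discriminator outputs D(x) on training samples, in (0, 1)
    d_fake : discriminator outputs D(G(z)) on generated samples, in (0, 1)
    """
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

def generator_loss(d_fake):
    """The term G minimizes: E[log(1 - D(G(z)))]."""
    return np.mean(np.log(1.0 - d_fake))

# A discriminator that cannot tell real from generated outputs 0.5 everywhere.
undecided = np.full(8, 0.5)
v_equilibrium = value_function(undecided, undecided)   # equals -log(4)

# A confident, correct discriminator yields a higher value of V.
v_confident = value_function(np.full(8, 0.9), np.full(8, 0.1))
```

D is updated to increase V, while G is updated to decrease `generator_loss`, which pushes the generated samples toward the data distribution.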

4.6 RQ6: Which anomaly detection techniques are used along with GANs?
This section discusses the different types of anomaly detection techniques that either use GANs or are compared with GANs. Based on the availability of labelled data, anomaly detection techniques are divided into three classes: supervised, semi-supervised, and unsupervised anomaly detection. During data synthesis for RQ6, we noticed that not all primary studies use consistent definitions for these classes. Therefore, we use Chandola et al.'s [153] definition of supervised,

Table 1: The data extraction form.

Table 2: List of studies using normal, abnormal, and normal and abnormal data together for different tasks of GANs.

Table 3: Representation learning with GANs.

Table 4: Different types of GANs used for data augmentation.

Table 8: List of different data types (image, tabular, video, time series, and text) with their frequency.

Table 9: List of different preprocessing types with corresponding application to different data types.

Table 12: The performance metrics that are used to evaluate generated samples, with corresponding application domains. AS: Autonomous Systems, CC: Climate Changes, FD: Fraud Detection, HI: Hyperspectral Images, IR: Image Recognition, ID: Intrusion Detection, MA: Manufacturing, ME: Medical, PE: Power/Energy, SS: Software Systems, SU: Surveillance, SH: System Health, TE: Text, TD: Trajectory Detection, VA: Various.

Table 15: Semi-supervised anomaly detection techniques compared to GANs.

Table 16: Unsupervised anomaly detection techniques based on GANs, listing each GAN architecture with its references.

Table 17: Unsupervised anomaly detection techniques compared to GANs.