Abstract:
Unsupervised anomaly detection on structured tabular data is an important problem because it plays a key role in decision making in production practice. Mainstream unsupervised learning methods such as the VAE (Variational Autoencoder), the GAN (Generative Adversarial Network) and other deep neural networks (DNNs) have achieved remarkable success in recognising and processing image, text and audio data. However, they are poorly suited to tabular data: over-parameterisation and the lack of a proper inductive bias often leave them unable to find optimal solutions for tabular decision manifolds. In this work, we propose fusing the TabNet deep learning architecture with an isolation forest model to design a self-supervised learning algorithm, Tab-iForest, that better applies deep neural networks to anomaly detection on tabular data. First, we use the TabNet pre-training architecture, which selects the highest-weighted features at each decision step through a sequential attention mechanism. This focuses the learning capacity on the most salient features, giving interpretable and more efficient learning, and supplies efficient, fine-grained representations to the downstream unsupervised anomaly detection model, the isolation forest. Here, random down-sampling is used to construct small sub-samples. Experimental results on credit card fraud datasets in the financial security domain show that the Tab-iForest anomaly detection algorithm achieves an accuracy of 99.895% and an AUC of 0.8186. Compared with the isolation forest anomaly detection algorithm alone, this is an improvement of 45.12% in recall and 38.947% in accuracy, a significant advantage in anomaly detection.
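The downstream stage described in the abstract, an isolation forest built on small random sub-samples, can be illustrated with a minimal sketch. This is not the authors' implementation: the TabNet pre-training stage is not reproduced, the input rows are assumed to be feature vectors that have already been encoded, and the names `c_factor`, `build_tree`, `path_length`, `fit_forest` and `anomaly_score`, along with the defaults of 100 trees and 64 rows per sub-sample, are hypothetical choices made for illustration. The scoring follows the standard isolation forest formulation, where points that are isolated after few random splits receive scores closer to 1.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    # Harmonic number H(n - 1) approximated by ln(n - 1) + Euler's constant.
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    dim = rng.randrange(len(rows[0]))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:  # no split is possible on this feature
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[dim] < split]
    right = [r for r in rows if r[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, adjusted for the rows left unsplit there."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)


def fit_forest(rows, n_trees=100, sample_size=64, seed=0):
    """Build each tree on its own random sub-sample (assumes at least 2 rows)."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = max(1, math.ceil(math.log2(sample_size)))
    trees = [
        build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, sample_size


def anomaly_score(forest, x):
    """Score in (0, 1]; higher means x is isolated after fewer splits."""
    trees, sample_size = forest
    mean_depth = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c_factor(sample_size))


# Synthetic stand-in for encoded features: a dense cluster plus one far-away row.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(500)]
data.append((8.0, 8.0))
forest = fit_forest(data)
print("outlier score:", anomaly_score(forest, (8.0, 8.0)))
print("inlier score: ", anomaly_score(forest, (0.0, 0.0)))
```

In this sketch the far-away row should score higher than a row near the centre of the cluster. In a pipeline like the one the abstract describes, `data` would instead hold representations produced by the pre-trained TabNet encoder, and a score threshold would have to be chosen separately.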
Date of Conference: 20-22 January 2022
Date Added to IEEE Xplore: 20 April 2022