
Detection of anomalies in the HDFS dataset



Abstract:

Big data systems are stable enough to store and process large volumes of quickly changing data. However, these systems are composed of massive hardware resources, which can easily cause their subcomponents to fail. Fault tolerance is a key attribute of such systems because it maintains availability, reliability, and consistent performance during failures. Implementing efficient fault-tolerant solutions in big data presents a challenge because fault tolerance has to satisfy constraints related to system performance and resource consumption. To protect online computer systems from malicious attacks or malfunctions, log anomaly detection is crucial. This paper presents a new approach to identifying anomalous log sequences in the HDFS (Hadoop Distributed File System) log dataset using three algorithms: LogBERT, DeepLog, and LOF. It then assesses the performance of all three algorithms in terms of accuracy, recall, and F1-score.
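As an illustrative sketch only (not the paper's actual pipeline), one of the three compared algorithms, LOF (Local Outlier Factor), can be applied to log-sequence feature vectors with scikit-learn and scored with the recall and F1 metrics the paper reports. The feature vectors below are synthetic stand-ins for features extracted from HDFS log sequences:

```python
# Hedged sketch: LOF-based anomaly detection on hypothetical
# log-sequence feature vectors, scored with recall and F1.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import recall_score, f1_score

rng = np.random.default_rng(0)
# Synthetic stand-ins for per-sequence event-count vectors:
# 200 normal sequences, 10 anomalous ones shifted far from the bulk.
normal = rng.normal(0.0, 1.0, size=(200, 5))
anomalous = rng.normal(5.0, 1.0, size=(10, 5))
X = np.vstack([normal, anomalous])
y_true = np.array([0] * 200 + [1] * 10)  # 1 = anomaly

# fit_predict labels in-sample points; -1 marks detected outliers.
lof = LocalOutlierFactor(n_neighbors=20, contamination=10 / 210)
y_pred = (lof.fit_predict(X) == -1).astype(int)

print(f"recall={recall_score(y_true, y_pred):.2f}  "
      f"f1={f1_score(y_true, y_pred):.2f}")
```

LOF flags points whose local density is much lower than that of their neighbors, which suits log anomalies that fall outside clusters of normal behavior; LogBERT and DeepLog, by contrast, are learned sequence models and are not shown here.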
Date of Conference: 23-25 May 2023
Date Added to IEEE Xplore: 03 August 2023
Conference Location: Orlando, FL, USA

I. Introduction

Many data sets continually stream into today's computing and networking systems from weblogs, financial transactions, health records, surveillance logs, business, telecommunications, and the bio-sciences. Logging, the process of recording occurrences on a computer system, has become a widely accepted and significant practice [1]; the recorded data is saved in what is known as a log file. These data collections have lately become a focus of study under the term "big data", a phrase that reflects their massive and distributed nature. According to Gartner [2], big data is defined as high-volume, high-velocity, and high-variety data sets that necessitate cost-effective, innovative data analytics for decision-making and inferring relevant insights.

