
Detection of anomalies in the HDFS dataset



Abstract:

Big data systems are stable enough to store and process large volumes of quickly changing data. However, these systems are composed of massive hardware resources, which can easily cause their subcomponents to fail. Fault tolerance is a key attribute of such systems because it maintains availability, reliability, and consistent performance during failures. Implementing efficient fault-tolerant solutions in big data presents a challenge because fault tolerance has to satisfy constraints related to system performance and resource consumption. To protect online computer systems from malicious attacks or malfunctions, log anomaly detection is crucial. This paper presents a new approach to identifying anomalous log sequences in the HDFS (Hadoop Distributed File System) log dataset using three algorithms: LogBERT, DeepLog, and LOF. It then assesses the performance of all three algorithms in terms of accuracy, recall, and F1-score.
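As an illustrative sketch only (not the paper's actual pipeline), one of the three compared algorithms, LOF (Local Outlier Factor), can be applied to log-sequence feature vectors with scikit-learn and scored with the recall and F1 metrics the paper reports. The feature vectors below are synthetic stand-ins for features extracted from HDFS log sequences:

```python
# Hedged sketch: LOF-based anomaly detection on hypothetical
# log-sequence feature vectors, scored with recall and F1.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import recall_score, f1_score

rng = np.random.default_rng(0)
# Synthetic stand-ins for per-sequence event-count vectors:
# 200 normal sequences, 10 anomalous ones shifted far from the bulk.
normal = rng.normal(0.0, 1.0, size=(200, 5))
anomalous = rng.normal(5.0, 1.0, size=(10, 5))
X = np.vstack([normal, anomalous])
y_true = np.array([0] * 200 + [1] * 10)  # 1 = anomaly

# fit_predict labels in-sample points; -1 marks detected outliers.
lof = LocalOutlierFactor(n_neighbors=20, contamination=10 / 210)
y_pred = (lof.fit_predict(X) == -1).astype(int)

print(f"recall={recall_score(y_true, y_pred):.2f}  "
      f"f1={f1_score(y_true, y_pred):.2f}")
```

LOF flags points whose local density is much lower than that of their neighbors, which suits log anomalies that fall outside clusters of normal behavior; LogBERT and DeepLog, by contrast, are learned sequence models and are not shown here.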
Date of Conference: 23-25 May 2023
Date Added to IEEE Xplore: 03 August 2023
Conference Location: Orlando, FL, USA

I. Introduction

Many data sets continually stream into today's computing and networking systems from weblogs, financial transactions, health records, surveillance logs, business, telecommunications, and the bio-sciences. Logging, the process of recording occurrences on a computer system, has become a widely accepted and significant practice [1]; the recorded data is saved in what is known as a log file. These data collections have lately become a focus of study under the term "big data", a phrase that reflects their massive and distributed nature. According to Gartner [2], big data is defined as high-volume, high-velocity, and high-variety data sets that necessitate cost-effective, innovative data analytics for decision-making and inferring relevant insights.

