
GSelf-MapReduce: A Method for Enhancing Mapreduce Performance in Distributed Heterogeneous Data Centers

In the physical environment used as a reference for the testbed of the proposed GSelf-MapReduce method, a structure consisting of a Data Center Gateway (DCG), Data Center...


Abstract:

Big data are often stored close to where they are generated, owing to the cost of data transfer. The stored data are then either moved to a single location for processing or processed where they reside. The literature offers several methods for processing data in distributed data centers. In this study, we present a new data processing method called GSelf-MapReduce. In the proposed method, shuffling is performed among the heterogeneous data centers (DCs) that have completed their data processing. To calculate the data processing cost of the reduce function at each DC, a polynomial regression model was built from data obtained in the test environment, and the coefficients of this model were used in the decision process. The key/value pairs to be shuffled are distributed according to the cost of the DCs, taking their location into account. In addition, shuffling does not wait for all DCs to finish their jobs: DCs that have completed their jobs shuffle among themselves, so the keys are deduplicated between these DCs. This reduces both the shuffling volume in the last phase and the total job completion time. The performance of the proposed method was compared with that of four different distributed data processing methods from the literature. The proposed method generates 15% less shuffled data than the closest competing work.
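
The cost-based distribution described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the polynomial degree, the regression inputs, or the allocation rule, so the degree-2 fit on reduce-input size, the inverse-cost weighting, the DC names, and the timing figures below are all assumptions made for illustration. Location, which the method also considers, is left out.

```python
def fit_polynomial(xs, ys, degree=2):
    """Least-squares polynomial fit; returns [c0, c1, ...] for c0 + c1*x + ..."""
    n = degree + 1
    # Normal equations (V^T V) c = V^T y for the Vandermonde matrix V.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def predict_cost(coeffs, size):
    """Evaluate the fitted polynomial at a given reduce-input size."""
    return sum(c * size ** i for i, c in enumerate(coeffs))


def shuffle_shares(models, size):
    """Assumed rule: each DC's share is inversely proportional to its predicted cost."""
    inverse = {}
    for dc, coeffs in models.items():
        cost = predict_cost(coeffs, size)
        if cost <= 0:
            raise ValueError(f"non-positive predicted cost for {dc}")
        inverse[dc] = 1.0 / cost
    total = sum(inverse.values())
    return {dc: w / total for dc, w in inverse.items()}


# Hypothetical benchmark data: reduce-input size (GB) vs. processing time (s).
sizes_gb = [1, 2, 4, 8]
models = {
    "dc_fast": fit_polynomial(sizes_gb, [1.0, 2.1, 4.5, 10.0]),
    "dc_slow": fit_polynomial(sizes_gb, [2.0, 4.2, 9.0, 20.0]),
}

if __name__ == "__main__":
    print(shuffle_shares(models, 5.0))
```

Under this assumed rule, a DC whose predicted reduce cost is lower receives a proportionally larger share of the key/value pairs to be shuffled.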


Published in: IEEE Access (Volume: 12)

Page(s): 159503 - 159518

Date of Publication: 29 October 2024

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2024.3487936


IEEE Keywords
- Data processing
- Distributed databases
- Data centers
- Costs
- Big Data
- Sparks
- Data models
- Clustering algorithms
- Servers
- Mathematical models

Index Terms
- Data Center
- Heterogeneous Data Centers
- Data Processing
- Big Data
- Processing Methods
- Data Storage
- Data Methods
- Polynomial Regression
- Polynomial Regression Model
- Job Completion
- Total Completion Time
- Job Completion Time
- Volume Of Data
- Data Generation
- Clusters Of Groups
- Fault-tolerant
- Hierarchical Approach
- Cluster Of Cases
- Word Count
- Cluster Nodes
- Key Size
- Key Distribution
- Data Processing Time
- Input Weights
- Cluster A
- Cloud Providers
- Output Size
- Hardware Specifications