Coding based Distributed Data Shuffling for Low Communication Cost in Data Center Networks


Abstract:

Data shuffling can improve the statistical performance of distributed machine learning. However, the main obstacle to applying data shuffling is its high communication cost. Existing works use coding techniques to reduce this cost, but they assume a master-worker storage architecture. Because that architecture requires unlimited storage on the master, it is not always practical in common data centers. In this paper, we propose a new coding method for data shuffling in a decentralized storage architecture built on a fat-tree based data center network. The method determines which data samples should be encoded together and from which node the encoded package should be sent, so as to minimize the communication cost. We develop a real-world test-bed to evaluate our method. The results show that our method reduces the transmission time by 6.4% compared with the state-of-the-art coding method, and by 27.8% compared with unicasting.
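
The general idea of saving traffic by encoding samples together can be illustrated with a toy XOR example. This is only a generic sketch of coded multicast under assumed conditions, not the method proposed in the paper: the three nodes, the sample contents, and the helper `xor_bytes` are all hypothetical, and the fat-tree topology and sender selection are not modeled.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("samples must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical setup with no master: node C stores both samples,
# node A stores s1 and needs s2, node B stores s2 and needs s1.
s1 = b"sample-1"
s2 = b"sample-2"

# Unicasting: C sends s2 to A and s1 to B, i.e. two separate packets.
unicast_packets = [s2, s1]

# Coded shuffling: C multicasts one packet carrying s1 XOR s2.
coded_packet = xor_bytes(s1, s2)

# Each receiver cancels the sample it already holds to recover the other.
decoded_at_a = xor_bytes(coded_packet, s1)  # A recovers s2
decoded_at_b = xor_bytes(coded_packet, s2)  # B recovers s1

# Bytes injected by the sender in each scheme.
unicast_bytes = sum(len(p) for p in unicast_packets)
coded_bytes = len(coded_packet)
```

In this toy case the sender emits one packet instead of two. In a real network the saving also depends on which links the multicast traverses, which is why the choice of sender matters.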
Date of Conference: 17-19 December 2020
Date Added to IEEE Xplore: 07 April 2021
Conference Location: Tokyo, Japan
