Abstract:
Data shuffling can improve the statistical performance of distributed machine learning. However, the main obstacle to applying data shuffling is its high communication cost. Existing works use coding techniques to reduce this cost, but they assume a master-worker storage architecture. Because the master must provide effectively unlimited storage, this architecture is not always practical in common data centers. In this paper, we propose a new coding method for data shuffling in a decentralized storage architecture, built on a fat-tree based data center network. The method determines which data samples should be encoded together and from which node each encoded packet should be sent so as to minimize the communication cost. We develop a real-world test-bed to evaluate our method. The results show that our method reduces transmission time by 6.4% over the state-of-the-art coding method, and by 27.8% over Unicasting.
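To illustrate the general idea of coded shuffling referenced above, the following is a minimal sketch of index-coded multicast (a generic textbook construction, not the paper's actual placement or encoding algorithm): when two workers each cache the sample the other needs, one XOR-coded multicast packet can serve both, halving traffic versus unicasting each sample separately. All names and data here are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings element-wise."""
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical equal-length data samples.
sample_a = b"sample-A"
sample_b = b"sample-B"

# One multicast of the coded packet replaces two unicast transmissions.
coded = xor_bytes(sample_a, sample_b)

# Worker 1 caches A and recovers B; worker 2 caches B and recovers A.
decoded_b = xor_bytes(coded, sample_a)
decoded_a = xor_bytes(coded, sample_b)

assert decoded_a == sample_a
assert decoded_b == sample_b
```

The gain comes from side information: each worker cancels out the part of the coded packet it already stores. The paper's contribution is choosing, in a decentralized fat-tree setting, which samples to combine this way and which node should multicast the result.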
Date of Conference: 17-19 December 2020
Date Added to IEEE Xplore: 07 April 2021