Abstract:
Deep learning for image analytics is widely used in many real-world applications. Because of the rapid growth in data and model size, models need to be distributed across multiple nodes. Distributed computation of the model improves scalability, training time, and cost effectiveness. However, distribution can lead to longer computation times when some nodes are stale. The computation time of the distributed nodes is affected by many factors, such as communication latency, network connectivity, resource sharing, and computational power. The main problem when distributing a model is staleness among the worker nodes. The effect of stragglers cannot be completely avoided in distributed clusters. Storage and disk failures, imbalanced workloads, and resource sharing are the main causes of stragglers. Stragglers lengthen computation time and reduce the performance of the model. The different methods used to address this issue are described in detail in the paper. Open research problems in this field are also highlighted.
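As a minimal illustration of why a single straggler matters (a sketch that is not taken from the paper), assume synchronous training in which every step waits for all workers to finish. Under that assumption the step time equals the slowest worker's time. The function name and the per-worker timings below are hypothetical.

```python
def sync_step_time(worker_times):
    """Return the duration of one synchronous step.

    Assumption: the step completes only when the slowest worker finishes,
    so the step time is the maximum of the per-worker times.
    """
    return max(worker_times)


# Hypothetical per-worker compute times in seconds for one training step.
healthy = [1.0, 1.1, 0.9, 1.0]
with_straggler = [1.0, 1.1, 0.9, 4.0]  # the last worker is a straggler

healthy_step = sync_step_time(healthy)
straggler_step = sync_step_time(with_straggler)

# Ratio of the two step times: how much one slow worker stretches the step.
slowdown = straggler_step / healthy_step
print(f"step time without straggler: {healthy_step:.1f} s")
print(f"step time with straggler:    {straggler_step:.1f} s")
print(f"slowdown factor:             {slowdown:.2f}x")
```

In this toy setting three of the four workers sit idle for most of the step, which is the cost that straggler mitigation methods aim to reduce.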
Date of Conference: 25-27 March 2021
Date Added to IEEE Xplore: 12 April 2021