I. Introduction
Modern phones and tablets, wearable devices, and smart IoT (Internet of Things) devices generate massive amounts of data every day, data that is well suited for training machine learning models. However, this rich data is often privacy sensitive and large in quantity, so uploading it to a server and training there with traditional methods is affordable neither in privacy nor in cost. Owing to the growing computational power of these devices, federated learning [1][2][3] offers a way to train a shared model directly on devices by aggregating locally-computed updates. Potential examples include learning to recognize the activities of phone users, predicting health events, or learning personalized typing recommendation systems. Although the federated learning algorithm requires fewer communication rounds than synchronized stochastic gradient descent (SGD) methods, communication costs remain the principal constraint.
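To make the aggregation step concrete, the sketch below illustrates one round of federated averaging for a linear model: each client runs a few local SGD steps on its private data, and the server averages the resulting models weighted by local dataset size. The function names, learning rate, and least-squares objective are illustrative assumptions, not details from the cited work.

```python
# A minimal sketch of the federated averaging idea described above.
# All names, hyperparameters, and the least-squares objective are
# illustrative assumptions, not the cited papers' implementation.
import numpy as np

def client_update(weights, data, labels, lr=0.1, epochs=5):
    """One client's local SGD on its private data (least-squares loss)."""
    w = weights.copy()
    for _ in range(epochs):
        # Gradient of (1/2n) * ||X w - y||^2 for a linear model.
        grad = data.T @ (data @ w - labels) / len(labels)
        w -= lr * grad
    return w

def server_round(weights, clients):
    """Average the locally-computed models, weighted by local dataset size."""
    total = sum(len(y) for _, y in clients)
    new_w = np.zeros_like(weights)
    for X, y in clients:
        new_w += (len(y) / total) * client_update(weights, X, y)
    return new_w

# Example: three clients whose raw data never leaves the device;
# only model weights are communicated each round.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=50)))

w = np.zeros(2)
for _ in range(20):  # communication rounds
    w = server_round(w, clients)
print(w)  # approaches true_w without any raw data leaving a client
```

Each communication round transmits the full model in both directions, which is why the number of rounds, and the size of each update, dominates the cost.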