I. Introduction
A critical practical bottleneck in applying Machine Learning (ML) is the limited ability to collect, consistently label, and use large datasets. This is particularly true for businesses that, unlike Google or Amazon, do not possess “unlimited resources” [1]. Moreover, even when existing data is large and labeled, it may be “split between stakeholders” who are unwilling or unable to share their datasets [2], as in the case of medical data belonging to different hospitals and clinics. Furthermore, the collection and storage of personal information remains the subject of ongoing controversy [3]. At the same time, many ML applications, e.g. on mobile devices, rely on models that are periodically retrained or updated on sensitive private data (e.g., browsing history or geo-positioning). Hosting such data in a centralized location, even in adherence to strict legislation, still poses serious security risks, as evidenced by repeated data leaks [4]–[6]. Note also that the latest advancements in ML involve training very large models and thus require enormous computational resources [7], which increases not only the cost but also the carbon footprint [8].