A schematic diagram of efficient Federated Learning for Gradient Boosting Decision Trees (eFL-Boost) which comprises Builder, Data Owners, and Aggregator. Builder is sele...
Abstract:
Privacy protection has attracted increasing attention, and privacy concerns often prevent flexible data utilization. In most industries, data are distributed across multiple organizations due to privacy concerns. Federated learning (FL), which enables cross-organizational machine learning by communicating statistical information, is a state-of-the-art technology that is used to solve this problem. However, for gradient boosting decision trees (GBDT) in FL, balancing communication efficiency and security while maintaining sufficient accuracy remains an unresolved problem. In this paper, we propose an FL scheme for GBDT, i.e., efficient FL for GBDT (eFL-Boost), which minimizes accuracy loss, communication costs, and information leakage. The proposed scheme focuses on the appropriate allocation of local computation (performed individually by each organization) and global computation (performed cooperatively by all organizations) when updating a model. It is known that determining tree structures globally incurs high communication costs, whereas computing leaf weights globally does not incur such costs, and leaf weights are expected to contribute relatively more to accuracy. Thus, in the proposed eFL-Boost, a tree structure is determined locally at one of the organizations, and leaf weights are calculated globally by aggregating the local gradients of all organizations. Specifically, eFL-Boost requires only three communications per update, and only statistical information that has low privacy risk is leaked to other organizations. In a performance evaluation on public data sets (using ROC AUC, log loss, and F1-score as metrics), the proposed eFL-Boost outperformed existing schemes that incur low communication costs and was comparable to a scheme that offers no privacy protection.
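The split between local and global computation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a depth-1 tree (a stump), logistic loss, and the standard second-order GBDT leaf weight w = -G / (H + λ), and all function names (`build_stump_locally`, `local_leaf_stats`, `aggregate_leaf_weights`) are hypothetical. It also does not reproduce the paper's message flow or any of its privacy measures.

```python
import numpy as np

def logistic_grad_hess(y, raw_score):
    """First and second derivatives of the logistic loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + np.exp(-raw_score))
    return p - y, p * (1.0 - p)

def build_stump_locally(X, g, h, lam=1.0):
    """Builder (one organization): pick (feature, threshold) from its own data only."""
    best_feature, best_threshold, best_gain = 0, 0.0, -np.inf
    G, H = g.sum(), h.sum()
    for j in range(X.shape[1]):
        # Skip the largest value so the right leaf is never empty.
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            GL, HL = g[left].sum(), h[left].sum()
            GR, HR = G - GL, H - HL
            gain = GL**2 / (HL + lam) + GR**2 / (HR + lam) - G**2 / (H + lam)
            if gain > best_gain:
                best_feature, best_threshold, best_gain = j, float(t), gain
    return best_feature, best_threshold

def local_leaf_stats(X, g, h, feature, threshold):
    """Data owner: per-leaf sums of gradients and Hessians (rows: left, right)."""
    left = X[:, feature] <= threshold
    return np.array([[g[left].sum(), h[left].sum()],
                     [g[~left].sum(), h[~left].sum()]])

def aggregate_leaf_weights(stats_list, lam=1.0):
    """Aggregator: sum the owners' statistics and compute w = -G / (H + lam)."""
    total = np.sum(stats_list, axis=0)
    return -total[:, 0] / (total[:, 1] + lam)

# Three hypothetical organizations, each holding private synthetic data.
rng = np.random.default_rng(0)
owners = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = (X[:, 0] > 0).astype(float)
    owners.append((X, y))

# One boosting update, starting from a raw score of zero everywhere.
grads = [logistic_grad_hess(y, np.zeros(len(y))) for _, y in owners]

# Local step: the first organization acts as Builder and fixes the structure.
X_b, _ = owners[0]
g_b, h_b = grads[0]
feature, threshold = build_stump_locally(X_b, g_b, h_b)

# Global step: every owner reports per-leaf sums; only these sums are shared.
stats = [local_leaf_stats(X, g, h, feature, threshold)
         for (X, _), (g, h) in zip(owners, grads)]
weights = aggregate_leaf_weights(stats)
print(feature, threshold, weights)
```

Because the per-leaf gradient and Hessian sums are additive, the aggregated weights in this sketch equal those that would be computed on the pooled data for the same tree structure, while the structure itself costs no cross-organization communication to find.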
Published in: IEEE Access (Volume: 10)
- IEEE Keywords
  - Costs
  - Organizations
  - Decision trees
  - Data models
  - Boosting
  - Privacy
  - Histograms
- Index Terms
  - Decision Tree
  - Gradient Boosting Decision Tree
  - Federated Learning
  - Efficient Federated Learning
  - Public Datasets
  - Statistical Information
  - Tree Structure
  - Privacy Protection
  - Information Leakage
  - Accuracy Loss
  - Multiple Organisms
  - Communication Cost
  - Local Computing
  - Leaf Weight
  - Privacy Risks
  - Log Loss
  - Ethnic Minority
  - Predictive Performance
  - Computational Complexity
  - Machine Learning Models
  - Data Owner
  - Global Model
  - Differential Privacy
  - Global Distribution
  - High Predictive Performance
  - Target Variable
  - Minimum Amount Of Data
  - Federated Learning Model
  - LightGBM
  - Encryption