Distributed Task-Based Training of Tree Models


Abstract:

Decision trees and tree ensembles are popular supervised learning models for tabular data. Two recent research trends on tree models stand out: (1) bigger and deeper models with many trees, and (2) scalable distributed training frameworks. However, existing implementations on distributed systems are IO-bound, leaving CPU cores underutilized. They also find best node-splitting conditions only approximately, due to their row-based data partitioning schemes. In this paper, we target the exact training of tree models by effectively utilizing the available CPU cores. The resulting system, called TreeServer, adopts a column-based data partitioning scheme to minimize communication, and a node-centric task-based engine to fully exploit CPU parallelism. Experiments show that TreeServer is up to 10× faster than models in Spark MLlib. We also showcase TreeServer's high training throughput by using it to build big "deep forest" models.
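The distinction the abstract draws between approximate and exact split finding can be illustrated with a small sketch. With column-based partitioning, a worker owns entire feature columns, so it can sort each column once and scan every candidate threshold exactly, rather than approximating splits from histograms built over row partitions. The function below is a hypothetical illustration of exact best-split search on one feature column (using Gini impurity); it is not TreeServer's actual code.

```python
def best_split(values, labels):
    """Return (threshold, impurity) of the exact best binary split
    for one feature column with 0/1 labels, scanning all thresholds."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    total = len(labels)
    total_pos = sum(labels)

    def gini(n, pos):
        # Gini impurity of a partition with n rows, pos of them positive.
        p = pos / n
        return 1.0 - p * p - (1.0 - p) ** 2

    left_n = left_pos = 0
    best = (None, float("inf"))
    for rank in range(total - 1):
        i = order[rank]
        left_n += 1
        left_pos += labels[i]
        # Only consider splits between distinct feature values.
        if values[order[rank]] == values[order[rank + 1]]:
            continue
        right_n = total - left_n
        right_pos = total_pos - left_pos
        weighted = (left_n * gini(left_n, left_pos)
                    + right_n * gini(right_n, right_pos)) / total
        if weighted < best[1]:
            thr = (values[order[rank]] + values[order[rank + 1]]) / 2.0
            best = (thr, weighted)
    return best

# Example: this feature perfectly separates the classes at 2.5.
print(best_split([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1]))  # -> (2.5, 0.0)
```

Because each worker holds whole columns, this scan needs no cross-worker communication per candidate threshold; only the chosen (feature, threshold, gain) triple is exchanged, which is the communication-minimizing property the abstract refers to.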
Date of Conference: 09-12 May 2022
Date Added to IEEE Xplore: 02 August 2022
Conference Location: Kuala Lumpur, Malaysia

