Alleviating I/O Inefficiencies to Enable Effective Model Training Over Voluminous, High-Dimensional Datasets | IEEE Conference Publication | IEEE Xplore

Abstract:

Data volumes have grown exponentially in several domains, and these voluminous datasets often encompass a large number of features. Fitting models to such high-dimensional, voluminous data allows us to understand phenomena and inform decision-making. The analytics process is naturally iterative: scientists explore the set of features, the data-fitting algorithms, portions of the dataspace, and each algorithm's hyperparameters to guide their model-building process. It often takes several model-fitting attempts to arrive at a satisfactory solution, which may then be subjected to further refinement. Each of these attempts is itself time-consuming and dominated by I/O and data-movement costs. In this study, we present our methodology for significantly alleviating I/O-induced inefficiencies during model training. Rather than working with the raw data, we generate and work with sketches of the data. Our framework, Fennel, is independent of the libraries or analytical engines preferred by users. We performed empirical benchmarks with datasets from diverse domains (weather, epidemiology, and music), and we profile several aspects of our methodology.
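The abstract does not say what form Fennel's sketches take, so the following is only a minimal illustration of the general idea of training from a compact summary instead of the raw data. It assumes a linear least-squares model and swaps in a plain sufficient-statistics summary (the Gram matrix X^T X and the vector X^T y), which is not claimed to be Fennel's technique. One pass over the raw rows builds the summary; afterwards, models on different feature subsets are fitted from the summary alone, without re-reading the data. `GramSketch`, `update`, `merge`, and `fit` are hypothetical names, not Fennel's API.

```python
from typing import List, Sequence


class GramSketch:
    """Hypothetical one-pass summary for least-squares fitting.

    Stores X^T X and X^T y (with a leading intercept column), so its size is
    O(d^2) in the number of features d, independent of the number of rows.
    """

    def __init__(self, num_features: int) -> None:
        self.d = num_features + 1  # +1 for the intercept column
        self.xtx = [[0.0] * self.d for _ in range(self.d)]
        self.xty = [0.0] * self.d
        self.n = 0

    def update(self, features: Sequence[float], target: float) -> None:
        """Fold one raw row into the summary; the row can then be discarded."""
        if len(features) != self.d - 1:
            raise ValueError("feature vector has the wrong length")
        row = [1.0, *features]
        for i in range(self.d):
            self.xty[i] += row[i] * target
            for j in range(self.d):
                self.xtx[i][j] += row[i] * row[j]
        self.n += 1

    def merge(self, other: "GramSketch") -> None:
        """Combine with a summary built over another partition of the data."""
        for i in range(self.d):
            self.xty[i] += other.xty[i]
            for j in range(self.d):
                self.xtx[i][j] += other.xtx[i][j]
        self.n += other.n

    def fit(self, feature_subset: Sequence[int]) -> List[float]:
        """Return [intercept, coef...] for the chosen 0-based feature indices."""
        idx = [0] + [f + 1 for f in feature_subset]
        # The subset's normal equations are a submatrix of the full summary.
        aug = [[self.xtx[i][j] for j in idx] + [self.xty[i]] for i in idx]
        return _solve(aug)


def _solve(aug: List[List[float]]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting on an augmented matrix."""
    m = len(aug)
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: features may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, m + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][m] / aug[i][i] for i in range(m)]


# Usage: one pass over toy rows generated by y = 1 + 2*x1 - 0.5*x2.
sketch = GramSketch(num_features=2)
for x1, x2 in [(0, 1), (1, 0), (2, 3), (3, 1), (4, 2)]:
    sketch.update([x1, x2], 1.0 + 2.0 * x1 - 0.5 * x2)

coef_both = sketch.fit([0, 1])  # approximately [1.0, 2.0, -0.5]
coef_x1 = sketch.fit([0])       # a second model, no second pass over the rows
```

The design choice being illustrated is that the summary's size depends on the number of features rather than the number of rows, and summaries from separate partitions can be added together, so repeated model-fitting attempts over different feature subsets avoid repeated I/O over the raw dataset.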
Date of Conference: 10-13 December 2018
Date Added to IEEE Xplore: 24 January 2019
Conference Location: Seattle, WA, USA

Contact IEEE to Subscribe

References

References is not available for this document.