General-Purpose vs. Specialized Data Analytics Systems: A Game of ML & SQL Thrones | IEEE Conference Publication | IEEE Xplore

Abstract:

Over the past decade, a plethora of systems have emerged to support data analytics in various domains such as SQL and machine learning, among others. In each of the data analysis domains, there are now many different specialized systems that leverage domain-specific optimizations to efficiently execute their workloads. An alternative approach is to build a general-purpose data analytics system that uses a common execution engine and programming model to support workloads in different domains. In this work, we choose representative systems of each class (Spark, TensorFlow, Presto and Hive) and benchmark their performance on a wide variety of machine learning and SQL workloads. We perform an extensive comparative analysis of the strengths and limitations of each system and highlight major areas for improvement for all systems. We believe that the major insights gained from this study will be useful for developers to improve the performance of these systems.
Date of Conference: 09-12 December 2019
Date Added to IEEE Xplore: 24 February 2020
Conference Location: Los Angeles, CA, USA

I. Introduction

As industry in nearly every sector of the economy has moved to a data-driven world [1], there has been an explosion in the volume of data that needs to be processed and analyzed almost as soon as it is generated. Deriving value from data is typically a multi-stage process that involves data analysis workloads from various domains, such as SQL, machine learning (ML), and graph analytics, among others. The need to support such complex analytics efficiently has never been greater, as gaining actionable insights from data has become a key service differentiator.
