Programming Languages in Data Science: a Comparison from a Database Angle | IEEE Conference Publication | IEEE Xplore


Abstract:

In a typical Data Science project, the analyst uses many programming languages to explore and analyze big data coming from diverse data sources. A major challenge is managing and pre-processing so much data, with potentially inconsistent content, significant redundancy, diverse formats and varying data quality. Database systems research has tackled such problems for a long time, but mostly on relational databases. With such motivation in mind, this paper compares, from a database perspective, the strengths and weaknesses of three popular languages used nowadays: Python, R and SQL. We discuss the entire analytic pipeline, going from data integration, cleaning and pre-processing to model application and tuning. From a database systems perspective, we present a comprehensive survey of storage mechanisms, data processing algorithms, external algorithms, run-time memory management, consistency, optimizations and parallel processing. From a programming languages angle, we consider elegance, expressiveness, abstraction, composability, interactive behavior and automatic code optimization. We present a short experimental evaluation comparing the performance of the three languages on typical data exploration and pre-processing tasks. Our conclusion: there is no winner.
Date of Conference: 15-18 December 2021
Date Added to IEEE Xplore: 13 January 2022
Conference Location: Orlando, FL, USA
