DataFed: Towards Reproducible Research via Federated Data Management | IEEE Conference Publication | IEEE Xplore

Abstract:

The increasingly collaborative, globalized nature of scientific research, combined with the need to share data and the explosion in data volumes, creates an urgent need for a scientific data management system (SDMS). An SDMS presents a logical and holistic view of data that greatly simplifies and empowers data organization, curation, searching, sharing, dissemination, and related tasks. We present DataFed, a lightweight, distributed SDMS that spans a federation of storage systems within a loosely coupled network of scientific facilities. Unlike existing SDMS offerings, DataFed uses high-performance and scalable user management and data transfer technologies that simplify its deployment, maintenance, and expansion. DataFed provides web-based and command-line interfaces to manage data and integrate with complex scientific workflows. DataFed represents a step towards reproducible scientific research by enabling reliable staging of the correct data in the desired environment.
Date of Conference: 05-07 December 2019
Date Added to IEEE Xplore: 20 April 2020
Conference Location: Las Vegas, NV, USA

I. Introduction

Several scientific domains are experiencing an explosion in the volume, variety, veracity, and velocity of data owing to increased automation, increased computational power, and faster, higher-resolution sensors and detectors in scientific instruments [1], [2]. At the same time, research is becoming ever more globalized, collaborative, and multidisciplinary, and there is an increasing need to publish the datasets that support research findings [3]. Furthermore, scientific discovery using data analytics techniques such as machine learning (ML) and artificial intelligence (AI) requires large volumes of high-quality, well-organized data. Prior research has shown that as much as 50–80% of the time in most scientific research projects is spent on data management and wrangling, and this share is expected to rise [4], [5]. These factors not only lower scientific productivity but also exacerbate the problem of poor reproducibility in science. The current state of the practice therefore calls for an effective Scientific Data Management System (SDMS) [6] that manages the full lifecycle of data and serves as an essential component of the scientific process.
