DataFed: Towards Reproducible Research via Federated Data Management

Abstract:

The increasingly collaborative, globalized nature of scientific research, combined with the need to share data and the explosion in data volumes, presents an urgent need for a scientific data management system (SDMS). An SDMS presents a logical and holistic view of data that greatly simplifies and empowers data organization, curation, searching, sharing, and dissemination. We present DataFed, a lightweight, distributed SDMS that spans a federation of storage systems within a loosely coupled network of scientific facilities. Unlike existing SDMS offerings, DataFed uses high-performance, scalable user management and data transfer technologies that simplify its deployment, maintenance, and expansion. DataFed provides web-based and command-line interfaces to manage data and integrate with complex scientific workflows. DataFed represents a step towards reproducible scientific research by enabling reliable staging of the correct data in the desired environment.

Published in: 2019 International Conference on Computational Science and Computational Intelligence (CSCI)

Date of Conference: 05-07 December 2019

Date Added to IEEE Xplore: 20 April 2020

DOI: 10.1109/CSCI49370.2019.00245

Conference Location: Las Vegas, NV, USA

I. Introduction

Several scientific domains are experiencing an explosion in the volume, variety, veracity, and velocity of data owing to increased automation, increased computational power, and faster, higher-resolution sensors and detectors in scientific instruments [1], [2]. At the same time, research is becoming ever more globalized, collaborative, and multidisciplinary, and there is an increasing need to publish the datasets that support research findings [3]. Furthermore, scientific discovery using data analytics techniques such as machine learning (ML) and artificial intelligence (AI) requires large volumes of high-quality, well-organized data. Prior research has shown that data management and wrangling consume as much as 50–80% of the time in most scientific research projects, and this share is expected to rise [4], [5]. These factors not only lower scientific productivity but also exacerbate the problem of poor reproducibility in science. The current state of practice creates an urgent need to manage the data lifecycle with an effective Scientific Data Management System (SDMS) [6] and to use the SDMS as an essential component of the scientific process.
