Abstract:
As the amount of data generated every year grows exponentially, figuring out where and how to store it efficiently and inexpensively is becoming a larger problem every day. Rapid improvements in the performance and cost of DNA synthesis and sequencing methods have led to increased interest in DNA as a durable and compact medium for data storage. Today, a large spectrum of chemical tools is available that enables efficient access to, and manipulation of, data stored in DNA. While several DNA storage architectures have been proposed, there is no open-source codec or simulator that implements all of the required components of the DNA-based data storage pipeline for research and development. We present an open-source, end-to-end DNA data storage toolkit that can take an input file through the entire DNA storage pipeline. For each step of the pipeline, our toolkit implements state-of-the-art techniques as well as our own algorithms. These steps include encoding data into DNA strands, simulating the wet-lab processes of synthesis, storage, and sequencing of those strands, clustering the sequenced reads, reconstructing DNA strands from noisy clusters, and decoding the originally encoded file with support for error-correction mechanisms. Each module can be used individually or combined with the others to form an entire pipeline. We hope that our toolkit will be useful to researchers and developers who seek to experiment with this new and promising storage technology.
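The pipeline stages listed above can be illustrated with a toy, substitution-only sketch. It is not the toolkit's API: the function names (`encode`, `noisy_read`, `reconstruct`, `decode`), the fixed 2-bit-per-base mapping, and the error model are hypothetical choices made for illustration. Clustering is skipped because every read is known to come from the single strand, and no error-correcting code is applied.

```python
import random
from collections import Counter

# Hypothetical 2-bit-per-base mapping; practical codecs usually add constraints
# (e.g. limits on homopolymer runs and GC content) that this sketch ignores.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Map each byte to four nucleotides, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): every four bases become one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def noisy_read(strand: str, sub_rate: float, rng: random.Random) -> str:
    """Simulate one sequencing read with substitution errors only (assumed model)."""
    out = []
    for base in strand:
        if rng.random() < sub_rate:
            # Replace the base with one of the three other nucleotides.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def reconstruct(reads: list[str]) -> str:
    """Position-wise majority vote; valid only because reads contain no indels."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


if __name__ == "__main__":
    rng = random.Random(7)
    message = b"DNA"
    strand = encode(message)
    # A "cluster" of nine noisy reads of the same strand.
    reads = [noisy_read(strand, sub_rate=0.05, rng=rng) for _ in range(9)]
    recovered = decode(reconstruct(reads))
    print(strand, recovered)
```

Here `encode(b"DNA")` yields `CACACATGCAAC`. Position-wise majority voting only makes sense when reads stay aligned; real synthesis and sequencing also introduce insertions and deletions, which require alignment-based or trace-reconstruction methods instead.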
Published in: 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
Date of Conference: 05-07 May 2024
Date Added to IEEE Xplore: 16 July 2024