Abstract:
Using synthetic DNA for data storage and for encoding physical information in labeling, tracing, and authentication applications is becoming more feasible as synthesis and reading technologies improve. As a storage medium, DNA offers several advantages, such as very high physical density and robustness. Some new synthesis technologies introduce repetition noise, consisting of sticky insertions and deletions in the resulting messages. In this paper, we address reconstruction algorithms for multiple-trace communication channels with repetition (sticky insertion and deletion) noise. We prove correctness and analyze failure rates both analytically and on simulated data. We identify a failure mechanism, related to alternating stretches in the design sequence, that can bias the data derived from the reads (traces) and used for reconstruction. To minimize this effect, we introduce alternating length limited (ALL) codes and analyze some of their properties.
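To make the multiple-trace repetition-noise setting concrete, here is a minimal Python sketch under an assumed toy model. It is not the paper's algorithm. In this model each run of identical symbols in the design sequence may gain or lose one copy, but a run is never erased entirely. Every trace therefore keeps the same run skeleton (the symbol sequence with repeats collapsed), and reconstruction reduces to estimating each run length from the traces. The per-run median and all function names (`runs`, `sticky_channel`, `reconstruct`) are hypothetical illustrations.

```python
import random
from itertools import groupby
from statistics import median_low


def runs(seq):
    """Collapse a string into (symbol, run_length) pairs."""
    return [(sym, len(list(grp))) for sym, grp in groupby(seq)]


def sticky_channel(seq, p_ins, p_del, rng):
    """Assumed toy repetition-noise channel: each run may gain one copy
    (sticky insertion) or lose one copy (sticky deletion), but a run of
    length 1 is never deleted, so the run skeleton is preserved."""
    out = []
    for sym, length in runs(seq):
        if rng.random() < p_ins:
            length += 1
        elif rng.random() < p_del and length > 1:
            length -= 1
        out.append(sym * length)
    return "".join(out)


def reconstruct(traces):
    """Estimate the design sequence from traces that share one run skeleton,
    taking the (low) median of the observed lengths for each run."""
    parsed = [runs(t) for t in traces]
    skeleton = [sym for sym, _ in parsed[0]]
    if any([sym for sym, _ in p] != skeleton for p in parsed):
        raise ValueError("traces disagree on the run skeleton")
    estimate = []
    for i, sym in enumerate(skeleton):
        lengths = [p[i][1] for p in parsed]
        estimate.append(sym * median_low(lengths))
    return "".join(estimate)


if __name__ == "__main__":
    rng = random.Random(1)
    design = "AACGTTTGCA"
    traces = [sticky_channel(design, 0.1, 0.1, rng) for _ in range(9)]
    print(traces)
    print(reconstruct(traces), design)
```

With an odd number of traces, `median_low` returns the true median of each run's observed lengths. With an even number, it breaks ties toward the shorter run. A channel that could erase whole runs would break the shared-skeleton check and falls outside this sketch.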
Published in: IEEE Transactions on Communications (Volume: 72, Issue: 2, February 2024)