Trace Reconstruction Problems in Computational Biology


Abstract:

The problem of reconstructing a string from its error-prone copies, the trace reconstruction problem, was introduced by Vladimir Levenshtein two decades ago. While there has been considerable theoretical work on trace reconstruction, practical solutions have only recently started to emerge in the context of two rapidly developing research areas: immunogenomics and DNA data storage. In immunogenomics, traces correspond to mutated copies of genes, with mutations generated naturally by the adaptive immune system. In DNA data storage, traces correspond to noisy copies of DNA molecules that encode digital data, with errors being artifacts of the data retrieval process. In this paper, we introduce several new trace generation models and open questions relevant to trace reconstruction for immunogenomics and DNA data storage, survey theoretical results on trace reconstruction, and highlight their connections to computational biology. Throughout, we discuss the applicability and shortcomings of known solutions and suggest future research directions.
Published in: IEEE Transactions on Information Theory (Volume: 67, Issue: 6, June 2021)
Page(s): 3295 - 3314
Date of Publication: 13 October 2020

PubMed ID: 34176957

Electrical and Computer Engineering Department, University of California San Diego, La Jolla, USA
Vinnu Bhardwaj received the B.E. degree from the PEC University of Technology, India, and the M.E. degree in ECE from the Indian Institute of Science (IISc). He is currently pursuing the Ph.D. degree with the Department of Electrical and Computer Engineering, University of California, San Diego (UCSD), with a specialization in data science and machine learning.
His research interests include the development of computational methods to better understand biological mechanisms using data in different domains, including immunogenomics and metabolomics. He is the author of MINING-D, the tool that led to the discovery of 25 novel IGHD genes. He was awarded the Dean’s Office Fellowship by UCSD (2015).
Computer Science and Engineering Department, University of California San Diego, La Jolla, USA
Pavel A. Pevzner received the Ph.D. degree from the Moscow Institute of Physics and Technology, Russia.
He was named the Howard Hughes Medical Institute Professor in 2006. He is currently the Ronald R. Taylor Professor of Computer Science and Engineering and the Director of the NIH Center for Computational Mass Spectrometry, University of California, San Diego (UCSD). He has authored the textbooks Computational Molecular Biology: An Algorithmic Approach, Introduction to Bioinformatics Algorithms (with Neil Jones), Bioinformatics Algorithms: an Active Learning Approach (with Phillip Compeau), and Learning Algorithms Through Programming and Puzzle Solving (with Alexander Kulikov). He has co-developed the Bioinformatics and Data Structure and Algorithms online specializations on Coursera as well as the Algorithms Micro Master Program at edX.
Dr. Pevzner was elected as the Association for Computing Machinery Fellow in 2010, the International Society for Computational Biology Fellow in 2012, the European Academy of Sciences Member (Academia Europaea) in 2016, and the American Association for the Advancement of Science (AAAS) Fellow in 2018. He was awarded an Honoris Causa (2011) from Simon Fraser University in Vancouver, the Senior Scientist Award (2017) by the International Society for Computational Biology, and the Kanellakis Theory and Practice Award from the Association for Computing Machinery (2019).
Computer Science and Engineering Department, University of California San Diego, La Jolla, USA
Qualcomm Institute, University of California San Diego, La Jolla, USA
Cyrus Rashtchian received the B.S. degree in computer science from the University of Illinois, Urbana-Champaign, in 2010, and the Ph.D. degree in computer science and engineering from the University of Washington, Seattle, in 2018.
He is currently a Data Science Fellow at the University of California, San Diego (UCSD), affiliated with the Computer Science and Engineering Department and the Qualcomm Institute. His broad research interests are motivated by building the foundations of data science, including DNA data storage, robust and explainable machine learning, computational and statistical trade-offs, distributed algorithms, and clustering. In general, he applies diverse geometric and algorithmic tools to problems in data science, with a keen eye for new applications and emerging technologies. Prior to UCSD, he completed research internships at Facebook Reality Labs, Microsoft Research, and Cray. He has published in top machine learning and theoretical computer science conferences, including ITCS, SODA, COLT, ICML, NeurIPS, and AISTATS.
Computer Science and Engineering Department, University of California San Diego, La Jolla, USA
Yana Safonova received the B.S. and M.S. degrees in computer science from Nizhny Novgorod State University, Russia, in 2012, and the Ph.D. degree in bioinformatics from Saint Petersburg State University, Russia, in 2017.
Since 2017, she has been a Postdoctoral Researcher with the Computer Science and Engineering Department, University of California, San Diego (UCSD). Since 2019, she has also been affiliated with the Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine. Her research interests cover open problems in computational immunology that include applications of the recently emerged immunosequencing technologies to the design of antibody drugs, prediction of vaccine efficacy, and population analysis of the immune loci.
Dr. Safonova is a member of The Adaptive Immune Receptor Repertoire (AIRR) Community of The Antibody Society. She was awarded the Data Science Postdoctoral Fellowship (2017) by UCSD and the Intersect Fellowship for Computational Scientists and Immunologists (2019) by the American Association of Immunologists.
