Genomic signal processing
Anastassiou, D.
Signal Processing Magazine, IEEE
Volume 18, Issue 4, Jul 2001 Page(s):8 - 20
Digital Object Identifier 10.1109/79.939833
Summary: Genomics is a highly cross-disciplinary field that creates
paradigm shifts in such diverse areas as medicine and agriculture. It is
believed that many significant scientific and technological endeavors in
the 21st century will be related to the processing and interpretation of
the vast information that is currently revealed from sequencing the
genomes of many living organisms, including humans. Genomic information
is digital in a very real sense; it is represented in the form of
sequences of which each element can be one out of a finite number of
entities. Such sequences, like DNA and proteins, have been
mathematically represented by character strings, in which each character
is a letter of an alphabet. In the case of DNA, the alphabet has size 4
and consists of the letters A, T, C and G; in the case of proteins, the
size of the corresponding alphabet is 20. As the list of references
shows, biomolecular sequence analysis has already been a major research
topic among computer scientists, physicists, and mathematicians. The
main reason that signal processing has not yet had a significant impact
on this area is that it deals with numerical sequences rather than
character strings. However, if we properly map a character string into
one or more numerical sequences, then digital
signal processing (DSP) provides a set of novel and useful tools for
solving highly relevant problems. For example, in the form of local
texture, color spectrograms visually provide significant information
about biomolecular sequences which facilitates understanding of local
nature, structure, and function. Furthermore, both the magnitude and the
phase of properly defined Fourier transforms can be used to predict
important features like the location and certain properties of protein
coding regions in DNA. Even the process of mapping DNA into proteins and
the interdependence of the two kinds of sequences can be analyzed using
simulations based on digital filtering. These and other DSP-based
approaches result in alternative mathematical formulations and may
provide improved computational techniques for the solution of useful
problems in genomic information science and technology.
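
The mapping from a character string to numerical sequences, and the use of Fourier transform magnitude, can be illustrated with a small sketch. The summary does not say which mapping is used, so the binary indicator mapping below (one 0/1 sequence per letter) is an assumption, as are the function names `indicator_sequences`, `dft_coefficient`, and `spectral_power`. The choice of frequency index N/3 reflects the commonly reported period-3 behavior of protein coding regions; the toy sequence is artificial and is not real genomic data.

```python
import cmath

ALPHABET = "ACGT"  # the size-4 DNA alphabet named in the summary


def indicator_sequences(dna):
    """Map a DNA string to four binary sequences, one per letter.

    Assumed mapping: x_b[n] = 1 if position n holds letter b, else 0.
    """
    dna = dna.upper()
    return {b: [1 if ch == b else 0 for ch in dna] for b in ALPHABET}


def dft_coefficient(x, k):
    """Return the single DFT coefficient X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n_total = len(x)
    return sum(
        v * cmath.exp(-2j * cmath.pi * k * n / n_total)
        for n, v in enumerate(x)
    )


def spectral_power(dna, k):
    """Sum |X_b[k]|^2 over the four indicator sequences of the string."""
    seqs = indicator_sequences(dna)
    return sum(abs(dft_coefficient(x, k)) ** 2 for x in seqs.values())


if __name__ == "__main__":
    toy = "ATG" * 30           # artificial, perfectly 3-periodic string
    n_total = len(toy)         # 90 positions
    print(spectral_power(toy, n_total // 3))      # power at frequency index N/3
    print(spectral_power(toy, n_total // 3 + 1))  # power at a neighboring index
```

On this toy input the power concentrates at index N/3 and is essentially zero at the neighboring index. On real sequences one would typically evaluate the same quantity over a sliding window and look for windows where the N/3 component stands out; the window length would be a further assumption.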