Current state-of-the-art approaches for biological sequence querying and alignment require preprocessing and lack robustness to repetitions in the sequence. In addition, these approaches offer little support for efficiently querying subsequences, a process that is essential for tracking localized database matches. We propose a query-based alignment method for biological sequences that first maps sequences to time-domain waveforms and then processes the waveforms for alignment in the time-frequency plane. The mapping uses waveforms, such as Gaussian functions, that give each sequence a unique representation in the time-frequency plane. The proposed alignment method employs a robust querying algorithm that utilizes a time-frequency signal expansion whose basis function is matched to the basic waveform in the mapped sequences. The resulting WAVEQuery approach was demonstrated for both deoxyribonucleic acid (DNA) and protein sequences using the matching pursuit decomposition as the signal basis expansion. We specifically evaluated the alignment localization of WAVEQuery over repetitive database segments, and we demonstrated its operation in real time without preprocessing. We also demonstrated that WAVEQuery significantly outperformed the biological sequence alignment method BLAST on DNA sequence queries containing repetitive segments. A generalized version of the WAVEQuery approach with the metaplectic transform is also described for protein sequence structure prediction.
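As a rough illustration of the idea, and not the published WAVEQuery implementation, the sketch below maps each DNA letter to a Gaussian-windowed atom at a letter-specific frequency and symbol-specific time shift, recovers the atoms with a greedy matching pursuit over a dictionary matched to those atoms, and localizes a query by comparing the recovered atoms at every offset. The frequency table, symbol spacing, Gaussian width, and all function names are assumptions chosen for this example only.

```python
import numpy as np

# Assumed mapping parameters (illustrative, not taken from the paper).
FREQS = {"A": 0.05, "C": 0.12, "G": 0.19, "T": 0.26}  # cycles per sample
SPACING = 64   # samples between consecutive symbols
SIGMA = 8.0    # Gaussian width in samples


def atom(n_samples, center, freq):
    """Unit-norm Gaussian-windowed cosine centred at `center`."""
    t = np.arange(n_samples) - center
    g = np.exp(-0.5 * (t / SIGMA) ** 2) * np.cos(2 * np.pi * freq * t)
    return g / np.linalg.norm(g)


def sequence_to_waveform(seq):
    """Map a sequence to a time-domain waveform, one atom per symbol."""
    n = SPACING * len(seq)
    return sum(atom(n, SPACING * i + SPACING // 2, FREQS[s])
               for i, s in enumerate(seq))


def matching_pursuit_decode(signal, n_symbols):
    """Greedy matching pursuit over a dictionary matched to the atoms."""
    n = len(signal)
    letters = list(FREQS)
    dictionary = np.array([atom(n, SPACING * i + SPACING // 2, FREQS[l])
                           for i in range(n_symbols) for l in letters])
    residual = np.asarray(signal, dtype=float).copy()
    decoded = ["?"] * n_symbols
    for _ in range(n_symbols):
        coeffs = dictionary @ residual          # project residual on atoms
        k = int(np.argmax(np.abs(coeffs)))      # best-matching atom
        residual -= coeffs[k] * dictionary[k]   # remove its contribution
        pos, letter_idx = divmod(k, len(letters))
        decoded[pos] = letters[letter_idx]
    return "".join(decoded)


def locate_query(database, query):
    """Return (best offset, number of matching atoms) for the query."""
    db = matching_pursuit_decode(sequence_to_waveform(database), len(database))
    q = matching_pursuit_decode(sequence_to_waveform(query), len(query))
    scores = [sum(a == b for a, b in zip(db[o:o + len(q)], q))
              for o in range(len(db) - len(q) + 1)]
    best = int(np.argmax(scores))
    return best, scores[best]
```

Calling `locate_query("ACGTTGCAACGGTA", "GCAAC")` returns the offset with the most coinciding atoms together with its score. The atoms are spaced widely enough that neighbouring symbols barely overlap, which keeps this toy decomposition well behaved; a practical system would need a denser dictionary and a more careful scoring rule.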
Date of Publication: Sept. 2011