A Genetic-Based EM Motif-Finding Algorithm for Biological Sequence Analysis
Chengpeng Bi
Schools of Medicine, Comput. & Eng., Missouri Univ., Kansas City, MO;
Abstract
Motif-finding in biological sequence analysis remains a challenge in computational biology. Many algorithms and software packages have been developed to address the problem. The expectation maximization (EM)-type motif algorithm such as MEME is one of the most popular de novo motif discovery methods. However, as pointed out in literature, EM algorithms largely depend on their initialization and can be easily trapped in local optima. This paper proposes and implements a genetic-based EM motif-finding algorithm (GEMFA) aiming to overcome the drawbacks inherent in EM motif discovery algorithms. It first initializes a population of multiple local alignments each of which is encoded on a chromosome that represents a potential solution. GEMFA then performs heuristic search in the whole alignment space using minimum distance length (MDL) as the fitness function which is generalized from maximum log-likelihood. The genetic algorithm gradually moves this population towards the best alignment from which the motif model is derived. Simulated and real biological sequence analysis showed that GEMFA performed better than the simple multiple-restart of EM motif-finding algorithm especially in the subtle motif sequence alignment and other similar algorithms as well
Index
Terms
Available to subscribers and IEEE members.
References
Available to subscribers and IEEE members.
Citing Documents
Available to subscribers and IEEE members.