Identifying unknown transcription factor binding sites (TFBSs) is fundamental to understanding gene regulation and, more broadly, the mechanisms of life. The corresponding de novo motif discovery problem in bioinformatics is formulated as pattern discovery from strings, where the challenges lie in both modeling and optimization, because the short TFBSs are weak signals in massive and noisy experimental data. While genetic algorithms have been widely applied to the problem, recent memetic algorithms (MAs) employing local operators have demonstrated superior effectiveness and efficiency. In this paper, we propose and study various MA components, including local operators and models for motif discovery, within the newly established MA framework. The demonstrated optimization and modeling capabilities are analyzed in depth on real datasets and their noisy versions. Selected optimal MAs show significantly improved performance over state-of-the-art methods in extensive tests, including the blind test on the eukaryotic benchmark. This paper serves as the first systematic study of MAs for de novo motif discovery, and its analyses highlight important issues in MA design. The comprehensive component categorization and the MA framework provide a useful platform for future MA development, especially for the newly emerging chromatin immunoprecipitation followed by sequencing (ChIP-seq) data.