The identification of transcription factor binding sites in promoter sequences is an important problem, since it reveals information about the transcriptional regulation of genes. In this paper, a novel motif discovery method based on motif clustering and matching is proposed. Each L-mer in the dataset is matched against a precompiled library of motifs, represented by position weight matrices (PWMs), and assigned to a motif based on the P-value of its match score; the PWMs are then updated and clustered according to their similarity. Motif features are ranked in terms of statistical significance (P-value). The advantage of this approach is that it can simultaneously characterize every feature present in the dataset, thus lessening the chance that weaker signals will be missed. We apply our method (implemented as a computer program called MotifCM) to a benchmark of 56 datasets and demonstrate that MotifCM achieves improved performance over several other popular motif discovery tools.
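To make the matching step concrete, the following is a minimal Python sketch of scoring an L-mer against a small PWM library and assigning it to the motif with the smallest match-score P-value. It is an illustration under stated assumptions, not the MotifCM implementation: the function names (`log_odds_pwm`, `score`, `p_value`, `best_motif`), the pseudocount, the uniform background, and the exhaustive enumeration used to obtain the P-value are all hypothetical choices, since the abstract does not specify how the P-value is computed. The PWM update and clustering steps are not shown.

```python
import itertools
import math

ALPHABET = "ACGT"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into a log-odds PWM.

    `counts` is a list of dicts, one per motif position, mapping each base
    to its observed count. Pseudocount and uniform background are assumptions.
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in ALPHABET) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in ALPHABET
        })
    return pwm


def score(pwm, lmer):
    """Sum the log-odds weights of the bases in `lmer`, position by position."""
    return sum(col[base] for col, base in zip(pwm, lmer))


def p_value(pwm, observed):
    """Fraction of all 4^L L-mers scoring at least `observed`.

    Exhaustive enumeration under a uniform background; only practical for
    short motifs and used here purely for illustration.
    """
    length = len(pwm)
    hits = sum(
        1
        for kmer in itertools.product(ALPHABET, repeat=length)
        if score(pwm, kmer) >= observed - 1e-12  # tolerance for float ties
    )
    return hits / 4 ** length


def best_motif(library, lmer):
    """Return (motif_name, p_value) for the library motif that fits `lmer` best."""
    return min(
        ((name, p_value(pwm, score(pwm, lmer))) for name, pwm in library.items()),
        key=lambda pair: pair[1],
    )


def consensus_counts(consensus, n=10):
    """Hypothetical helper: counts for a motif seen `n` times with no variation."""
    return [{b: (n if b == base else 0) for b in ALPHABET} for base in consensus]


# Toy library of two length-4 motifs (illustrative data, not from the paper).
library = {
    "TATA-like": log_odds_pwm(consensus_counts("TATA")),
    "GC-rich": log_odds_pwm(consensus_counts("GCGC")),
}

if __name__ == "__main__":
    print(best_motif(library, "TATA"))
```

In this sketch each L-mer is assigned to a single best-matching motif; a threshold on the P-value could be added to leave poorly matching L-mers unassigned, though whether MotifCM does so is not stated here.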