DNA-binding transcription factors play an integral role in regulating gene expression. Transcription factor binding sites (TFBSs) in gene promoter regions can be predicted with computational methods such as Support Vector Machines (SVM), Hidden Markov Models (HMM), and Random Forests (RF), all of which summarize the sequence patterns of experimentally determined TFBSs. The androgen receptor (AR), a ligand-dependent transcription factor, plays an important role in male reproductive function by binding directly to androgen response elements (AREs) in target gene promoters and thereby regulating gene transcription. The aim of this study is to use data mining tools to identify and characterize AREs based on sequence information. Three statistical methods (HMM, SVM, and RF) were explored to strengthen the prediction of putative AREs in the human genome. Cross-validation results indicated that all three models provided good sensitivity and specificity in identifying AREs, each with an accuracy of at least 80%. This is the first time that HMM, SVM, and RF have all been applied to constructing ARE prediction models.
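For readers less familiar with the evaluation measures named above, the following minimal sketch shows how sensitivity, specificity, and accuracy are computed from the predictions on one held-out cross-validation fold. The function name `binary_metrics` and the example labels are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return sensitivity, specificity, and accuracy.

    Labels are assumed to be 1 for an ARE and 0 for a non-ARE sequence.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # AREs correctly found
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-AREs correctly rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-AREs called AREs
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # AREs that were missed

    nan = float("nan")
    return {
        # Fraction of true AREs that the model recovers.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Fraction of non-AREs that the model correctly rejects.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        # Fraction of all sequences classified correctly.
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
    }


# Hypothetical labels for one held-out fold (illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a k-fold setting, these measures would typically be computed on each held-out fold and then averaged across folds.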