In this talk, I will discuss recent data mining techniques and their applications in bioinformatics, focusing on three areas: data integration, text mining, and graph-based data mining. For data integration, I will present a semantic-based approach to multi-source bioinformatics data integration. In our approach, a metamodel represents the master search schema, and an interface extraction algorithm based on the hierarchical structure of the web and pattern captures the rich semantic relationships of online bioinformatics data sources. Our final goal is to develop a meta-search interface that gives biologists a single point of access to multiple online bioinformatics databases. For text mining, I will address some of the challenging issues in mining and searching the biomedical literature, present a unified architecture, Bio-SET-DM (Biomedical Literature Searching, Extraction and Text Data Mining), and discuss novel algorithms such as a semantic-based language model for literature retrieval and semi-supervised pattern learning for extracting biological relationships from the biomedical literature. The third part, graph-based data mining, focuses on mining biological networks. I will discuss how to apply graph-based mining techniques and algorithms to analyze the modular and hierarchical structure of biological networks and how to identify and evaluate subnetworks within complicated biological networks, and I will present experimental results. Finally, I will introduce a unified framework that integrates the three parts (data integration, text mining, and graph-based data mining) into a single bioinformatics data mining procedure.