This paper consists of two parts. The first part gives an overview of knowledge discovery in the life sciences and describes the main motivations for developing and applying knowledge discovery methods to complex biological data. It briefly presents a few case studies in which high-throughput biological data are analyzed with unsupervised or supervised machine learning techniques. In these cases, real biological data sets (obtained from public or private sources) have been analyzed for tasks such as gene function identification and gene response analysis. Several sources of public data sets are covered, among them GEO (Gene Expression Omnibus), the most popular and well-known source of today's biological data. The objective is to show the impact of knowledge discovery on the entire bioinformatics pipeline, which consists of data pre-processing, data characteristics recognition, pattern recognition, and validation of results. The second part describes how discovered and validated knowledge can be structured into a knowledge base, where it can be integrated with other forms of knowledge, disseminated to multiple users, and expanded. Several topics related to the challenges of knowledge management are also discussed, since managing such knowledge is a demanding, non-trivial task.