Information retrieval is concerned with the classification and selective recovery of information. Improvements in this field are mainly sought at the core level, in the engine's classification capabilities, and through query enhancement processes. The latter has become the prime interest of researchers, since less progress has been made on the former. Both rely substantially on manual intervention, which makes the overall process less automated. In this paper, we propose a new model based on the self-organizing map paradigm to discover the concepts embedded in a collection of documents. The terms of the corpus are classified directly into concepts, without manual category labelling. The concepts then serve as a new knowledge representation for information retrieval. The model has been tested on a subcollection of TREC-6 (Text REtrieval Conference). As expected, retrieval using the concept representation does not outperform the corresponding full-term retrieval. The work is a step toward term classification using a self-organizing map and contributes to fully automating the discovery of concepts in text collections.
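
To make the idea of classifying terms into concepts with a self-organizing map more concrete, the following is a minimal sketch and not the model evaluated in the paper. It assumes that each term is represented by a vector of its occurrences across documents, that the map is a small rectangular grid, and that the learning rate and neighbourhood width decay linearly. None of these choices is stated in the abstract. The names `train_som` and `best_unit` and the toy data are hypothetical.

```python
import math
import random


def best_unit(weights, x):
    """Return the index of the map unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda u: math.dist(weights[u], x))


def train_som(term_vectors, rows=2, cols=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM on term vectors (illustrative settings only)."""
    rng = random.Random(seed)
    dim = len(term_vectors[0])
    # Grid position of every unit, used by the neighbourhood function.
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in coords]
    total = epochs * len(term_vectors)
    step = 0
    for _ in range(epochs):
        order = list(range(len(term_vectors)))
        rng.shuffle(order)
        for i in order:
            x = term_vectors[i]
            decay = 1.0 - step / total          # assumed linear decay schedule
            lr = lr0 * decay
            sigma = sigma0 * decay + 1e-3       # keep the width strictly positive
            bmu = best_unit(weights, x)
            for u, w in enumerate(weights):
                d2 = ((coords[u][0] - coords[bmu][0]) ** 2
                      + (coords[u][1] - coords[bmu][1]) ** 2)
                h = math.exp(-d2 / (2.0 * sigma * sigma))  # Gaussian neighbourhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Hypothetical toy data: rows are terms, columns are documents.
terms = ["engine", "motor", "query", "search"]
term_vectors = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

weights = train_som(term_vectors)
# Each term's "concept" is the index of its best-matching map unit.
concepts = {t: best_unit(weights, v) for t, v in zip(terms, term_vectors)}
```

In this sketch a concept is simply a map unit, and terms with identical occurrence patterns necessarily fall on the same unit. A document could then be re-represented by the concepts of its terms rather than by the terms themselves, which is the kind of reduced representation the abstract compares against full-term retrieval.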