Searching for relevant documents in large document collections is one of the key tasks in the semantic web and knowledge technologies. This paper analyzes and designs an improvement to information retrieval (IR) that uses a specific conceptual model created automatically from a set of text documents without semantic annotation. The conceptual model combines locally applied Formal Concept Analysis (FCA) with agglomerative clustering of the resulting local models into one structure, which is suited to supporting the information retrieval process and can be combined with standard full-text search. FCA is one of the approaches that can be applied to conceptual modeling in the domain of text documents. An extension of classic FCA, which works on binary table data, is the one-sided fuzzy version, which works with real values in the object-attribute table (the document-term matrix in the case of a vector representation of text documents). In our approach, the starting set of documents is decomposed into smaller sets of similar documents using a partitional clustering algorithm. One concept lattice is then built for every cluster using FCA, and these FCA-based models are combined into a hierarchy of concept lattices using an agglomerative clustering algorithm. Finally, we define the basic details and methods of an IR system that combines standard full-text search with conceptual search over the extracted concept hierarchy.
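
To make the FCA step more concrete, the following minimal Python sketch enumerates the formal concepts of a tiny binary document-term table, which would correspond to one cluster of documents. It illustrates only classic (crisp) FCA, not the one-sided fuzzy version or the agglomerative merging of lattices, and it is not the implementation used in the paper. The toy documents, terms, and function names (`intent`, `extent`, `formal_concepts`) are assumptions introduced for illustration, and the brute-force enumeration over all document subsets is only practical for very small tables.

```python
from itertools import combinations

# Hypothetical toy data: a binary document-term table for one cluster,
# stored as document id -> set of terms occurring in that document.
docs = {
    "d1": {"fuzzy", "lattice"},
    "d2": {"fuzzy", "lattice", "search"},
    "d3": {"search", "index"},
}


def intent(objects, table, all_terms):
    """Terms shared by every document in `objects` (all terms if empty)."""
    shared = set(all_terms)
    for obj in objects:
        shared &= table[obj]
    return frozenset(shared)


def extent(terms, table):
    """Documents that contain every term in `terms`."""
    return frozenset(obj for obj, ts in table.items() if set(terms) <= ts)


def formal_concepts(table):
    """Brute-force enumeration: close every subset of documents.

    A formal concept is a pair (extent, intent) in which each component
    determines the other. Exponential in the number of documents.
    """
    all_terms = set().union(*table.values())
    objects = sorted(table)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            b = intent(subset, table, all_terms)  # common terms
            a = extent(b, table)                  # closure of the subset
            concepts.add((a, b))
    return concepts


concepts = formal_concepts(docs)

# Print concepts from the most general (largest extent) to the most specific.
for a, b in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(a), sorted(b))

# A very simple "conceptual search": the documents matching a set of query
# terms are the extent of those terms.
hits = extent({"fuzzy"}, docs)
```

In this sketch, a query for the term `fuzzy` returns the documents `d1` and `d2`, which form the extent of the concept whose intent is `{fuzzy, lattice}`. A full system along the lines described above would build one such lattice per cluster and link the lattices into a hierarchy.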