Modern information systems such as the World Wide Web and digital libraries contain more data than ever before, are globally distributed and easy to use, and are therefore accessible to huge, heterogeneous user groups. On the other hand, the potentially enormous amount of heterogeneous information requires powerful tools that allow users to find relevant pieces of data. Thesauri are one such tool: they are a proven means of providing a uniform and consistent vocabulary for the indexing and retrieval of information-bearing objects (IBOs). Modern multi-lingual and multi-subject information systems require more than the traditional single-language, narrow-focus thesauri, and the broad clientele of information systems demands thesauri that can be used by non-specialists. To achieve this goal, we introduce the framework of thesaurus federations, i.e. loose compounds of distributed, multi- or mono-lingual thesauri that go beyond the already-known concepts of multi-thesaurus systems. We classify multi-thesaurus systems into multi-thesaurus environments, thesaurus switching systems and thesaurus compounds. Our architecture is based on a mediation layer and wrappers for the integration of heterogeneous, distributed thesauri. We present a Java-based prototype system that enables integrated access to several thesauri and is available through a SQL or HTML interface via a convenient thesaurus federation browser. This system has been used for the retrieval of metadata records managed by the Catalogue of Data Sources of the European Environment Agency (EEA).
Published in:
Research and Technology Advances in Digital Libraries, 1998. ADL 98. Proceedings. IEEE International Forum on
Date of Conference: 22-24 Apr 1998