In this work we adapt an efficient information integration algorithm to identify the minimal set of potentially relevant Semantic Web data sources for a given query. The vast majority of these sources are files written in RDF or OWL, and each must be processed in its entirety. Our adaptation enhances the algorithm with taxonomic reasoning, defines and uses a mapping language for aligning heterogeneous Semantic Web ontologies, and introduces a notion of source relevance to reduce the number of sources that must be considered for a given query. After source selection, we load the selected sources into a Semantic Web reasoner to obtain a sound and complete answer to the query. We have conducted an experiment using synthetic ontologies and data sources, which demonstrates that our system performs well over a wide range of queries. A typical response time for a substantial workload of 50 domain ontologies, 80 map ontologies and 500 data sources is less than 2 seconds. Furthermore, our system returned correct answers to 200 randomly generated queries in several workload configurations. We have also compared our adaptation with a basic implementation of the original information integration algorithm that does not perform any taxonomic reasoning. In the most complex configuration, with 50 domain ontologies, 100 map ontologies and 1000 data sources, our system returns complete answers to all the queries, whereas the basic implementation returns complete answers to only 28% of them.
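To illustrate why taxonomic reasoning matters for source selection, the following is a minimal sketch and not the algorithm described above. It assumes a hypothetical index that records which classes each source describes, and a list of subclass pairs standing in for the combined domain and map ontologies. The names `subclass_closure`, `relevant_sources`, and the example file names are invented for this illustration.

```python
from collections import defaultdict


def subclass_closure(subclass_of):
    """Map each class to the set containing itself and all transitive subclasses.

    `subclass_of` is an iterable of (subclass, superclass) pairs (assumed input).
    """
    children = defaultdict(set)
    classes = set()
    for sub, sup in subclass_of:
        children[sup].add(sub)
        classes.update((sub, sup))

    closure = {}
    for cls in classes:
        seen, stack = {cls}, [cls]
        while stack:
            for child in children[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        closure[cls] = seen
    return closure


def relevant_sources(query_classes, source_index, subclass_of):
    """Return sources that describe a queried class or any of its subclasses.

    `source_index` maps a source name to the set of classes it describes
    (a hypothetical summary; real sources would have to be summarized first).
    """
    closure = subclass_closure(subclass_of)
    selected = set()
    for qc in query_classes:
        wanted = closure.get(qc, {qc})
        for source, described in source_index.items():
            if wanted & described:
                selected.add(source)
    return selected


# Hypothetical taxonomy and source summaries.
taxonomy = [
    ("GradStudent", "Student"),
    ("Student", "Person"),
    ("Professor", "Person"),
]
index = {
    "a.rdf": {"GradStudent"},
    "b.rdf": {"Professor"},
    "c.rdf": {"Course"},
}

with_reasoning = relevant_sources({"Student"}, index, taxonomy)
without_reasoning = relevant_sources({"Student"}, index, [])
```

In this toy setting a query for `Student` selects `a.rdf` only when the subclass pairs are supplied. With an empty taxonomy the same query selects nothing, which mirrors in miniature how an implementation without taxonomic reasoning can return incomplete answers.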