Here we describe the application of two well-known graph algorithms, Edmonds' algorithm and Prim's algorithm, to the problem of optimizing distributed SPARQL queries. In the context of this paper, a "distributed SPARQL query" is a SPARQL query which is resolved by contacting any number of remote SPARQL endpoints. Two optimization approaches are described. In the first approach, a static query plan is computed in advance of query execution, using one of two standard graph algorithms for finding minimum spanning trees (Edmonds' algorithm and Prim's algorithm). In the second approach, the planning and execution of the query are interleaved, so that as each potential solution is expanded it is permitted to follow an independent query plan. Our optimization approach requires basic statistics regarding RDF predicates, which must be obtained prior to the user's query through automated querying of the remote SPARQL endpoints.
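To illustrate the static approach, the following minimal Python sketch applies Prim's algorithm to a small undirected graph whose nodes stand for the triple patterns of a query and whose edge weights stand for estimated join costs. The function name `prim_plan`, the pattern labels `tp1` to `tp3`, and the cost values are hypothetical and are not taken from the paper; how costs would be derived from the predicate statistics is not specified here and is simply assumed as input.

```python
import heapq


def prim_plan(edges, start):
    """Return the spanning-tree edges in the order Prim's algorithm adds them.

    edges: iterable of (u, v, cost) tuples for an undirected graph.
           Here a node is a triple pattern and cost is an assumed
           join-cost estimate (hypothetical, not the paper's cost model).
    start: the node from which the tree is grown.
    """
    # Build an adjacency list so each node knows its neighbours and costs.
    adjacency = {}
    for u, v, cost in edges:
        adjacency.setdefault(u, []).append((cost, v))
        adjacency.setdefault(v, []).append((cost, u))

    visited = {start}
    # The frontier holds candidate edges leaving the tree, cheapest first.
    frontier = [(cost, start, v) for cost, v in adjacency.get(start, [])]
    heapq.heapify(frontier)

    plan = []
    while frontier:
        cost, u, v = heapq.heappop(frontier)
        if v in visited:
            continue  # this edge would close a cycle, so skip it
        visited.add(v)
        plan.append((u, v, cost))
        for next_cost, w in adjacency[v]:
            if w not in visited:
                heapq.heappush(frontier, (next_cost, v, w))
    return plan


# Hypothetical query with three triple patterns and assumed join costs.
EXAMPLE_EDGES = [
    ("tp1", "tp2", 50),
    ("tp2", "tp3", 5),
    ("tp1", "tp3", 200),
]

if __name__ == "__main__":
    for u, v, cost in prim_plan(EXAMPLE_EDGES, "tp1"):
        print(u, v, cost)
```

In this toy graph the resulting plan joins `tp1` with `tp2` first and then `tp2` with `tp3`, so the expensive `tp1`-`tp3` edge is never used. Edmonds' algorithm plays the analogous role when the edges are directed.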
Published in:
Computational Science and Engineering, 2009. CSE '09. International Conference on (Volume: 1)
Date of Conference: 29-31 Aug. 2009