This paper examines how computation can be mapped across the nodes of a distributed search system to make effective use of the available resources. We focus on computationally intensive search of complex data, such as content-based retrieval of digital images or sounds, where sophisticated algorithms must be evaluated on the objects of interest. Because these problems require significant computation, we distribute the search over a collection of compute nodes, such as active storage devices, intermediate processors and host computers. A key challenge in mapping the desired computation to the available resources is that the most efficient distribution depends on several factors: the relative power and number of compute nodes; the network bandwidth between the compute nodes; the cost of evaluating query predicates; and the selectivity of the given query. This wide range of variables makes manual partitioning of the computation infeasible, particularly since some of the parameters (e.g., available network bandwidth) can change during the course of a search. This paper proposes several techniques for dynamic partitioning of computation, and demonstrates that they can significantly improve efficiency for distributed search applications.
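To illustrate why the best partitioning depends on node power, bandwidth, predicate cost and selectivity, here is a toy cost model. It is a minimal sketch under stated assumptions, not the partitioning technique proposed in the paper. All names (`Filter`, `stage_times`, `best_split`) and parameters are hypothetical. We assume that a query is a fixed-order chain of filters, that the first `split` filters run on the storage devices in parallel, and that the surviving objects cross one shared link to a single host, which runs the remaining filters. We also assume the three stages are pipelined, so the slowest stage determines throughput.

```python
from dataclasses import dataclass


@dataclass
class Filter:
    cost: float       # hypothetical unit: CPU-seconds per object on a reference CPU
    pass_rate: float  # selectivity: fraction of objects that survive this filter


def stage_times(filters, split, disk_speed, host_speed, bandwidth, obj_size, n_disks):
    """Per-object time at each pipeline stage (assumed model) when
    filters[:split] run on the storage devices and filters[split:] on the host."""
    frac = 1.0  # fraction of objects still alive before the current filter
    disk_work = 0.0
    for f in filters[:split]:
        disk_work += frac * f.cost  # a filter only sees objects that passed earlier ones
        frac *= f.pass_rate
    shipped = frac  # fraction of objects sent over the network to the host
    host_work = 0.0
    for f in filters[split:]:
        host_work += frac * f.cost
        frac *= f.pass_rate
    t_disk = disk_work / (disk_speed * n_disks)  # storage devices work in parallel
    t_net = shipped * obj_size / bandwidth       # one shared link (assumption)
    t_host = host_work / host_speed
    return t_disk, t_net, t_host


def best_split(filters, **params):
    """Try every split point; return (split, bottleneck) minimizing the slowest stage."""
    best = None
    for split in range(len(filters) + 1):
        bottleneck = max(stage_times(filters, split, **params))
        if best is None or bottleneck < best[1]:
            best = (split, bottleneck)
    return best


# Usage: a cheap, moderately selective filter followed by an expensive, highly selective one.
query = [Filter(cost=0.01, pass_rate=0.5), Filter(cost=0.2, pass_rate=0.1)]
params = dict(disk_speed=0.25, host_speed=1.0, obj_size=1e5, n_disks=4)

fast_net = best_split(query, bandwidth=1e6, **params)  # split 1: cheap filter on disks
slow_net = best_split(query, bandwidth=1e5, **params)  # split 2: everything on disks
```

Under these made-up numbers, reducing the bandwidth tenfold moves the best split point from after the first filter to after the second. This mirrors the abstract's point that a static, hand-chosen partitioning can become suboptimal when conditions such as available bandwidth change mid-search.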