A three-tier data distribution framework is proposed for grid-enabled data analysis applications. The framework builds on existing resource reservation services: an existing performance-aware scheduling system assigns the analytical tasks that the framework serves, together with their input data, to computational hosts termed 'calculators'. A 'guider' organizes data delivery from the source server to the 'coordinators', and each coordinator schedules data propagation from itself to all of its calculators. The scheme, which we call DIAO, is peer-to-peer (P2P) in that a host may act as a server for a data item once it has downloaded that item. Theoretical modeling reveals that the duration of data distribution depends on how effectively the bottleneck resource in the overlay is utilized, and we develop three heuristics to minimize this duration subject to the available resources. The ability of DIAO to exploit limited resources is demonstrated through simulation.
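
To illustrate why the bottleneck resource governs distribution time in a P2P overlay, the sketch below computes the standard textbook lower bound on the time needed to deliver one data item to every peer. This is not the DIAO model, which the abstract does not specify; the function name `p2p_lower_bound` and all parameters are hypothetical and chosen only for illustration.

```python
def p2p_lower_bound(file_size, server_up, peer_ups, peer_downs):
    """Textbook lower bound on P2P distribution time (not the DIAO model).

    file_size  : size of the data item (e.g. in megabits)
    server_up  : upload rate of the source server
    peer_ups   : upload rate of each peer
    peer_downs : download rate of each peer
    All rates use the same unit per second as file_size.
    """
    n = len(peer_ups)
    return max(
        # The source must push at least one full copy into the overlay.
        file_size / server_up,
        # The slowest downloader must still receive the whole item.
        file_size / min(peer_downs),
        # n copies must be delivered using the total upload capacity.
        n * file_size / (server_up + sum(peer_ups)),
    )


if __name__ == "__main__":
    # Source upload and aggregate upload capacity are the bottlenecks here.
    print(p2p_lower_bound(100.0, 10.0, [5.0, 5.0], [20.0, 20.0]))
    # With a fast source, the slowest downloader becomes the bottleneck.
    print(p2p_lower_bound(100.0, 50.0, [1.0, 1.0], [20.0, 20.0]))
```

Whichever of the three terms is largest identifies the bottleneck; raising any other resource leaves the bound unchanged, which is the intuition behind minimizing duration by using the bottleneck resource effectively.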