Skip to Main Content
Owing to the fast development of network technologies, executing parallel programs on distributed systems that connect heterogeneous machines became feasible but we still face some challenges: Workload imbalance in such environment may not only be due to uneven load distribution among machines as in parallel systems but also due to distribution that is not adequate with the characteristics of each machine. In this paper, we present a new parallel join algorithm for heterogeneous distributed architectures based on an efficient dynamic data distribution and task allocation which makes it insensitive to data skew and ensures perfect balancing properties during all stages of join computation. The performance of this algorithm is analyzed using the scalable and portable BSP (bulk synchronous parallel) cost model. We show that our algorithm guarantees optimal complexity and near linear speed-up while reducing communication and disk input/output costs to a minimum.