The recent deluge of data that needs to be processed represents one of the major challenges in the computational field. Available high-performance computing (HPC) systems can be very useful for solving this problem when the data can be divided into chunks that are processed in parallel. However, due to the intrinsic characteristics of data-intensive problems, these applications can present huge load imbalances, and it can be difficult to use the available resources efficiently. This work proposes a strategy for dynamically analyzing and tuning the partition factor used to generate the data chunks. With the aim of decreasing the load imbalance, and therefore the overall execution time, this strategy divides the data chunks with the biggest computation times and joins contiguous chunks with the smallest computation times. The criteria for dividing or joining chunks are based on the chunks' associated execution times (average and standard deviation) and the number of processing nodes being used. We have evaluated our strategy through simulation and with a real data-intensive application. By applying our strategy, we have obtained promising results, reducing the total execution time by up to 55%.
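To make the divide/join idea concrete, the following is a minimal sketch of such a heuristic, assuming chunks are represented only by their measured execution times and that the split/join thresholds are derived from the average plus or minus one standard deviation. All names, thresholds, and data structures here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the divide/join heuristic described in the abstract.
# Chunk representation, thresholds, and function names are assumptions made
# for illustration only.
from statistics import mean, pstdev

def retune_chunks(chunk_times, num_nodes):
    """chunk_times: list of execution times, one per contiguous data chunk.
    Returns a new list of chunk times after splitting the most expensive
    chunks and joining contiguous cheap ones."""
    avg = mean(chunk_times)
    sd = pstdev(chunk_times)
    split_threshold = avg + sd          # assumed criterion for "too slow" chunks
    join_threshold = max(avg - sd, 0.0) # assumed criterion for "too fast" chunks

    result = []
    i = 0
    while i < len(chunk_times):
        t = chunk_times[i]
        if t > split_threshold:
            # Divide an expensive chunk in two (assume cost scales with size).
            result.extend([t / 2, t / 2])
            i += 1
        elif (t < join_threshold and i + 1 < len(chunk_times)
              and chunk_times[i + 1] < join_threshold):
            # Join two contiguous cheap chunks into one larger chunk.
            result.append(t + chunk_times[i + 1])
            i += 2
        else:
            result.append(t)
            i += 1

    # Keep at least one chunk per processing node so no node goes idle.
    if len(result) < num_nodes:
        return chunk_times
    return result
```

In practice such a routine would run between application iterations, using the measured times of the previous iteration to retune the partition factor for the next one.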