Data streaming management and scheduling are required by many grid computing applications, especially when the volume of data to be processed is extremely high while available storage is relatively limited. Large volumes of data from scientific experiments are usually partitioned into lots of small files (LOSF), which poses challenges for data streaming support. This work proposes block-based data transfer, implemented using GridFTP, in which the number of blocks or the size of each block must be carefully scheduled, taking makespan and available storage into account simultaneously. To increase processing efficiency, data streaming and processing have to be performed concurrently, and data streaming scheduling must be storage-aware to avoid data overflow. Experimental results show that the proposed optimization method for block-based, concurrent, and storage-aware data streaming handles the LOSF problem efficiently, with relatively good performance in terms of makespan and storage usage.
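The trade-off described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the optimization method of this work: the greedy packing of files into blocks, the fixed per-block transfer overhead, the single transfer link, the single processor, and all function and parameter names (`make_blocks`, `simulate`, `choose_block_size`, `overhead`, `proc_rate`) are hypothetical. Storage for a block is reserved when its transfer starts and released when its processing ends, so a transfer is delayed whenever the block would not fit.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple


def make_blocks(file_sizes: Sequence[float], block_size: float) -> List[float]:
    """Greedily pack consecutive small files into blocks of at most block_size."""
    blocks: List[float] = []
    current = 0.0
    for size in file_sizes:
        if current > 0 and current + size > block_size:
            blocks.append(current)
            current = 0.0
        current += size
    if current > 0:
        blocks.append(current)
    return blocks


def simulate(blocks: Sequence[float], capacity: float, bandwidth: float,
             overhead: float, proc_rate: float) -> Optional[Tuple[float, float]]:
    """Return (makespan, peak_storage), or None if some block can never fit."""
    if any(b > capacity for b in blocks):
        return None
    resident = deque()   # (processing_end_time, size) of blocks held in storage
    link_free = 0.0      # time at which the transfer link becomes idle
    proc_free = 0.0      # time at which the processor becomes idle
    peak = 0.0
    for block in blocks:
        start = link_free
        # Drop blocks whose processing has already finished by `start`.
        while resident and resident[0][0] <= start:
            resident.popleft()
        # Storage-aware step: wait for older blocks to be processed
        # until the new block fits.
        while resident and sum(s for _, s in resident) + block > capacity:
            end, _ = resident.popleft()
            start = max(start, end)
        link_free = start + overhead + block / bandwidth
        # Processing overlaps with the transfer of later blocks.
        proc_free = max(link_free, proc_free) + block / proc_rate
        resident.append((proc_free, block))
        peak = max(peak, sum(s for _, s in resident))
    return proc_free, peak


def choose_block_size(file_sizes: Sequence[float], candidates: Sequence[float],
                      capacity: float, bandwidth: float, overhead: float,
                      proc_rate: float) -> Optional[Tuple[float, float, float]]:
    """Pick the candidate block size with the smallest simulated makespan."""
    best = None
    for size in candidates:
        result = simulate(make_blocks(file_sizes, size), capacity,
                          bandwidth, overhead, proc_rate)
        if result is not None and (best is None or result[0] < best[1]):
            best = (size, result[0], result[1])
    return best


if __name__ == "__main__":
    files = [1.0] * 100  # 100 small files of 1 MB each (made-up workload)
    print(choose_block_size(files, [1.0, 5.0, 10.0, 20.0, 40.0],
                            capacity=30.0, bandwidth=10.0,
                            overhead=1.0, proc_rate=10.0))
```

In this toy model, very small blocks pay the per-block overhead many times, while very large blocks either exceed the storage limit or prevent transfer and processing from overlapping, so an intermediate block size gives the shortest makespan.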