Replication in grid file systems can significantly improve the I/O performance of data-intensive applications. However, most existing replication techniques operate on individual files, which can incur inefficient replication overhead when the number of files is large. We propose a file-clustering-based replication algorithm for grid file systems. The algorithm groups files according to how often they are accessed simultaneously and stores replicas of the clustered files on storage nodes so as to satisfy most expected future read accesses to the clustered files while minimizing access times and the replication times for individual files, subject to the given storage capacity limit. Our experiments on a grid environment of 20 nodes across 5 sites suggest that the proposed algorithm achieves accurate file clustering and efficient replica management: with a file cluster size limit of 5120 MB and a replica storage capacity limit of 10240 MB, our clustering policy achieves 1.58 times the efficiency of a policy that never groups related files. The results also indicate that the overheads required for introducing our algorithm significantly affect the I/O performance of running applications.
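The grouping step described above can be illustrated with a minimal sketch. It is not the algorithm evaluated in the paper: it assumes a simple greedy merge driven by pairwise co-access counts, and the names `cluster_files`, `sessions`, `sizes_mb`, and `min_coaccess` are hypothetical. Only the 5120 MB cluster size limit is taken from the text. Replica placement under the storage capacity limit is not modeled.

```python
from collections import Counter
from itertools import combinations


def cluster_files(sessions, sizes_mb, cluster_limit_mb=5120, min_coaccess=2):
    """Greedily group files that are often accessed together (illustrative only).

    sessions: iterable of sets of file names accessed in the same time window
              (every name is assumed to appear in sizes_mb).
    sizes_mb: mapping from file name to file size in MB.
    """
    # Count how often each pair of files appears in the same session.
    coaccess = Counter()
    for files in sessions:
        for a, b in combinations(sorted(files), 2):
            coaccess[(a, b)] += 1

    # Start with every file in its own single-file cluster.
    cluster_of = {f: f for f in sizes_mb}
    members = {f: {f} for f in sizes_mb}
    cluster_size = dict(sizes_mb)

    # Merge the most strongly related pairs first; skip any merge that
    # would push a cluster past the size limit.
    ranked = sorted(coaccess.items(), key=lambda kv: (-kv[1], kv[0]))
    for (a, b), count in ranked:
        if count < min_coaccess:
            break  # remaining pairs are too weakly related (hypothetical threshold)
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == cb or cluster_size[ca] + cluster_size[cb] > cluster_limit_mb:
            continue
        for f in members[cb]:
            cluster_of[f] = ca
        members[ca] |= members.pop(cb)
        cluster_size[ca] += cluster_size.pop(cb)

    return sorted(sorted(m) for m in members.values())
```

Calling `cluster_files` with a list of per-window access sets and a size table returns the clusters as sorted lists of file names. A real system would additionally decide where each cluster's replicas go, which this sketch leaves out.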