Parallel file systems are broadly deployed in large-scale data centers, supporting a wide range of applications across a variety of industries. Unfortunately, most parallel file systems suffer from intra-file fragmentation, which severely degrades disk performance. This paper presents the design and implementation of MiF, which introduces two techniques, on-demand preallocation and embedded directory, to Mitigate intra-file Fragmentation in data placement and thereby improve disk performance in parallel file systems. The key insight of on-demand preallocation is that the preallocation of a file should be aware of concurrent process streams and should recognize their write characteristics. The rationale behind embedded directory is that, since modern parallel file systems aggregate many normal operation pairs, exploiting the disk bandwidth for metadata access requires that the metadata of all sub-files in the same directory be placed adjacently. Measurements of our MiF implementation in a block-based parallel file system demonstrate that it can significantly improve the I/O performance of parallel programs.
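To make the idea of stream-aware preallocation concrete, the following is a minimal toy sketch in Python. It is not MiF's actual algorithm: the class `StreamAwareAllocator`, the window-doubling policy, and the parameters `initial_window` and `max_window` are all assumptions introduced here for illustration. It only shows why reserving a contiguous window per write stream can reduce intra-file fragmentation when several processes append to the same file concurrently.

```python
from collections import defaultdict


class StreamAwareAllocator:
    """Toy block allocator (hypothetical) that reserves a contiguous
    window of disk blocks for each concurrent write stream."""

    def __init__(self, initial_window=4, max_window=64):
        self.next_free = 0                  # next unallocated disk block
        self.initial_window = initial_window
        self.max_window = max_window
        self.windows = {}                   # stream_id -> [next_block, end_block, size]
        self.placement = defaultdict(list)  # stream_id -> disk blocks in write order

    def write_block(self, stream_id):
        win = self.windows.get(stream_id)
        if win is None or win[0] == win[1]:
            # Window missing or exhausted: reserve a new contiguous one.
            # Assumed policy: double the size for a stream that keeps writing.
            if win is None:
                size = self.initial_window
            else:
                size = min(win[2] * 2, self.max_window)
            start = self.next_free
            self.next_free += size
            win = [start, start + size, size]
            self.windows[stream_id] = win
        block = win[0]
        win[0] += 1
        self.placement[stream_id].append(block)
        return block


def count_extents(blocks):
    """Number of contiguous runs in a block list (1 means unfragmented)."""
    if not blocks:
        return 0
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)


def simulate(initial_window, max_window, streams=4, blocks_per_stream=16):
    """Interleave writes from several streams round-robin and report
    how many extents each stream's data ends up in."""
    alloc = StreamAwareAllocator(initial_window, max_window)
    for _ in range(blocks_per_stream):
        for s in range(streams):
            alloc.write_block(s)
    return {s: count_extents(alloc.placement[s]) for s in range(streams)}


if __name__ == "__main__":
    print("no preallocation:", simulate(1, 1))
    print("per-stream windows:", simulate(4, 64))
```

With a window of one block, the allocator behaves like a naive first-come allocator and the interleaved streams scatter each other's data; with per-stream windows, each stream's blocks fall into a few long runs. A real implementation would also need to release reserved but unused blocks, which this sketch omits.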