In this paper, we consider a family of XOR-based erasure codes with finite-sized, randomly generated parity check matrices, and report the results of a thorough computational search for erasure codes suited to distributed storage applications. The discovered matrices are not "low density," and the resulting codes are only approximately maximum distance separable (MDS). Even so, they outperform other codes, such as LDPC and IRA (irregular repeat-accumulate) codes, in terms of the overhead factor. The overhead factor is the average ratio of the number of encoded blocks required to restore the lost blocks to the number of original file blocks. We designed our codes to keep this overhead factor small. Typical LDPC codes use matrices with several thousand rows. Our codes use matrices with only 1,000 rows, which keeps operation time and overhead practical. No method is known for identifying the most suitable matrix among a large number of candidates, so we ran Monte Carlo simulations over a long period to find a suitable matrix with the lowest overhead factor. The resulting family of erasure codes achieves an average overhead factor of 1.002, compared with 1.07 for typical LDPC codes with 1,000 rows.
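The overhead factor and the Monte Carlo search can be illustrated with a small sketch. It is a toy model, not the authors' construction. It assumes a systematic layout in which k data blocks are stored alongside m parity blocks, and each parity block is the XOR of a random subset of data blocks. It also assumes that blocks arrive in uniformly random order and that decoding uses Gaussian elimination over GF(2). The sizes, the `density` parameter, and the function names (`random_parity_matrix`, `blocks_needed`, `estimate_overhead`, `search_best_matrix`) are hypothetical. Gaussian elimination succeeds as soon as decoding is possible at all, so this sketch may report a lower overhead than an iterative (peeling) decoder would.

```python
import random

# Hypothetical sketch of measuring the overhead factor of a random XOR code;
# not the paper's construction or decoder.


def random_parity_matrix(k, m, density, rng):
    """Return m parity definitions as k-bit masks: parity block j is the XOR of
    every data block i whose bit i is set. Each bit is set independently
    with probability `density` (a hypothetical parameter)."""
    cols = []
    for _ in range(m):
        mask = 0
        for i in range(k):
            if rng.random() < density:
                mask |= 1 << i
        cols.append(mask)
    return cols


def blocks_needed(k, cols, rng):
    """Receive the k data blocks and the parity blocks in random order and
    return how many were received when the data first became recoverable.

    Each block is the k-bit mask of the data blocks it XORs. Recoverability
    is tracked with an incremental GF(2) basis: the data can be decoded once
    the received masks have rank k."""
    vectors = [1 << i for i in range(k)] + list(cols)
    rng.shuffle(vectors)
    basis = {}  # leading bit -> basis vector whose highest set bit is that bit
    for count, v in enumerate(vectors, start=1):
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v
                break
            v ^= basis[top]  # clears bit `top`, leaving only lower bits
        if len(basis) == k:
            return count
    # Unreachable: the k data blocks alone already have rank k.
    raise RuntimeError("received blocks never reached rank k")


def estimate_overhead(k, cols, trials, rng):
    """Monte Carlo estimate of the average (blocks needed) / k."""
    total = sum(blocks_needed(k, cols, rng) for _ in range(trials))
    return total / (trials * k)


def search_best_matrix(k, m, density, candidates, trials, seed=0):
    """Toy search: generate random matrices and keep the one with the lowest
    estimated overhead factor."""
    rng = random.Random(seed)
    best_cols, best_f = None, float("inf")
    for _ in range(candidates):
        cols = random_parity_matrix(k, m, density, rng)
        f = estimate_overhead(k, cols, trials, rng)
        if f < best_f:
            best_cols, best_f = cols, f
    return best_cols, best_f


if __name__ == "__main__":
    cols, f = search_best_matrix(k=50, m=50, density=0.5, candidates=5, trials=100)
    print(f"best estimated overhead factor: {f:.4f}")
```

Any decoder needs at least k blocks to recover k data blocks, so the estimate is always at least 1.0. A value close to 1.0 means that almost any k received blocks suffice, which is the MDS-like behavior the abstract describes.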