Skip to Main Content
The Burrows-Wheeler transform permutes the symbols of a string such that the permuted string can be compressed effectively with fast, simple techniques. Inversion of the transform is a bottleneck in practice. Inversion takes linear time, but, for each symbol decoded, folklore says that a random access into the transformed string (and so a CPU cache-miss) is necessary. In this paper we show how to mitigate cache misses and so speed inversion. Our main idea is to modify the standard inversion algorithm to detect and record repeated sub strings in the original string as it is recovered. Subsequent occurrences of these repetitions are then copied in a cache friendly way from the already recovered portion of the string, short cutting a series of random accesses by the standard inversion algorithm. We show experimentally that this approach leads to faster runtimes in general, and can drastically reduce inversion time for highly repetitive data.
Date of Conference: 21-24 June 2011