The log-structured disk subsystem is a new concept for the use of disk storage whose future application has enormous potential. In such a subsystem, all writes are organized into a log, each entry of which is placed into the next available free storage. A directory indicates the physical location of each logical object (e.g., each file block or track image) as known to the processor originating the I/O request. For those objects that have been written more than once, the directory retains the location of the most recent copy. Other work with log-structured disk subsystems has shown that they are capable of high write throughputs. However, the fragmentation of free storage due to the scattered locations of data that become out of date can become a problem in sustained operation. To control fragmentation, it is necessary to perform ongoing garbage collection, in which the location of stored data is shifted to release unused storage for re-use. This paper introduces a mathematical model of garbage collection, and shows how collection load relates to the utilization of storage and the amount of locality present in the pattern of updates. A realistic statistical model of updates, based upon trace data analysis, is applied. In addition, alternative policies are examined for determining which data areas to collect. The key conclusion of our analysis is that in environments with the scattered update patterns typical of database I/O, the utilization of storage must be controlled in order to achieve the high write throughput of which the subsystem is capable. In addition, the presence of data locality makes it important to take the past history of data into account in determining the next area of storage to be garbage-collected.
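The mechanism described above (appending every write to the log, keeping a directory of the most recent copy, and collecting garbage to reclaim out-of-date space) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the class name `LogStore`, the division of storage into fixed-size segments, and in particular the greedy choice of the segment with the fewest live objects, which is only one of many possible collection policies and ignores the past history of the data.

```python
import random


class LogStore:
    """Toy log-structured store (hypothetical, for illustration only)."""

    def __init__(self, num_segments, segment_size):
        self.segment_size = segment_size
        # Each segment is a list of object ids, in the order they were written.
        self.segments = [[] for _ in range(num_segments)]
        # Directory: object id -> (segment, slot) of its most recent copy.
        self.directory = {}
        self.free = list(range(1, num_segments))  # segments not yet used
        self.current = 0   # segment currently receiving log entries
        self.moves = 0     # live objects copied by garbage collection

    def live_objects(self, seg):
        """Objects in `seg` whose directory entry still points into `seg`."""
        return [obj for slot, obj in enumerate(self.segments[seg])
                if self.directory.get(obj) == (seg, slot)]

    def _place(self, obj):
        """Append `obj` to the current segment and update the directory."""
        seg = self.current
        self.segments[seg].append(obj)
        self.directory[obj] = (seg, len(self.segments[seg]) - 1)

    def _collect(self):
        """Greedy collection (an assumed policy): empty the segment with the
        fewest live objects and rewrite those objects at the head of the log.
        Only called when every segment is full."""
        victim = min(range(len(self.segments)),
                     key=lambda s: len(self.live_objects(s)))
        live = self.live_objects(victim)
        if len(live) == self.segment_size:
            raise RuntimeError("no reclaimable storage: all data is live")
        self.segments[victim] = []
        self.current = victim
        for obj in live:
            self._place(obj)
            self.moves += 1

    def write(self, obj):
        """Write (or update) a logical object by appending it to the log."""
        if len(self.segments[self.current]) == self.segment_size:
            if self.free:
                self.current = self.free.pop(0)
            else:
                self._collect()
        self._place(obj)


if __name__ == "__main__":
    rng = random.Random(0)
    for num_objects in (8, 12, 14):  # higher count = higher storage utilization
        store = LogStore(num_segments=4, segment_size=4)
        writes = 10_000
        for _ in range(writes):
            store.write(rng.randrange(num_objects))  # scattered updates
        print(num_objects / 16, store.moves / writes)
```

Running the script prints, for each storage utilization, the number of collection moves per user write under uniformly scattered updates. It is a way to experiment with the relationship between utilization and collection load, not a reproduction of the paper's mathematical model or its update statistics.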
Note: The Institute of Electrical and Electronics Engineers, Incorporated is distributing this Article with permission of the International Business Machines Corporation (IBM) who is the exclusive owner. The recipient of this Article may not assign, sublicense, lease, rent or otherwise transfer, reproduce, prepare derivative works, publicly display or perform, or distribute the Article.