Skip to Main Content
In most large computer installations files are moved between on-line disk and mass storage (tape, integrated mass storage device) either automatically by the system and/or at the direction of the user. In this paper we present and analyze long term file reference data in order to develop a basis for the construction of algorithms for file migration. Specifically, we examine the use of the on-line user (primarily text editor) data sets at the Stanford Linear Accelerator Center (SLAC) computer installation through the analysis of 13 months of file reference data. We find that most files are used very few times. Of those that are used sufficiently frequently that their reference patterns may be examined, we find that: 1) about a third show declining rates of reference during their lifetime, 2) of the remainder, very few (about 5 percent) show correlated interreference intervals, and 3) interreference intervals (in days) appear to be more skewed than would occur with the Bernoulli process. Thus, about two-thirds of all suffi1ciently active files appear to be referenced as a renewal process with a skewed interreference distribution. A large number of other file reference statistics (file lifetimes, interference distributions, moments, means, number of uses/ file, file sizes, file rates of reference, etc.) are computed and presented. Throughout, statistical tests are described and explained. The results of our analysis of file reference patterns are applied in a companion paper to the development and comparative evaluation of file migration algorithms.