Applying feedback control to a replica management system
Wozniak, J.M.
Brenner, P.
Thain, D.
Striegel, A.
Izaguirre, J.A.
Dept. of Comput. Sci. & Eng., Notre Dame Univ.
Abstract
Many modern storage systems used in large-scale scientific systems are multi-use, independently administered clusters or grids. A common technique for achieving long-term storage reliability is to create data replicas on multiple servers, but in the presence of server failures, ongoing corrective action must be taken to prevent the loss of both high-value and low-value data. Such a system is difficult to control, and replica management is typically handled in an ad hoc manner. In this work, we claim that repairing prioritized faults is a scheduling problem, founded on the need to minimize a risk-based error function, E. Citing experiments on a prototype replica system for molecular simulations, we apply concepts from control system theory to analyze and handle the application of corrective action.