In this paper we extend a previously published approach to error recovery in enterprise storage controllers with multi-core processors. Our approach first involves the partitioning of the set of tasks in the runtime of the controller software into clusters (recovery scopes) of dependent tasks. Then, these recovery scopes are mapped into a set of recovery groups, on which the scheduling of tasks, both during the recovery process and normal operation, is based. This recovery-aware scheduling (RAS) replaces the performance-based scheduling of the storage controller. Through simulation and benchmark experiments, we find that: 1) the performance of RAS appears to be critically dependent on the values of recovery-related parameters; and 2) our fine-grained recovery approach promises to enhance the storage system availability while keeping the additional overhead, and the resulting degradation in performance, under control.
Note: The Institute of Electrical and Electronics Engineers, Incorporated is distributing this Article with permission of the International Business Machines Corporation (IBM) who is the exclusive owner. The recipient of this Article may not assign, sublicense, lease, rent or otherwise transfer, reproduce, prepare derivative works, publicly display or perform, or distribute the Article.