This paper proposes a practical way to evaluate the behavior of commercial off-the-shelf (COTS) operating systems in the presence of faulty device drivers. The proposed method emulates software faults in target device drivers and observes the behavior of the system and of a workload with respect to a comprehensive set of failure modes, analyzed along different dimensions. Software faults are emulated by injecting, at the machine-code level, selected mutations that represent the code produced when typical programming errors are made in the high-level language source. An important aspect of the proposed methodology is its use of simple and established practices to evaluate operating system failure modes, which allows it to be used as a dependability benchmarking technique. The generalization of the methodology to any software system built from discrete and identifiable components is also discussed.