Automatic recognition of intermittent failures: an experimental study of field data

3 Author(s)
Iyer, R.K. (Coordinated Sci. Lab., Illinois Univ., Urbana, IL, USA); Young, L.T.; Iyer, P.V.K.

A methodology is proposed for recognizing the symptoms of persistent problems in large systems. The system error rate is used to identify the error states among which relationships may exist. Statistical techniques are used to validate and quantify the strength of the relationship among these error states. As input, the approach takes the raw error logs, which contain a single entry for each error, detected as an isolated event. As output, it produces a list of symptoms that characterize persistent errors. Thus, given a failure, it is determined whether the failure is an intermittent manifestation of a common fault or an isolated (transient) incident. The technique is shown to work on two CYBER systems and on an IBM 3081 multiprocessor system. Comparisons to real failure/repair information obtained from field engineers show that, in about 85% of the cases, the error symptoms recognized by this approach correspond to real problems. The remaining 15% of the cases, although not directly supported by field data, are confirmed as being valid problems.
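
The log-to-symptom flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the fixed time window, the rate threshold, and the use of a simple "lift" ratio (observed co-occurrence divided by the co-occurrence expected under independence) are all assumptions chosen for illustration, and the function name `recurring_symptoms` and its parameters are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def recurring_symptoms(log, window=3600.0, rate_threshold=3,
                       min_lift=2.0, min_windows=2):
    """Return error-type pairs that recur together more often than chance.

    log: iterable of (timestamp_seconds, error_type) entries, one per error.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    # Bucket the isolated log entries into fixed-length time windows.
    buckets = defaultdict(list)
    for ts, err in log:
        buckets[int(ts // window)].append(err)
    n = len(buckets)
    if n == 0:
        return []

    all_sets = [set(errs) for errs in buckets.values()]
    # Windows with a high error rate nominate candidate relationships.
    busy = [set(errs) for errs in buckets.values()
            if len(errs) >= rate_threshold]
    candidates = {p for s in busy for p in combinations(sorted(s), 2)}

    # Count, over all windows, how often each type and each pair appears.
    single = Counter(e for s in all_sets for e in s)
    pair = Counter(p for s in all_sets for p in combinations(sorted(s), 2))

    symptoms = []
    for a, b in candidates:
        joint = pair[(a, b)]
        if joint < min_windows:
            continue  # seen together too rarely to call persistent
        expected = single[a] * single[b] / n  # under independence
        lift = joint / expected
        if lift >= min_lift:
            symptoms.append((a, b, joint, round(lift, 2)))
    # Strongest relationships first; ties broken by name for stable output.
    return sorted(symptoms, key=lambda r: (-r[3], r[0], r[1]))
```

Under these assumptions, a pair that is returned would be treated as a candidate symptom of a persistent (intermittent) fault, while error types that never pass the thresholds would be treated as isolated, transient incidents. A real analysis would replace the lift ratio with a proper statistical test.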

Published in:

Computers, IEEE Transactions on (Volume: 39, Issue: 4)