By Topic

Mining unstructured log files for recurrent fault diagnosis

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
Reidemeister, T. ; E&CE Dept., Univ. of Waterloo, Waterloo, ON, Canada ; Miao Jiang ; Ward, P.A.S.

Enterprise software systems are large and complex with limited support for automated root-cause analysis. Avoiding system downtime and loss of revenue dictates a fast and efficient root-cause analysis process. Operator practice and academic research have shown that about 80% of failures in such systems have recurrent causes; therefore, significant efficiency gains can be achieved by automating their identification. In this paper, we present a novel approach to modelling features of log files. This model offers a compact representation of log data that can be efficiently extracted from large amounts of monitoring data. We also use decision-tree classifiers to learn and classify symptoms of recurrent faults. This representation enables automated fault matching and, in addition, enables human investigators to understand manifestations of failure easily. Our model does not require any access to application source code, a specification of log messages, or deep application knowledge. We evaluate our proposal using fault-injection experiments against other proposals in the field. First, we show that the features needed for symptom definition can be extracted more efficiently than does related work. Second, we show that these features enable an accurate classification of recurrent faults using only standard machine learning techniques. This enables us to identify accurately up to 78% of the faults in our evaluation data set.

Published in:

Integrated Network Management (IM), 2011 IFIP/IEEE International Symposium on

Date of Conference:

23-27 May 2011