
Analyze-NOW-an environment for collection and analysis of failures in a network of workstations

2 Author(s)
Thakur, A. ; Center for Reliable & High Performance Comput., Illinois Univ., Urbana, IL, USA ; Iyer, R.K.

This paper describes Analyze-NOW, an environment for the collection and analysis of failures and errors in a network of workstations. Descriptions cover the data collection methodology and the tool implemented to facilitate this process. The software tools used for analysis are described, with emphasis on the implementation details of the Analyzer, the primary analysis tool. Application of the tools is demonstrated by using them to collect and analyze failure data, over a 32-week period, from a network of 69 SunOS-based workstations. Classification based on the source and effect of faults is used to identify problem areas. The different types of failures encountered on the machines and the network are highlighted to develop a proper understanding of failures in a network environment. The results from the analysis tool should be used to pinpoint the problem areas in the network. The results obtained by applying Analyze-NOW to failure data from the monitored network reveal some interesting behavior of the network. Nearly 70% of the failures were network-related, whereas disk errors were few. Network-related failures accounted for 75% of all hard failures (failures that make a workstation unusable). Half of the network-related failures were due to servers not responding to clients; the other half were performance-related or of other types. Problem areas in the network were found using this tool. The authors' approach was compared with the method of using the network architecture to locate problem areas. This comparison showed that locating problem areas from the network architecture overestimates the number of problem areas.
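The classification by source and effect summarized in the abstract can be illustrated with a small sketch. This is not the Analyzer's implementation: the `FailureRecord` fields, the category names, and the sample records are all hypothetical, chosen only to show how a share of failures by source, and a share among hard failures, might be computed from classified records.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureRecord:
    """One classified failure event (hypothetical schema, not the paper's)."""
    host: str
    source: str   # where the fault originated, e.g. "network" or "disk"
    effect: str   # what the user observed, e.g. "server not responding"
    hard: bool    # True if the failure made the workstation unusable


def share_by_source(records):
    """Return the fraction of all failures attributed to each source."""
    counts = Counter(r.source for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {source: n / total for source, n in counts.items()}


def hard_failure_share(records, source):
    """Return the fraction of hard failures attributed to one source."""
    hard = [r for r in records if r.hard]
    if not hard:
        return 0.0
    return sum(r.source == source for r in hard) / len(hard)


# Made-up records for illustration only; these are not data from the paper.
SAMPLE = [
    FailureRecord("ws01", "network", "server not responding", True),
    FailureRecord("ws02", "network", "slow response", False),
    FailureRecord("ws03", "disk", "read error", True),
    FailureRecord("ws04", "network", "server not responding", True),
]

if __name__ == "__main__":
    print(share_by_source(SAMPLE))
    print(hard_failure_share(SAMPLE, "network"))
```

Keeping the source and the effect as separate fields lets the same records be grouped either way, which matches the two-axis classification the abstract mentions.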

Published in:

IEEE Transactions on Reliability (Volume: 45, Issue: 4)