Monitoring High-Dimensional Data for Failure Detection and Localization in Large-Scale Computing Systems

3 Author(s)
Haifeng Chen (NEC Lab. America, Princeton); Guofei Jiang; Yoshihira, K.

Processing high-dimensional measurements for failure detection and localization is a major challenge in large-scale computing systems. In information systems, however, those measurements usually lie on a low-dimensional structure embedded in the high-dimensional space. From this perspective, a novel approach is proposed to model the geometry of the underlying data generation and to detect anomalies based on that model. We consider both linear and nonlinear data generation models. Two statistics, the Hotelling T² and the squared prediction error (SPE), are used to reflect data variations within and outside the model, respectively. We track the probability density of the extracted statistics to monitor the system's health. After a failure has been detected, a localization process is also proposed to find the most suspicious attributes related to the failure. Experimental results on both synthetic data and a real e-commerce application demonstrate the effectiveness of our approach in detecting and localizing failures in computing systems.
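To make the two statistics concrete, here is a minimal Python sketch of the linear case only. It assumes the low-dimensional model is a PCA subspace fitted with NumPy, which the abstract does not confirm; the function names (`fit_linear_model`, `monitor`, `demo`), the synthetic data, and the localization rule (ranking attributes by their share of the squared residual) are all illustrative assumptions, not the authors' method. The density tracking and the nonlinear model mentioned in the abstract are not covered.

```python
import numpy as np


def fit_linear_model(X, k):
    """Fit a k-dimensional PCA subspace to normal-operation data X (n x d)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    variances = s[:k] ** 2 / (X.shape[0] - 1)  # variance along each kept direction
    return mean, Vt[:k].T, variances


def monitor(x, mean, P, variances):
    """Return (T^2, SPE, per-attribute squared residuals) for one measurement x."""
    xc = x - mean
    scores = P.T @ xc                      # coordinates inside the model subspace
    t2 = float(np.sum(scores ** 2 / variances))
    residual = xc - P @ scores             # part of x the model cannot explain
    return t2, float(residual @ residual), residual ** 2


def demo():
    rng = np.random.default_rng(0)
    # Hypothetical synthetic data: 10 attributes driven by 2 latent factors.
    A = np.zeros((2, 10))
    A[0, :5] = 1.0
    A[1, 5:] = 1.0
    X = rng.normal(size=(500, 2)) @ A + 0.01 * rng.normal(size=(500, 10))
    model = fit_linear_model(X, k=2)

    normal = X[0]
    faulty = normal.copy()
    faulty[3] += 5.0                       # inject a fault into attribute 3
    _, spe_normal, _ = monitor(normal, *model)
    _, spe_faulty, contrib = monitor(faulty, *model)
    return spe_normal, spe_faulty, int(np.argmax(contrib))


if __name__ == "__main__":
    spe_normal, spe_faulty, suspect = demo()
    print(f"SPE normal={spe_normal:.4g}, SPE faulty={spe_faulty:.4g}, "
          f"suspect attribute={suspect}")
```

In this sketch, T² measures how far a sample moves inside the fitted subspace, while SPE measures how far it falls outside it. The injected fault breaks the correlation among attributes, so it shows up mainly in SPE, and the attribute with the largest squared residual is reported as the most suspicious.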

Published in:

IEEE Transactions on Knowledge and Data Engineering (Volume: 20, Issue: 1)