Characterizing E-Science File Access Behavior via Latent Dirichlet Allocation

2 Author(s)
Yusik Kim (LRI, Univ. Paris-Sud 11, Orsay, France); Germain-Renaud, C.

E-science is moving from grids to clouds. Getting the best of both worlds requires building on the experience gained from several years of steady operation of production grids. We propose a new approach for analyzing behavioral traces: since most of them are in effect text documents, state-of-the-art text mining techniques, and specifically latent Dirichlet allocation, can be exploited. The advantages are twofold: the approach provides some level of explanation inferred from the data, and it offers a relatively scalable way to capture the temporal variability of the behavior of interest while retaining the full dimensionality of the problem at hand. We test the text mining analogy by characterizing file access behavior on data from the steady operation of the largest production grid. We validate the resulting probabilistic model by showing that it can generate synthetic traces that are statistically consistent with the real ones. The approach would apply equally to wider contexts such as social network activity or web access.
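
As a rough illustration of the text mining analogy, the sketch below runs the standard latent Dirichlet allocation generative process to produce one synthetic file access trace. It is not the authors' implementation. It assumes that a trace plays the role of a document, that file names play the role of words, and that each topic is a probability distribution over files. The file names, the topic probabilities, the `alpha` values, and the function names are all hypothetical; in practice the topics would be inferred from real traces rather than set by hand.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_trace(topics, alpha, length, rng):
    """Generate one synthetic trace (a list of file names) with the LDA process.

    topics: list of dicts mapping file name -> probability (hand-set, hypothetical).
    alpha:  Dirichlet concentration parameters, one per topic (hypothetical).
    """
    theta = sample_dirichlet(alpha, rng)  # per-trace mixture over topics
    trace = []
    for _ in range(length):
        # Pick a topic for this access, then pick a file from that topic.
        k = rng.choices(range(len(topics)), weights=theta)[0]
        files = list(topics[k])
        weights = [topics[k][f] for f in files]
        trace.append(rng.choices(files, weights=weights)[0])
    return trace


# Hypothetical topics: one dominated by raw data files, one by summary files.
TOPICS = [
    {"raw_001.dat": 0.6, "raw_002.dat": 0.3, "calib.db": 0.1},
    {"summary.root": 0.7, "calib.db": 0.3},
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
trace = generate_trace(TOPICS, alpha=[0.5, 0.5], length=10, rng=rng)
print(trace)
```

Validation in the spirit of the abstract would then compare statistics of many such synthetic traces (for example, per-file access frequencies) against the same statistics computed on the real traces.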

Published in:

Utility and Cloud Computing (UCC), 2011 Fourth IEEE International Conference on

Date of Conference:

5-8 Dec. 2011