In this paper, we present our vision of how statistical dependency rule mining could be applied to a thorough analysis of log data. Dependency rules are especially attractive as a first-step mining method because their search algorithms are efficient and their results are globally optimal. The major drawback is the rather specific form of the dependencies, which requires binary data. It is not always clear how heterogeneous real-world data should be binarized, or how the tools should be used so that all interesting dependencies are caught. We give an overview of typical problems in analyzing log data. The three major problems are: 1) How to balance between groups and individuals so that both general regularities and individual peculiarities can be found? 2) How to handle numerical and periodic variables? 3) How to extract features from the intrinsic dimensions of log data? For each problem, we give practical solutions in the form of preprocessing techniques and constraints that can be used with existing tools. We also point out important open problems and algorithmic challenges that require further research.
Published in: 2012 IEEE 12th International Conference on Data Mining Workshops (ICDMW)
Date of Conference: 10 Dec. 2012