There are many process mining algorithms and representations, which makes it difficult to choose an algorithm or to compare results. Process mining is essentially a machine learning task, but little work has been done on systematically analyzing algorithms to understand their fundamental properties, such as how much data are needed for confidence in mining. We propose a framework for analyzing process mining algorithms. Processes are viewed as distributions over traces of activities, and mining algorithms as learning these distributions. We use probabilistic automata as a unifying representation to which other representation languages can be converted. We present an analysis of the Alpha algorithm under this framework, together with experimental results showing that the amount of data needed for mining can be predicted from the substructures in a model and the behavior of the algorithm. This allows efficient use of data and quantification of the confidence that can be placed in the results.
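
To make the idea of a process as a distribution over traces concrete, the following is a minimal illustrative sketch and not the analysis presented in the paper. It assumes a small, hypothetical acyclic probabilistic automaton (the states, activities, and probabilities in `AUTOMATON` are invented for illustration) and assumes that traces in a log are drawn independently. Under those assumptions, a behavior that occurs in a single trace with probability p is seen at least once in n traces with probability 1 - (1 - p)^n, which gives a simple estimate of how many traces are needed.

```python
import math
from collections import defaultdict

# Hypothetical acyclic probabilistic automaton (an assumption for illustration):
# state -> list of (activity, next_state, probability).
# Outgoing probabilities of each non-final state sum to 1; "end" is the final state.
AUTOMATON = {
    "start": [("a", "s1", 1.0)],
    "s1": [("b", "s2", 0.9), ("c", "s2", 0.1)],
    "s2": [("d", "end", 1.0)],
    "end": [],
}


def trace_distribution(automaton, state="start", final="end"):
    """Return {trace: probability} by enumerating all paths of an acyclic automaton."""
    if state == final:
        return {(): 1.0}
    dist = defaultdict(float)
    for activity, next_state, p in automaton[state]:
        for suffix, q in trace_distribution(automaton, next_state, final).items():
            dist[(activity,) + suffix] += p * q
    return dict(dist)


def traces_needed(p, delta):
    """Smallest n with 1 - (1 - p)**n >= 1 - delta, for 0 < p < 1 and 0 < delta < 1.

    Assumes traces are independent draws from the process distribution.
    """
    return math.ceil(math.log(delta) / math.log(1.0 - p))


if __name__ == "__main__":
    for trace, prob in sorted(trace_distribution(AUTOMATON).items()):
        print("".join(trace), prob)
    # Traces needed to observe the rare branch "c" with 95% confidence.
    print(traces_needed(0.1, 0.05))
```

In this toy model the rare branch `c` occurs with probability 0.1 per trace, so about 29 independent traces are needed to observe it at least once with 95% confidence. A bound of this kind covers only a single substructure; predicting the data needed for a whole model would also have to account for how the mining algorithm combines such observations.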