Towards Scalable Process Mining Pipelines


Abstract:

Over the past two decades, process mining has proven to be a valuable approach to gaining insights into organizations' performance. The major sub-fields of discovery, conformance, and improvement have witnessed substantial development. Contributions have ranged from better algorithms and richer comparison metrics to a move towards online analysis of process data. Most of these contributions address guidelines from the process mining manifesto. In this paper, we address the sixth guideline of the manifesto: process mining should be a continuous process. To this end, we propose a pipelining approach that is configurable, scalable, modular, and automated. We realize our proposal using Dask and evaluate it with different architectures, process discovery, and evaluation metrics.
Date of Conference: 14-17 November 2023
Date Added to IEEE Xplore: 25 December 2023
Conference Location: Abu Dhabi, United Arab Emirates

I. Introduction

Over the years, process mining [1] has proven its value as a family of data-driven process analytics techniques. In general, process mining can be subdivided into three main areas: process discovery, conformance checking, and process improvement. The main input to this family of analytics techniques is the so-called event log that contains the process execution data. In its simplest form, an event log is a sequence of events characterized by a case identifier, indicating the unique process instance, the label of the executed activity, and a timestamp (Table I). The sequence of events having the same case identifier is called a trace.

Table I: A simple event log showing the case identifier, executed activity, and execution timestamp.

Case identifier    Activity    Timestamp
1                  a           2022-08-01 15:00
1                  b           2022-08-01 15:02
2                  a           2022-08-01 15:03
2                  b           2022-08-01 15:06
1                  c           2022-08-01 15:06
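
To make the notion of a trace concrete, the following minimal Python sketch groups the events of Table I by case identifier and orders each group by timestamp. It is an illustration only and not the paper's Dask-based pipeline; the names `EVENT_LOG` and `extract_traces` are hypothetical, and representing events as tuples is an assumption made for brevity.

```python
from collections import defaultdict
from datetime import datetime

# The five events of Table I as (case identifier, activity, timestamp) tuples.
# The tuple layout is an assumption made for this illustration.
EVENT_LOG = [
    ("1", "a", "2022-08-01 15:00"),
    ("1", "b", "2022-08-01 15:02"),
    ("2", "a", "2022-08-01 15:03"),
    ("2", "b", "2022-08-01 15:06"),
    ("1", "c", "2022-08-01 15:06"),
]


def extract_traces(event_log):
    """Group events by case identifier and order each group by timestamp."""
    events_by_case = defaultdict(list)
    for case_id, activity, timestamp in event_log:
        parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
        events_by_case[case_id].append((parsed, activity))
    # sorted() is stable, so events with equal timestamps keep their log order.
    return {
        case_id: [activity for _, activity in sorted(events, key=lambda e: e[0])]
        for case_id, events in events_by_case.items()
    }


traces = extract_traces(EVENT_LOG)
print(traces)  # {'1': ['a', 'b', 'c'], '2': ['a', 'b']}
```

Case 1 yields the trace a, b, c and case 2 yields a, b, even though the events of the two cases are interleaved in the log.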
