Data mining applications require the ability to interpret unfiltered data embedded in event logs. The efficiency of data mining depends on the scalability of the data, the end-user comprehensibility of the results, the absence of assumptions about any canonical data distribution, and insensitivity to the order of input records. Contemporary workflow management systems are driven by explicit process models based on completely specified workflow designs. Creating a workflow design is a complicated, time-consuming process, and there are typically discrepancies between the actual workflow processes and the processes as perceived by management. In this paper, we propose a Process Mining Architecture (PROARCH) model, which captures the processes in a system through event logs containing information about the different processes under execution. We assume that the events in the logs bear timestamps. These logs also contain unformatted entries, which are dirty data for our model, so the logs must be filtered before further processing. After filtering, the clean data is represented in MXML format and serves as input to our model. This MXML data is parsed into a Petri net representation, whose nodes and transitions are connected to form a workflow representation. Because the initial input logs are dirty, we use an FP-tree approach to build our workflow model.
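The filtering and tree-building steps described above can be illustrated with a minimal Python sketch. It is not the PROARCH implementation. The record layout (case id, activity, ISO timestamp), the sample log, and the names `filter_log`, `build_traces`, and `build_prefix_tree` are assumptions made for illustration, and the MXML and Petri net stages are omitted. The tree is also a simplification: a standard FP-tree orders items by frequency, whereas this sketch keeps the timestamp order of each trace and only counts shared prefixes.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw log records: (case_id, activity, timestamp string).
RAW_LOG = [
    ("c1", "register", "2010-01-01T09:00:00"),
    ("c1", "approve", "2010-01-01T09:05:00"),
    ("c2", "register", "2010-01-01T09:01:00"),
    ("c2", "reject", "2010-01-01T09:07:00"),
    ("c1", "archive", "2010-01-01T09:10:00"),
    ("c3", "register", "not-a-timestamp"),   # dirty: unparseable timestamp
    ("", "approve", "2010-01-01T09:02:00"),  # dirty: missing case id
]


def filter_log(records):
    """Keep only records with a case id, an activity, and a valid timestamp."""
    clean = []
    for case_id, activity, ts in records:
        if not case_id or not activity:
            continue
        try:
            when = datetime.fromisoformat(ts)
        except ValueError:
            continue  # treat unparseable timestamps as dirty data
        clean.append((case_id, activity, when))
    return clean


def build_traces(clean):
    """Group events by case and order each case's activities by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, when in clean:
        by_case[case_id].append((when, activity))
    return {case: [a for _, a in sorted(events)]
            for case, events in by_case.items()}


class Node:
    """One activity in the prefix tree, with the number of traces through it."""

    def __init__(self, activity):
        self.activity = activity
        self.count = 0
        self.children = {}


def build_prefix_tree(traces):
    """Insert every trace into a counted prefix tree (FP-tree-like)."""
    root = Node(None)
    for trace in traces.values():
        node = root
        for activity in trace:
            if activity not in node.children:
                node.children[activity] = Node(activity)
            node = node.children[activity]
            node.count += 1
    return root


traces = build_traces(filter_log(RAW_LOG))
tree = build_prefix_tree(traces)
```

In this sketch both surviving cases share the prefix `register`, so that node carries a count of 2 and then branches into `approve` and `reject`. Shared prefixes with counts of this kind are what a workflow structure could later be derived from.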