Discovery of Object-Centric Behavioral Constraint Models With Noise

Process discovery techniques can automatically discover process models from event data. These models reveal the actual behavior of the related organization and have successfully been applied in a range of domains. Event data need to be extracted from information systems. Today, most organizations use object-centric systems such as ERP and CRM systems, which generate and store data in an object-centric manner. Unfortunately, existing discovery techniques are more focused on a behavioral perspective of processes, where the data perspective is often considered as a second-class citizen. Moreover, these discovery techniques fail to deal with object-centric data with many-to-many relationships. Event data need to be “flattened” focusing on a single object type (i.e., case notion). Therefore, in this paper, we aim to discover a novel model which combines data and behavior perspectives, and the resulting Object-Centric Behavioral Constraint (OCBC) model is able to describe processes involving interacting instances and complex data dependencies. Besides, we provide solutions to deal with the noise problem, which enables process discovery in real life data.


I. INTRODUCTION
Information systems are widely used by enterprises and organizations to support their daily transactions. Object-centric systems such as ERP and CRM systems are the type of information system most widely used by today's organizations. These systems generate and store data in an object-centric manner, i.e., transactions update a database (often relational) storing information about objects (people, orders, etc.). Such systems integrate and optimize the business processes and transactions over different departments using shared databases maintained by a database management system [1], [3], [23].
Process discovery techniques can automatically discover process models from event data generated by information systems. These models reveal the actual behavior of the related organization and have successfully been applied in a range of domains. The existing state-of-the-art techniques such as the Inductive Miner [16], ILP Miner [34], Heuristic Miner [36], Declare Miner [21] and Split Miner [2] assume that process instances are recorded as cases composed of ordered events. In this classical setting each event corresponds to one case identifier [30].
The associate editor coordinating the review of this manuscript and approving it for publication was Porfirio Tramontana.
However, most of these discovery techniques fail to deal with data-centric/object-centric processes supported by CRM and ERP systems, which have one-to-many and many-to-many relationships between data objects. The immediate challenge for existing techniques is that these processes have no clear case notion to relate events. Besides, the discovered models are often based on business process modeling languages such as Petri nets, BPMN diagrams, Workflow nets, EPCs, and UML activity diagrams, which typically consider process instances in isolation, ignoring interactions among them. Moreover, they cannot model the data perspective in a precise manner. In other words, the data perspective can be described, but the more powerful constructs (e.g., cardinality constraints) used in Entity-Relationship (ER) models [4], Object-Role Models (ORM) [11] and UML class models [10] are not employed at all. As a result, data and control-flow need to be described in separate diagrams.
Researchers have proposed different ways to solve the problems mentioned above. The early solutions extend process models with a data perspective, while remaining process-centric. For instance, [8], [9], [13], [14], [37] append values to tokens in Petri nets in order to add data to process models, resulting in colored Petri nets. Several variants of colored Petri nets tailored towards process mining and BPM have been proposed. Data-aware process mining discovery techniques [7], [28] extend the control-flow perspective with the data perspective (e.g., read and write operations, decision points and transition guards) using standard data mining techniques. These techniques can express the data perspective to some extent, but are still not powerful enough. Besides, they are based on logs with case notions. Later solutions are the artifact-centric approaches proposed in [5], [12], [18], [24], and [31], which attempt to describe business processes in terms of so-called business artifacts. Artifacts have data and lifecycles attached to them, thus relating both perspectives. References [19], [25], and [27] demonstrate several approaches to discover artifact-centric models from data-centric processes. However, they force users to specify artifacts as well as a single instance notion within each artifact, and tend to result in complex specifications over multiple diagrams.
This paper aims to discover models based on a novel modeling language, named the Object-Centric Behavioral Constraint (OCBC) language [32], that combines data/object modeling techniques (ER, UML, or ORM) and a declarative process modeling language (inspired by Declare [33]). More precisely, a discovered OCBC model (cf. Figure 10) consists of a class model (presenting cardinality constraints between objects), a behavioral model (presenting declarative constraints between events) and so-called AOC relationships which connect these two models by relating activities in the behavioral model to object classes in the class model. Besides, we provide solutions to simplify OCBC models by filtering infrequent elements and discover precise behavioral constraints, which can better discover models in real life data with noise than the approach in [17].
The remainder is organized as follows. Section II proposes a novel log format to organize object-centric data displayed above, which is taken as input for model discovery techniques. Our discovery algorithm is proposed in Section III while illustrating the ingredients of OCBC models. Section IV provides advanced discovery techniques to deal with noise. We implement the discovery approach in a ProM plugin, which is described in Section V. The plugin is interactive and allows the user to customize the process model. Section VI compares OCBC models with other common models and Section VII concludes the paper.

II. OBJECT-CENTRIC EVENT LOGS
The input for process discovery techniques is an event log, which contains the event data generated by information systems. The XES log format, which stands for eXtensible Event Stream (www.xes-standard.org), is the de facto exchange format for discovery techniques [35]. In general, an XES log consists of a collection of traces. A trace describes the life-cycle of a particular case (i.e., a process instance) in terms of the activities executed. When the data have many-to-many relationships or multiple case notions, we need to flatten the data to fit the single case paradigm required by the XES log format. The flattened event log is only a particular view on the whole data set, so the completeness of the original data set is lost.
As object-centric data have many-to-many relationships, existing event log formats fail to organize such data (without "flattening" the data). In order to keep the completeness, we propose a novel object-centric event log format to organize and store object-centric data.
A. OBJECTS AND CLASSES
Figure 1 displays a motivating example which is used to illustrate our approach throughout the paper. The example is a fragment of object-centric data, generated in the Order-To-Cash (OTC) business process scenario of a real ERP system named Dolibarr.¹ In general, the data consists of records in 9 tables, which correspond to transactions on Dolibarr. For instance, the first row o1 in the "order" table indicates an order was created at "2017-08-11 10:33:37". The directed edges between tables indicate the dependency relations. For instance, r5 denotes a dependency relation from the "customer" column (foreign key) in the "order" table to the "id" column (primary key) in the "customer" table. Based on this relation, we know that the order o1 is for a customer c1 named "Ming". Note that there exist one-to-many relations, e.g., one order o1 corresponds to two order lines ol1 and ol2 in the "order_line" table. Besides, there also exist many-to-many relations illustrated by the "element relation" table, in which one order o1 corresponds to two invoices i1 and i2 while one invoice i2 corresponds to two orders o1 and o2.
The data shown in Figure 1 are object-centric, which are different from process-centric data such as XES logs. The object-centric manner is the way data are organized in relational databases (SAP, Oracle, etc.). The term "object" in this paper has a different meaning from other fields, such as software engineering. Objects are data elements generated by object-centric information systems and stored in a relational database. These are grouped in classes and have some attributes. Besides, there exist object relations between objects to indicate the interrelation. An object relation is of some class relationship, which indicates the type of interrelation.
For example in Figure 1, the record o1 in the "order" table can be considered as an object of class "order". Each value (e.g., "c1" in the "customer" column) in the record can be considered as an attribute of the object. Class relationships correspond to dependency relationships between tables, which are directed (denoted as arrows) from child tables to parent tables. For instance, the class relationship r5 indicates the foreign key "customer" in the "order" table refers to the primary key "id" in the "customer" table. An object relation example of r5 is the relation between c1 and o1, which indicates the customer of o1 is c1.
1 Dolibarr ERP/CRM is an open source (webpage-based) software package for small and medium companies (www.dolibarr.org). It supports sales, orders, procurement, shipping, payments, contracts, project management, etc.
Objects can represent records in tables while object relations can represent the reference relations between records. Based on this idea, we define an object model to incorporate objects and object relations to represent the database.
An object model consists of objects (with attributes) and corresponding object relations. It is a snapshot of the database at some moment. However, the application of object models is not limited to the field of databases. The basic idea of object models is to represent a state of an object-centric system at some moment.
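As a minimal sketch (not the authors' implementation), an object model can be represented as a small data structure. The class name `ObjectModel`, its field names and the attribute key `name` are illustrative assumptions; the objects c1 and o1 and the relation r5 are taken from the motivating example.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectModel:
    """Snapshot of an object-centric system at one moment (hypothetical structure)."""
    classes: dict                                    # object id -> class name
    relations: set                                   # {(class relationship, source object, target object)}
    attributes: dict = field(default_factory=dict)   # object id -> {attribute: value}

    def objects_of(self, object_class):
        """Return all objects of the given class in this snapshot."""
        return {o for o, c in self.classes.items() if c == object_class}


# Fragment of Figure 1: customer c1 ("Ming") and order o1, linked by r5.
# The attribute key "name" is an assumed column name.
om = ObjectModel(
    classes={"c1": "customer", "o1": "order"},
    relations={("r5", "c1", "o1")},
    attributes={
        "c1": {"name": "Ming"},
        "o1": {"creation date": "2017-08-11 10:33:37"},
    },
)

print(om.objects_of("order"))  # {'o1'}
```

Storing relations as (relationship, source, target) triples mirrors the way object relations are typed by class relationships in the text.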

B. EVENT LOGS
In general, event logs are used to record past events related to information systems, machines, organizations, etc. In this paper, the idea of event logs is to record operations on object-centric systems and the corresponding state of the system after each operation. An event log consists of a list of events which represent operations, and each event corresponds to an object model which represents the state of the process just after the execution of the event.
In object-centric data, events are recorded implicitly. For example, they could be the instances of timestamp columns. Each event corresponds to an activity and may have attributes such as the time at which the event took place, and the resource executing the corresponding event. Moreover, events are atomic and ordered. For simplicity, we assume a total order. To model the overlapping of activities in time one can use start and complete events.
Take Figure 1 as an example to explain events and activities. The "creation date" column in the "order" table can be considered as the "create order" activity while each instance of the column is a "create order" event. The additional attributes of events could be some other related columns in the same table while the total order could be derived from timestamps or logical knowledge. Definition 2 formalizes the notion of an event log.
Definition 2 (eXtensible Object-Centric Event Log): Let $U_E$ be the universe of events and $U_A$ be the universe of activities. An eXtensible Object-Centric (XOC) event log is a tuple $L = (E, act, attrE, EO, om, \preceq)$, where:
• $E \subseteq U_E$ is a set of events,
• $act \in E \to U_A$ maps events onto activities,
• $attrE$ maps each event onto a partial function assigning values to some attributes,
• $EO \subseteq E \times U_O$ relates events and object references,
• $om \in E \to U_{OM}$ maps each event to the object model directly after the event took place, and
• $\preceq \subseteq E \times E$ defines a total order on events.³
$U_L$ is the universe of XOC event logs.
In the context of an XOC event log $L$, each event $e$ is associated with the object model $OM_e = (Obj_e, Rel_e, class_e, attrO_e) = om(e)$. In this paper, we refer directly to $Obj_e$, $Rel_e$, $class_e$ and $attrO_e$ for $e \in E$ if the context is clear. Moreover, we assume objects cannot change classes in the whole log. Definition 3 defines notations that refer to events following some temporal restriction specified by $\preceq$.
3 A total order is a binary relation that is (1) antisymmetric, i.e., $e_1 \preceq e_2$ and $e_2 \preceq e_1$ implies $e_1 = e_2$, (2) transitive, i.e., $e_1 \preceq e_2$ and $e_2 \preceq e_3$ implies $e_1 \preceq e_3$, and (3) total, i.e., $e_1 \preceq e_2$ or $e_2 \preceq e_1$.

Definition 3 (Event Notations): Let $E \subseteq U_E$ be a set of events ordered by $\preceq$ and related to activities through function $act$. For any event $e \in E$:
• $\preceq_e(E) = \{e' \in E \mid e' \preceq e\}$ are the events before and including $e$,
An XOC event log is a collection of events that belong together, i.e., they belong to some "process" where many types of objects/instances may interact. Each event corresponds to an object model and the event refers to objects in the object model. Note that one event may refer to multiple objects and one object may be referred to by multiple events. The objects referred to by an event are those impacted by the operation corresponding to the event. Such an event log is object-centric since the events are related through the data perspective (cf. Section III-C1).
Table 1 presents an XOC log example which originates from the nine tables in the motivating example (Figure 1). More precisely, the numbers in the "Index" column indicate the order of events (i.e., $\preceq$). In order to refer to a specific event, the "Event" column assigns an ID to each event, e.g., co1 refers to the first create order event. Each event has a corresponding activity indicated by the "Activity" column. For instance, the activity of co1 is co (denoting create order). Besides, each event may have additional attributes. For example, the last event cp2 has two attributes, "create time = 2017-08-16" (we only show the date for simplicity), and "amount = 1904". In order to relate the behavioral perspective and the data perspective (i.e., events and objects), each event refers to at least one object indicated by the "Reference" column, e.g., ci1 refers to i1 and er1. Moreover, each event has a corresponding object model consisting of objects and object relations (in the "Object Model" column), which represents the state of the process just after the execution of the event. For instance, the object model in the first row indicates the state of the database (i.e., there are two records in the "customer" table, one record in the "order" table and two records in the "order line" table) after the first create order event (co1) happens.
Note that, each object has a corresponding class and additional attributes. In the table, the class is indicated by its ID while the additional attributes are omitted. For instance, ol1 indicates its corresponding class is ol. Table 1 also illustrates the evolution of object models. After the occurrence of some event, objects may have been added, and relations may have been added or removed.
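To make the log structure concrete, the following sketch stores each event together with the object model observed right after it and implements the "events before and including e" notation. It is an illustration under stated assumptions: the names `Event` and `events_up_to` are hypothetical, the log is heavily abridged from Table 1 (only the order line reference of cs1 is shown, and shipment objects and most relations are omitted), and the two events are not necessarily adjacent in the full log.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One XOC event together with the object model observed right after it."""
    eid: str
    activity: str
    refs: frozenset        # objects referred to by the event (EO)
    classes: dict          # om(e): object id -> class
    relations: frozenset   # om(e): {(relationship, source, target)}


def events_up_to(log, eid):
    """Events before and including `eid`; `log` is a list sorted by the total order."""
    position = next(i for i, e in enumerate(log) if e.eid == eid)
    return log[: position + 1]


# Abridged from Table 1 (assumption: intermediate events and objects are left out).
after_co1 = {"c1": "c", "c2": "c", "o1": "o", "ol1": "ol", "ol2": "ol"}
log = [
    Event("co1", "create order", frozenset({"o1", "ol1", "ol2"}),
          after_co1, frozenset({("r5", "c1", "o1")})),
    Event("cs1", "create shipment", frozenset({"ol1"}),
          dict(after_co1), frozenset({("r5", "c1", "o1")})),
]

print([e.eid for e in events_up_to(log, "cs1")])  # ['co1', 'cs1']
```

Keeping the log as a list sorted by the total order makes the temporal notations simple slices of that list.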

III. DISCOVERY OF OBJECT-CENTRIC BEHAVIORAL MODELS
In this section, we propose a novel algorithm different from the one proposed in [17] to discover OCBC models from XOC logs with noise.

A. DISCOVERY OF CLASS MODELS
As OCBC models are object-centric, i.e., events are correlated by the data perspective, we discover the data perspective (i.e., class models) first. A class model defines a ''space'' of possible object models, i.e., concrete collections of objects and relations instantiating the class model. Therefore, the class model is discovered based on object models.
Each object has a corresponding class, e.g., object o1 is of class o, and these classes form the framework of a class model. Besides, objects may have a number of related objects and the number has to follow some restrictions required by the business process. For instance in the OTC scenario, each order object should have precisely one related customer object. Accordingly, the discovered class model should have relationships between classes to indicate these constraints.
Considering the above idea, we need to choose or propose a modeling language to formalize class models. After reviewing existing class (data and object) modeling languages (such as ER, ORM and UML), we extend a subset of UML class model notations as our class modeling language. More precisely, our relationships extend cardinalities with temporal annotations to represent the constraints. Cardinalities employ non-empty sets of integers (e.g., {0, 1} is a cardinality containing two integers) to constrain the number of related objects. Temporal annotations categorize cardinalities into "eventually" cardinalities (indicated by ♦) and "always" cardinalities (indicated by □). "Always" indicates that the constraint should hold at any point in time while "eventually" indicates that the constraint should hold from some point onwards.
Definition 4 (Cardinalities): $U_{Card} = \{X \subseteq \mathbb{N} \mid X \neq \emptyset\}$ defines the universe of all possible cardinalities. Elements of $U_{Card}$ specify non-empty sets of integers.
Based on the discussion above, the following definition describes the discovery algorithm for class models, which serves as the backbone for OCBC models.
Definition 5 (Discovery of Class Model): Let $L = (E, act, attrE, EO, om, \preceq)$ be an XOC log. The discovered class model from $L$ is a tuple $ClaM = (OC, RT, \pi_1, \pi_2, \Box_{src}, \Diamond_{src}, \Box_{tar}, \Diamond_{tar})$, where:
• $OC = \{class_e(o) \mid e \in E \wedge o \in Obj_e\}$ is the set of object classes,
• $RT = \{r \mid \exists e \in E : (r, o_1, o_2) \in Rel_e\}$ is the set of relationship types,
• $\pi_1 \in RT \to OC$ gives the source of a relationship such that $\forall r \in RT : (\exists e \in E, (r, o_1, o_2) \in Rel_e : \pi_1(r) = class_e(o_1))$,⁵
• $\pi_2 \in RT \to OC$ gives the target of a relationship such that $\forall r \in RT : (\exists e \in E, (r, o_1, o_2) \in Rel_e : \pi_2(r) = class_e(o_2))$,
• $\Box_{src} \in RT \to U_{Card}$ gives the source "always" cardinality of a relationship such that $\forall r \in RT : \Box_{src}(r) = \{n \mid \exists e \in E, o_2 \in \partial_{\pi_2(r)}(Obj_e) : n = |\{o_1 \mid (r, o_1, o_2) \in Rel_e\}|\}$,⁶
• $\Diamond_{src} \in RT \to U_{Card}$ gives the source "eventually" cardinality of a relationship such that $\forall r \in RT : \Diamond_{src}(r) = \{n \mid \exists o_2 \in \partial_{\pi_2(r)}(Obj_{e_l}) : n = |\{o_1 \mid (r, o_1, o_2) \in Rel_{e_l}\}|\}$,
• $\Box_{tar} \in RT \to U_{Card}$ gives the target "always" cardinality of a relationship such that $\forall r \in RT : \Box_{tar}(r) = \{n \mid \exists e \in E, o_1 \in \partial_{\pi_1(r)}(Obj_e) : n = |\{o_2 \mid (r, o_1, o_2) \in Rel_e\}|\}$, and
• $\Diamond_{tar} \in RT \to U_{Card}$ gives the target "eventually" cardinality of a relationship such that $\forall r \in RT : \Diamond_{tar}(r) = \{n \mid \exists o_1 \in \partial_{\pi_1(r)}(Obj_{e_l}) : n = |\{o_2 \mid (r, o_1, o_2) \in Rel_{e_l}\}|\}$, where $e_l$ is the last event in $L$.
$U_{ClaM}$ is the universe of class models.
5 We assume that all the first (second) objects of object relations corresponding to one class relationship are of the same class in an XOC log, i.e., $\forall e, e' \in E : \forall (r, o_1, o_2) \in Rel_e, (r', o_1', o_2') \in Rel_{e'} : r = r' \Rightarrow class_e(o_1) = class_{e'}(o_1') \wedge class_e(o_2) = class_{e'}(o_2')$.
6 $\partial_{oc}(Obj) = \{o \in Obj \mid class(o) = oc\}$ denotes the whole set of objects in class $oc$.
Figure 3 presents a class model example which contains 9 object classes and 10 relationship types, i.e., OC = {customer, order, order line, shipment, shipment line, invoice, element relation, payment, payment line} and RT = {r1, r2, . . . , r10}. Each node represents one object class and each edge represents a relationship type. $\pi_1$ and $\pi_2$ are used to refer to the source class and target class of a relationship, respectively. For instance, $\pi_1(r1)$ = customer and $\pi_2(r1)$ = invoice. In order to distinguish source and target sides in the graph, we add an arrow on each edge from the target side to the source side (similar to the reference relations between tables). The function $\Box_{src}$ is indicated by the "always" cardinalities on the source side of each relationship type, and $\Diamond_{src}$, $\Box_{tar}$ and $\Diamond_{tar}$ are shown in the graph in the same way.
To illustrate the discovery technique presented in Definition 5, the following steps show how the model in Figure 3 is discovered from the example log in Table 1:
• OC can be learned by incorporating all classes of all objects in the object models of all events. For instance, customer is a discovered class since object models contain objects of class customer, e.g., c1 in the object model of the event co1 (c1 is an object of class c, denoting customer for short).
• For each relationship r, $\Box_{src}(r)$ ($\Box_{tar}(r)$) can be derived by collecting, for each target (source) object in the object model of each event, the number of object relations corresponding to r.⁷ For instance, $\Box_{src}(r5)$ is {1} since in the object model of each event, each order (i.e., the target class of r5) object (i.e., o1 and o2) has precisely one related customer object c1.
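The cardinality discovery of Definition 5 can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the function name `discover_cardinalities` and the result keys are hypothetical, and the two snapshots are an abridged fragment of the motivating example that only contains customers, orders and relationship r5.

```python
def discover_cardinalities(object_models):
    """object_models: list of (classes, relations) snapshots in event order.

    classes maps object id -> class; relations is a set of
    (relationship, source object, target object) triples.
    """
    # Source and target class of every observed relationship.
    ends = {}
    for classes, relations in object_models:
        for r, o1, o2 in relations:
            ends[r] = (classes[o1], classes[o2])

    def counts(classes, relations, r):
        src_class, tar_class = ends[r]
        per_target, per_source = set(), set()
        for o, c in classes.items():
            if c == tar_class:   # number of source objects related to target o
                per_target.add(sum(1 for rr, _, b in relations if rr == r and b == o))
            if c == src_class:   # number of target objects related to source o
                per_source.add(sum(1 for rr, a, _ in relations if rr == r and a == o))
        return per_target, per_source

    result = {}
    for r in ends:
        always_src, always_tar = set(), set()
        for classes, relations in object_models:   # "always": every snapshot
            s, t = counts(classes, relations, r)
            always_src |= s
            always_tar |= t
        # "eventually": only the object model of the last event
        eventually_src, eventually_tar = counts(*object_models[-1], r)
        result[r] = {
            "always_src": always_src, "eventually_src": eventually_src,
            "always_tar": always_tar, "eventually_tar": eventually_tar,
        }
    return result


# Abridged snapshots (assumption): after co1 and after co2, restricted to r5.
snap1 = ({"c1": "customer", "c2": "customer", "o1": "order"},
         {("r5", "c1", "o1")})
snap2 = ({"c1": "customer", "c2": "customer", "o1": "order", "o2": "order"},
         {("r5", "c1", "o1"), ("r5", "c1", "o2")})
cards = discover_cardinalities([snap1, snap2])
print(cards["r5"]["always_src"])  # {1}
```

On this fragment the source "always" cardinality of r5 is {1}, matching the example in the text; the target-side values only reflect the two snapshots used here.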

B. DISCOVERY OF AOC RELATIONSHIPS
The main advantage of an OCBC model is that it can present data and behavioral perspectives in a single diagram. In Section III-A, we defined and discovered the data perspective, i.e., a class model. The subsequent tasks are defining and discovering the behavioral perspective, i.e., behavioral constraints between activities. Different from existing discovery approaches based on XES logs, and due to the deliberate lack of case notion in XOC logs, events should be somehow related before we can discover behavioral constraints between activities.
7 An object o is a target (source) object of a relationship r in an event e if $class_e(o) = \pi_2(r)$ ($class_e(o) = \pi_1(r)$).
In our discovery approach, the basic idea for relating events comes from the data perspective. As mentioned, each event normally refers to some objects in its corresponding object model, and these reference relations build a bridge between the data perspective and the behavioral perspective. In this section, we refer to the reference relations as AOC relationships which relate (A)ctivities to (O)bject (C)lasses. Based on the class model and the AOC relationships, we can relate events to discover the constraints between activities (cf. Section III-C).
Definition 6 (Discovery of AOC Relationships): Let $L = (E, act, attrE, EO, om, \preceq)$ be an XOC log. $AOC \subseteq U_A \times U_{OC}$ is a set of relationships relating activities to object classes discovered from $L$ such that $AOC = \{(a, oc) \mid \exists (e, o) \in EO : act(e) = a \wedge class_e(o) = oc\}$.
AOC relationships connect the data perspective with the behavioral perspective. However, they are not powerful enough to describe the cardinality restrictions between events and corresponding objects. For instance, they cannot describe the constraint that one "create order" event should correspond to precisely one "order" object. Therefore, we use cardinality constraints (similar to those on the class relationships) to specify these restrictions to complement AOC relationships.
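Definition 7 (Discovery of AOC Cardinalities): Let $L = (E, act, attrE, EO, om, \preceq)$ be an XOC log and $AOC \subseteq U_A \times U_{OC}$ be the set of AOC relationships discovered from $L$. $\Box_A$, $\Diamond_A$ and $\#_{OC}$ are cardinality functions discovered from $L$ where:
• $\Box_A \in AOC \to U_{Card}$ gives the "always" cardinality of the source (activity side) of an AOC relationship such that $\forall (a, oc) \in AOC : \Box_A((a, oc)) = \{n \mid \exists e \in E, o \in \partial_{oc}(Obj_e) : n = |\{e' \in \partial_a(\preceq_e(E)) \mid (e', o) \in EO\}|\}$,
• $\Diamond_A \in AOC \to U_{Card}$ gives the "eventually" cardinality of the source (activity side) of an AOC relationship such that $\forall (a, oc) \in AOC : \Diamond_A((a, oc)) = \{n \mid \exists o \in \partial_{oc}(Obj_{e_l}) : n = |\{e' \in \partial_a(E) \mid (e', o) \in EO\}|\}$, where $e_l$ is the last event in $L$, and
• $\#_{OC} \in AOC \to U_{Card}$ gives the cardinality of the target (object-class side) of an AOC relationship such that $\forall (a, oc) \in AOC : \#_{OC}((a, oc)) = \{n \mid \exists e \in \partial_a(E) : n = |\{o \in \partial_{oc}(Obj_e) \mid (e, o) \in EO\}|\}$.
Here $\partial_a(E) = \{e \in E \mid act(e) = a\}$ denotes the events of activity $a$.
Based on the motivating log, Figure 4 illustrates the discovered AOC relationships and corresponding cardinalities. If an event refers to an object, the activity of the event refers to the class of the object. For instance, since event co1 refers to three objects o1, ol1, and ol2, activity create order (i.e., co) refers to class order (i.e., o) and order line (i.e., ol), which means two AOC relationships (create order, order) and (create order, order line) can be discovered as shown in Figure 4.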
For each AOC relationship (a, oc), its target cardinality can be determined by collecting the number of oc objects referred to by each a event. Consider the target cardinality of the AOC relationship (create shipment, order line) for example. As cs1 refers to one order line object (ol1) and cs2 refers to two order line objects (ol1 and ol2), the directly discovered cardinality is {1, 2} and it is extended to {1, 2, . . .} (denoted as "1..*").
In addition, the "always" cardinality on the activity side can be determined by collecting, for each oc object, the number of a events that have referred to it at every moment after the object is created.⁸ Consider the "always" cardinality of the same AOC relationship (create shipment, order line) for example. In the motivating log, there are four order line objects, i.e., ol1, ol2, ol3 and ol4. ol1 is created at the moment co1 happens and it is not yet referred to by any create shipment event. Then ol1 is referred to by cs1 at the moment cs1 happens. Afterwards, ol1 is referred to by both cs1 and cs2 at the moment cs2 happens. In summary, ol1 is referred to by (i) zero create shipment events before cs1, (ii) one create shipment event between cs1 and cs2 and (iii) two create shipment events after cs2. Based on the above discussion, the possible cardinality values for ol1 are {0, 1, 2}. Using the same method, we incorporate all values for ol1, ol2, ol3 and ol4, and the resulting cardinality is {0, 1, 2} which is generalized to "*".
The "eventually" cardinality on the activity side can be derived by collecting, for each oc object, the number of all a events (in the whole log) referring to it. Take the same AOC relationship (create shipment, order line) as an example. ol1 is referred to by two create shipment events cs1 and cs2 while each of ol2, ol3 and ol4 is referred to by one create shipment event. Hence, the discovered cardinality is {1, 2} which is generalized to "1..*".
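The three AOC cardinalities described above can be computed with one pass over the log per relationship. The sketch below is illustrative only: the function name `discover_aoc` is hypothetical, and the log is a simplified fragment of the motivating example restricted to co1, cs1 and cs2 and to the objects o1, ol1 and ol2 (ol3, ol4, shipment objects and all other events are left out).

```python
def discover_aoc(log):
    """log: list of (event id, activity, referred objects, classes of om(e)) in event order."""
    aoc = set()
    for _, act, refs, classes in log:
        aoc |= {(act, classes[o]) for o in refs}

    target, always, eventually = {}, {}, {}
    for a, oc in aoc:
        # Object-class side: number of oc objects referred to by each a event.
        target[(a, oc)] = {
            sum(1 for o in refs if classes[o] == oc)
            for _, act, refs, classes in log if act == a
        }
        # Activity side: number of a events that have referred to each oc object so far.
        seen = {}
        always_values = set()
        for _, act, refs, classes in log:
            for o, c in classes.items():
                if c == oc:
                    seen.setdefault(o, 0)   # object exists from this moment on
            if act == a:
                for o in refs:
                    if classes[o] == oc:
                        seen[o] += 1
            always_values |= set(seen.values())   # values observed at this moment
        always[(a, oc)] = always_values
        eventually[(a, oc)] = set(seen.values())  # values at the end of the log
    return aoc, target, always, eventually


# Simplified fragment (assumption): only order and order line objects are tracked.
om = {"o1": "order", "ol1": "order line", "ol2": "order line"}
log = [
    ("co1", "create order", {"o1", "ol1", "ol2"}, om),
    ("cs1", "create shipment", {"ol1"}, om),
    ("cs2", "create shipment", {"ol1", "ol2"}, om),
]
aoc, target, always, eventually = discover_aoc(log)
key = ("create shipment", "order line")
print(target[key], always[key], eventually[key])
```

On this fragment the three sets for (create shipment, order line) are {1, 2}, {0, 1, 2} and {1, 2}, the same values as derived in the text before generalization.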

C. DISCOVERY OF BEHAVIORAL MODELS
In this section, we discover the behavioral model based on the class model and AOC relationships to integrate the OCBC model. More precisely, we discover a set of constraints between activities.
8 A moment indicates the time when one event just happens. The moments after one object is created include the moment when it first appears and all later moments.
VOLUME 10, 2022
FIGURE 5. Given a reference event, we navigate to the target events through a triangle pattern.

1) RELATING EVENTS
As we mentioned, unlike traditional event logs [30], we do not assume an explicit case notion to relate events. Instead, we use the objects in object models and AOC relationships as the intermediary to relate events. Our goal is to discover a constraint between each pair of activities, and for that we appoint a reference activity and a target activity to draw a boundary for relating events. Based on this idea, we propose two ways to identify the target events for a reference event.
The first way to relate events is by a triangle pattern. If two events refer to one common object, they are related. For instance, assuming "create order" is the reference activity and "create shipment" is the target activity, we extract all involved events from the example log in Table 1. Based on the "References" column, we draw a graph presenting the reference relations shown in Figure 5(a). As we can see in the graph, the event co1 refers to two order line objects ol1 and ol2 while ol1 is also referred to by cs1 and both ol1 and ol2 are referred to by cs2. Therefore, cs1 and cs2 are the target events of the reference event co1, and the relating path is highlighted in red. Representing this method on the model level, the graph in Figure 5(b) indicates it is possible to extract a constraint between a reference activity and a target activity which refer to the same class (cf. Section III-C2).
The second way to relate events is by a square pattern. If two events refer to two related objects (which are connected by an object relation in some object model), they are related. For instance, assuming "create order" is the reference activity and "create invoice" is the target activity, Figure 6(a) shows the events (of activities "create order" and "create invoice") and their related objects extracted from the motivating example. The reference event co1 refers to one order object o1 related to er1 and er2 which are referred to by ci1 and ci2, respectively. Therefore, ci1 and ci2 are the target events of co1. If we abstract this relating method on the model level, the graph in Figure 6(b) indicates it is possible to extract a constraint between a reference activity and a target activity which refer to two related classes (cf. Section III-C2).
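Both relating patterns can be sketched as simple set lookups. The code below is a hedged illustration: the function names are hypothetical, the relations of all object models are merged into one set, the direction of the r8 triples is an assumption (the lookup checks both directions), and only the references mentioned in the text are included.

```python
def triangle_targets(ref, target_activity, oc, events, classes):
    """Target events sharing at least one object of class `oc` with event `ref`."""
    shared = {o for o in events[ref][1] if classes[o] == oc}
    return {e for e, (act, refs) in events.items()
            if e != ref and act == target_activity and refs & shared}


def square_targets(ref, target_activity, rt, events, relations):
    """Target events referring to an object linked to one of `ref`'s objects by `rt`."""
    ref_objs = events[ref][1]
    linked = {b for r, a, b in relations if r == rt and a in ref_objs}
    linked |= {a for r, a, b in relations if r == rt and b in ref_objs}
    return {e for e, (act, refs) in events.items()
            if e != ref and act == target_activity and refs & linked}


# Event id -> (activity, referred objects); other references are omitted.
events = {
    "co1": ("create order", {"o1", "ol1", "ol2"}),
    "cs1": ("create shipment", {"ol1"}),
    "cs2": ("create shipment", {"ol1", "ol2"}),
    "ci1": ("create invoice", {"i1", "er1"}),
    "ci2": ("create invoice", {"er2"}),
}
classes = {"o1": "order", "ol1": "order line", "ol2": "order line",
           "i1": "invoice", "er1": "element relation", "er2": "element relation"}
relations = {("r8", "o1", "er1"), ("r8", "o1", "er2")}  # direction assumed

print(sorted(triangle_targets("co1", "create shipment", "order line", events, classes)))
# ['cs1', 'cs2']
print(sorted(square_targets("co1", "create invoice", "r8", events, relations)))
# ['ci1', 'ci2']
```

The two results correspond to the target events of co1 identified in Figure 5(a) and Figure 6(a).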
Definition 8 (Extracting Candidate Patterns): Let $ClaM = (OC, RT, \pi_1, \pi_2, \Box_{src}, \Diamond_{src}, \Box_{tar}, \Diamond_{tar})$ be a class model and $AOC \subseteq U_A \times U_{OC}$ be a set of AOC relationships. Function $extP \in U_{ClaM} \times \mathcal{P}(U_{AOC}) \to \mathcal{P}(U_P)$ maps a class model and a set of AOC relationships onto a set of candidate patterns such that $extP(ClaM, AOC) = triangleP \cup squareP$ where
• $triangleP = \{(a_{ref}, a_{tar}, oc) \mid a_{ref} \neq a_{tar} \wedge oc \in OC \wedge \{(a_{ref}, oc), (a_{tar}, oc)\} \subseteq AOC\}$, and
• $squareP = \{(a_{ref}, a_{tar}, rt) \mid a_{ref} \neq a_{tar} \wedge rt \in RT \wedge (\{(a_{ref}, \pi_1(rt)), (a_{tar}, \pi_2(rt))\} \subseteq AOC \vee \{(a_{ref}, \pi_2(rt)), (a_{tar}, \pi_1(rt))\} \subseteq AOC)\}$.
Function $extP$ extracts a set of candidate patterns, in which one candidate pattern is a tuple $(a_{ref}, a_{tar}, ocrt)$ where $a_{ref}$ is called the reference activity, $a_{tar}$ is called the target activity, and $ocrt$ is a class (corresponding to a triangle pattern) or a class relationship (corresponding to a square pattern) which serves as the intermediary to relate events. Taking the class model and AOC relationships in Figure 4 as input, the function $extP$ returns 6 candidate patterns, i.e., (create order, create shipment, order line), (create order, create invoice, r8), (create invoice, create payment, r2), (create shipment, create order, order line), (create invoice, create order, r8), (create payment, create invoice, r2), in which the first two patterns correspond to the patterns in Figure 5 and Figure 6, respectively.
Definition 9: Let $L$ be an XOC log and $P = (a_{ref}, a_{tar}, cr)$ be a correlation pattern. Function $extI \in U_L \times U_P \to \mathcal{P}(E^*)$ correlates events in $L$ based on $P$ and returns a set of instances (i.e., event sequences).⁹
Each candidate pattern specifies a way to correlate events. Based on the extracted patterns, we correlate events in the motivating log, resulting in a graph in Figure 7. More precisely, the 10 events are sorted by time increasingly from left to right (e.g., co1 is before co2). The object(s) in the parentheses attached on edges between events indicate how the events are correlated. For instance, the four highlighted edges correspond to the examples in Figure 5(a) and Figure 6(a).
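Candidate pattern extraction can be sketched as two set comprehensions over the AOC relationships. This is an illustrative sketch: `extract_patterns` and `relationship_ends` are hypothetical names, the input is only a fragment of Figure 4 (create payment and relationship r2 are left out), the source/target direction of r8 is assumed, and the sketch assumes that reference and target activities differ, in line with the six patterns listed above.

```python
def extract_patterns(aoc, relationship_ends):
    """aoc: set of (activity, class); relationship_ends: rt -> (source class, target class)."""
    activities = {a for a, _ in aoc}
    classes = {oc for _, oc in aoc}
    # Triangle patterns: both activities refer to the same class.
    triangle = {
        (a_ref, a_tar, oc)
        for oc in classes for a_ref in activities for a_tar in activities
        if a_ref != a_tar and (a_ref, oc) in aoc and (a_tar, oc) in aoc
    }
    # Square patterns: the activities refer to the two ends of one relationship.
    square = {
        (a_ref, a_tar, rt)
        for rt, (src, tar) in relationship_ends.items()
        for a_ref in activities for a_tar in activities
        if a_ref != a_tar and (
            ((a_ref, src) in aoc and (a_tar, tar) in aoc)
            or ((a_ref, tar) in aoc and (a_tar, src) in aoc)
        )
    }
    return triangle | square


# Fragment of the AOC relationships and class model (assumption).
aoc = {
    ("create order", "order"), ("create order", "order line"),
    ("create shipment", "order line"),
    ("create invoice", "invoice"), ("create invoice", "element relation"),
}
relationship_ends = {"r8": ("order", "element relation")}

for pattern in sorted(extract_patterns(aoc, relationship_ends)):
    print(pattern)
```

On this fragment the sketch yields four of the six patterns listed in the text: the two triangle patterns over order line and the two square patterns over r8.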
9 For a sequence $\sigma$, e.g., $ins$, $\sigma_i$ refers to the i-th element of the sequence, $|\sigma|$ denotes the length of the sequence and $\partial_{set}(\sigma)$ converts the sequence into a set.
Target events are on the other side that has no dot. The notation is inspired by Declare, but formalized in terms of cardinality constraints rather than LTL.

2) BEHAVIORAL CONSTRAINTS
As mentioned, the result using the relating method is a set of target events for each reference event, rather than a process instance (e.g., a case) which includes events over the whole process. Therefore, we cannot use strict procedural constraints such as those in Petri nets to form the behavioral model. Instead, we employ a declarative way to indicate the temporal restriction between a reference event and its target events.
A temporal restriction may cover two perspectives, i.e., it may restrict the number of target events both before and after the reference event. In this paper, we will employ a graphical notation, i.e., a set of constraint types inspired by Declare (a declarative workflow language [33]), to represent and visualize the restrictions. Any element of $U_{CT}$, the universe of constraint types, is a constraint type which specifies a non-empty set of pairs of integers: the first integer defines the number of target events before the reference event and the second integer defines the number of target events after the reference event. Figure 8 shows the graphical notations of 8 example constraint types.
A constraint type assigns a restriction on the target events. In our approach, if the observed target events follow the restriction corresponding to some constraint type, this type is considered as a discovered constraint type.
Definition 11 (Specification of Behavioral Constraints): Let P = (a ref , a tar , cr) be a correlation pattern and ct be a constraint type. A behavioral constraint is specified as a tuple (P, ct) ∈ U P × U CT . Function idCon ∈ U P × U CT → U Con gives a constraint id referring to a constraint specification. Furthermore, function speCon ∈ U Con → U P × U CT gives the constraint specification corresponding to a constraint id.
By relating events based on a correlation pattern, we get a set of instances, which can be used to discover constraints corresponding to the pattern. The semantics of a constraint is defined as a restriction on the relations between events in a scope. On the other hand, from the discovery angle, a constraint can be identified as a tuple of a pattern and a constraint type, i.e., (P, ct). The pattern P specifies the scope, i.e., the reference events and target events, and the constraint type ct indicates the restriction on the relations between reference events and target events.
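The idea that a constraint type is a set of (before, after) pairs can be made concrete with a minimal Python sketch. It is only an illustration under stated assumptions: each instance is taken to be a time-ordered list containing one reference event and its target events, and the two constraint types below use assumed Declare-style semantics (response: at least one target event after; precedence: at least one target event before). The names `CONSTRAINT_TYPES`, `count_before_after` and `satisfied_types` are hypothetical and not part of the paper or of any tool.

```python
from typing import Callable, Dict, List, Tuple

# A constraint type is modelled as a predicate over (before, after),
# i.e., the numbers of target events before and after the reference event.
ConstraintType = Callable[[int, int], bool]

# Assumed semantics for two Declare-inspired constraint types.
CONSTRAINT_TYPES: Dict[str, ConstraintType] = {
    "response": lambda before, after: after >= 1,
    "precedence": lambda before, after: before >= 1,
}


def count_before_after(instance: List[str], reference: str) -> Tuple[int, int]:
    """Count target events before and after the reference event.

    `instance` is assumed to be ordered by time and to contain `reference` once.
    """
    position = instance.index(reference)
    return position, len(instance) - position - 1


def satisfied_types(instances: List[Tuple[List[str], str]]) -> List[str]:
    """Return the constraint types satisfied by every (instance, reference) pair."""
    pairs = [count_before_after(events, ref) for events, ref in instances]
    return sorted(
        name
        for name, holds in CONSTRAINT_TYPES.items()
        if all(holds(before, after) for before, after in pairs)
    )


# Two instances in which a "create order" event is the reference event and
# the "create shipment" events are the target events.
example = [(["co1", "cs1", "cs2"], "co1"), (["co2", "cs3"], "co2")]
print(satisfied_types(example))
```

In this sketch a type is reported only if all instances satisfy it, which corresponds to the strict discovery setting described so far.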

3) BEHAVIORAL CONSTRAINT MODELS
After illustrating the method to relate events in Section III-C1 and the notation to describe behavioral constraints in Section III-C2, we can now define and discover behavioral models based on them.
Definition 13 (Discovery of Behavioral Constraint Model): Let L be an XOC log, CT be a set of constraint types, ClaM be a discovered class model and AOC be a discovered set of AOC relationships. A behavioral constraint model discovered from L is a tuple BCM = (A, C, π ref , π tar , type), where
• C ⊆ U Con is a set of constraints such that C = {c | c ∈ disCon(L, P, CT ) ∧ P ∈ extP(ClaM , AOC)}.
A behavioral constraint model is a collection of activities and constraints. A constraint corresponds to a constraint type, a reference activity (on the side with a dot) and a target activity (on the other side), which indicates a restriction on both the numbers of target events before and after each reference event.
Note that it is possible to merge two constraints with the same shape (e.g., two arrows) and opposite reference activity and target activity into one constraint for simplicity. For instance, c12 in Figure 9(b) is a combined constraint, which is equivalent to the two constraints c1 and c2 in Figure 9(a).

D. OBJECT-CENTRIC BEHAVIORAL CONSTRAINT MODEL
Up to now, we have explained all the elements of an OCBC model and corresponding discovery approach. More precisely, Section III-A focused on structuring objects and discovering cardinality constraints on class models while Section III-B built a bridge for discovering behavioral constraints based on class models. Section III-C related events by elements in object models and discovered behavioral constraints between each pair of activities identified by the candidate patterns. Here, we give the formal definition of an OCBC model and discuss it from different perspectives based on a real scenario.
Definition 14 (Discovery of OCBC Model): Let L = (E, act, attrE, EO, om, ) be an XOC log. An object-centric behavioral constraint model discovered from L is a tuple OCBCM = (ClaM , AOC, A , ♦ A , OC , BCM , crel), where
• ClaM = (OC, RT , π 1 , π 2 , src , ♦ src , tar , ♦ tar ) is the class model discovered from L (Definition 5),
• AOC ⊆ A × OC is the set of AOC relationships discovered from L (Definition 6),
• A , ♦ A , OC ∈ AOC → U Card are three cardinality functions discovered from L (Definition 7),
• BCM = (A, C, π ref , π tar , type) is a behavioral constraint model discovered from L (Definition 13), and
• crel ∈ C → OC ∪ RT is the constraint relation such that ∀c = (a 1 , a 2 , ocrt, ct) ∈ C : crel(c) = ocrt.
U OCBCM is the universe of OCBC models.
Figure 10 shows an OCBC model discovered from the motivating log, which describes the data perspective, the behavioral perspective and the interplay between them in a single diagram. In general, it clearly reveals the classes, activities and constraints involved in the OTC scenario from which the motivating log is generated.
The 9 classes and 10 class relationships make up the backbone of the OCBC model. They indicate the involved objects and restrictions which are followed in the real transactions. For instance, r9 indicates that each ''order line'' is eventually packed in some shipment lines (indicated by ♦1.. * ), which is consistent with the rule in the OTC scenario.
The 8 AOC relationships (i.e., ①, ..., ⑧) indicate how the behavioral perspective (i.e., activities) refers to the data perspective (i.e., classes). Besides, they describe the restrictions between events and objects in the OTC scenario through cardinality constraints. For instance, ⑤ indicates a one-to-one correspondence between order objects and ''create order'' events through the constraints □1, ♦1 and 1, which means that if an object is added to the class order, the corresponding activity is also executed and vice versa. □* of ⑦ shows that, at any moment, each order line may have an arbitrary number of ''create shipment'' events, while ♦1..* of ⑦ means that each order line should eventually have at least one ''create shipment'' event. In addition, the constraint on the class side of ⑦ (1..*) denotes that one ''create shipment'' event may send products from one or more order lines.
The discovered model also contains 8 behavioral constraints (i.e., c1..c8) which specify the temporal restrictions on the behavioral perspective. For example, c7 indicates that each ''create order'' event should have at least one related ''create shipment'' event after it and c8 indicates that each ''create shipment'' event should have precisely one related ''create order'' event before it. Each constraint is identified by a candidate pattern and we use constraint relations (denoted by a dotted line between a behavioral constraint and a class or a class relationship) to indicate the class or relationship (i.e., the intermediary to relate events) of the pattern. For instance, there is a constraint relation between c1 and r2, which means c1 is identified by the pattern (create payment, create invoice, r2).

IV. ADVANCED DISCOVERY TECHNIQUES TO DEAL WITH NOISE
Model discovery techniques take an event log as input and return a model that reveals the real business process from which the log is generated. The quality of the input log determines whether the discovered model can really represent the real business process. In other words, to discover a representative model, the event log should contain a representative sample of data. A sample may not be suitable due to problems from two perspectives: (i) it has ''too little data'' to cover the various events and objects in the process; (ii) it has ''too much data'', with events or objects unrelated to the target process. In process mining, the first perspective refers to the ''incompleteness'' problem while the second perspective refers to the ''noise'' problem.
In this section, we focus on dealing with the ''noise'' problem, which is significant in process mining and should be faced by any applicable discovery algorithm. Note that the noise here refers to the infrequent events or objects rather than the incorrect logging, since we assume all the data in logs are correctly recorded.
The approach in Section III cannot deal with noise and it mainly suffers from two problems when facing noise. First, the noise may contain a variety of infrequent events or objects unrelated to the target business process, which makes the discovered models too complex (i.e., too many entities and constraints). Second, the discovered constraints are not precise, since the approach forces the constraints to allow the occurrence of noise. In order to solve these problems, we next introduce some more advanced approaches to deal with noise in logs by (i) simplifying complex discovered models, (ii) discovering precise behavioral constraints and (iii) discovering precise cardinality constraints.

A. SIMPLIFICATION OF OCBC MODELS
The approach in Section III is sensitive to noise, e.g., one object of an infrequent class leads to the discovery of the ''noise'' class. As a result, the noise hides the useful insights inside a complex discovered model. In this section, we give solutions to simplify OCBC models, i.e., filtering the unnecessary entities and constraints corresponding to the infrequent events or objects, to make the insights stand out.
Definition 15 (Support): Let L = (E, act, attrE, relate, om, ) be an XOC log, M = (ClaM , ActM , AOC, A , ♦ A , OC , crel) be the discovered OCBC model where ClaM = (C, R, π 1 , π 2 , src , ♦ src , tar , ♦ tar ) is the class model and ActM = (A, Con, π ref , π tar , type) is the activity model.
• The support of a class relationship r ∈ R is defined as the fraction of object relations corresponding to the relationship in all object relations.
• The support of an activity a ∈ A is defined as the fraction of events of the activity in all events.
• The support of an AOC relationship aoc = (a, c) ∈ AOC is defined as the fraction of references corresponding to the relationship in all references.
Consider the log in Table 1 and the model in Figure 10 to understand how to compute the support of different elements. For instance, supportC(order) = 2/28 since there are two order objects and twenty-eight objects in total. supportR (r10) = 4/36 since there are four object relations of r10 and thirty-six object relations in total. supportA(create order) = 2/10 since there are two create order events and ten events in total. supportAOC(aoc5) = 2/26 since there are two references of aoc5 (between create order events and order objects) and twenty-six references in total. supportCon(con2) = 3/15 since there are three consistent instances of P2 (corresponding pattern of con2) and fifteen instances in total.
Note that the support in Definition 15 is a relative support, i.e., a fraction between 0 and 1. It is possible to compute an absolute support by only considering the numerator of each formula, e.g., supportC(c) = |{o | ∃e ∈ E : (o ∈ Obj e ∧ class e (o) = c)}| for a class c. Table 2 presents the support of all classes in Figure 10, i.e., the ''Instance number'' column shows the absolute support and the ''Support'' column shows the relative support. One can specify a threshold for support to filter out any element whose support is below the threshold. In this way, we can simplify the complex OCBC model discovered in the environment with noise.
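Relative support and threshold-based filtering can be sketched in a few lines of Python. The sketch is illustrative only: `relative_support` and `filter_by_support` are hypothetical helper names, and the counts are made up except for the two order objects out of twenty-eight objects mentioned in the text (`class_x` and `class_y` are placeholders for the remaining objects).

```python
from fractions import Fraction
from typing import Dict, Set


def relative_support(counts: Dict[str, int]) -> Dict[str, Fraction]:
    """Relative support: absolute support divided by the total over all elements."""
    total = sum(counts.values())
    return {element: Fraction(n, total) for element, n in counts.items()}


def filter_by_support(counts: Dict[str, int], threshold: Fraction) -> Set[str]:
    """Keep the elements whose relative support is not below the threshold."""
    support = relative_support(counts)
    return {element for element, s in support.items() if s >= threshold}


# Absolute supports (numbers of objects per class). Only "order" = 2 and the
# total of 28 come from the text; the other two entries are assumptions.
class_counts = {"order": 2, "class_x": 4, "class_y": 22}

print(relative_support(class_counts)["order"])  # equals 2/28
print(filter_by_support(class_counts, Fraction(7, 56)))  # threshold 3.5/28
```

Exact fractions are used instead of floats so that a threshold such as 3.5/28 can be compared without rounding effects.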
Note that in the process of discovering an OCBC model, the elements are discovered in the order: C, R, A, AOC and Con. As a result, the latter discovery may depend on the former discovery result. For instance, the discovery of class relationships depends on the discovered classes. Therefore, when an element ele is filtered out, all its related elements (i.e., elements discovered based on ele) should be removed too. Based on the type of ele, Definition 16 gives five rules to specify the related elements.
If ele is a class c del , the related elements consist of (i) all class relationships connected to the class (R del ), (ii) all AOC relationships connected to the class (AOC del ), and (iii) all behavioral constraints which correlate events by c del or a class relationship which will be removed (Con del ). If ele is a class relationship r del , the related elements only consist of behavioral constraints which correlate events by r del . If ele is an activity a del , the related elements contain AOC relationships which are related to the activity (AOC del = {(a del , c) ∈ AOC}) and behavioral constraints which have the activity as reference or target activity (π ref (con) = a del ∨ π tar (con) = a del ). If ele is an AOC relationship (a, c) del , any behavioral constraint con which is connected to a (i.e., π ref (con) = a ∨ π tar (con) = a), and correlate events by c (i.e., crel(con) = c) or a class relationship connected to c (i.e., π 1 (crel(con)) = c ∨ π 2 (crel(con)) = c) will be removed too. Since the constraints are the last elements to be discovered, removing a constraint will not influence anything else.
For instance, for the class payment in Figure 10, its related elements are r7 and aoc2 since they are connected to payment. It has no related behavioral constraints since it is not involved in any correlation pattern. Table 2 presents the related elements of all classes in Figure 10. Based on the support and related elements in Table 2, one can set a threshold to filter classes. For instance, given a threshold 3.5/28, the classes payment, payment line, invoice, order, customer and shipment, and their related elements are removed, resulting in the model in Figure 11. Note that it is also possible to set a different threshold for each type of elements. The rules in Definition 16 still hold in these situations.
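The cascading removal can be illustrated with a short Python sketch covering the rules for classes and activities. The data layout is an assumption made for illustration: class relationships map a name to its two classes, AOC relationships are (activity, class) pairs, and each behavioral constraint maps to (reference activity, target activity, correlating class or relationship). The endpoints chosen for r7 and r2 below are hypothetical; the text only states that r7 and aoc2 are connected to payment and that c1 correlates events by r2.

```python
from typing import Dict, Set, Tuple

Relationships = Dict[str, Tuple[str, str]]    # name -> (class 1, class 2)
AOC = Set[Tuple[str, str]]                    # (activity, class)
Constraints = Dict[str, Tuple[str, str, str]]  # name -> (ref, tar, via)


def related_to_class(c_del: str, rels: Relationships, aoc: AOC, cons: Constraints):
    """Elements to remove together with class c_del."""
    r_del = {r for r, (c1, c2) in rels.items() if c_del in (c1, c2)}
    aoc_del = {(a, c) for (a, c) in aoc if c == c_del}
    # Constraints that correlate events by c_del or by a removed relationship.
    con_del = {k for k, (_ref, _tar, via) in cons.items() if via == c_del or via in r_del}
    return r_del, aoc_del, con_del


def related_to_activity(a_del: str, aoc: AOC, cons: Constraints):
    """Elements to remove together with activity a_del."""
    aoc_del = {(a, c) for (a, c) in aoc if a == a_del}
    con_del = {k for k, (ref, tar, _via) in cons.items() if a_del in (ref, tar)}
    return aoc_del, con_del


# Toy fragment of a model; relationship endpoints are assumed.
rels = {"r7": ("payment", "payment line"), "r2": ("payment line", "invoice")}
aoc = {("create payment", "payment"), ("create invoice", "invoice")}
cons = {"c1": ("create payment", "create invoice", "r2")}

print(related_to_class("payment", rels, aoc, cons))
print(related_to_activity("create payment", aoc, cons))
```

In this toy fragment, removing the class payment takes r7 and the AOC relationship of ''create payment'' with it but no behavioral constraint, which matches the example above.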

B. DISCOVERY OF PRECISE BEHAVIORAL CONSTRAINTS
In Section IV-A, we proposed an approach to simplify the complex models, which deals with the noise problem on the simplicity perspective. Besides the simplicity perspective, the noise may force the constraints to be general enough to allow its occurrence, resulting in imprecise constraints. In this section, we illustrate how to filter noise and discover precise behavioral constraints, i.e., dealing with the noise problem on the imprecision perspective.
A constraint is defined in the context of a correlation pattern. It indicates the restriction on the reference events and target events in instances correlated by the pattern. In the discovery process, the observed instances decide the discovered constraints, i.e., if all instances (corresponding to the correlation pattern) satisfy the semantics of some constraint type, a constraint of this type is discovered. The requirement that all instances satisfy the semantics is too strict in an environment with noise, since a single infrequent behavior that violates a constraint type may ruin the discovery of the corresponding constraint. Therefore, it is necessary to propose a robust discovery approach for behavioral constraints, which can still discover a constraint even if some noise violates the constraint.
The basic idea is to filter the noise according to a threshold, and discover constraints based on the behavior without noise as shown in Figure 13. In this way, the discovered constraints are precise, i.e., they do not allow the occurrence of noise. In this paper, the noise in terms of behavioral constraints refers to infrequent instances. We define the types (i.e., variants) of instances and count the frequency of each type to identify the noise. If the frequency of a type is below the configured threshold, the instances of the type are considered as noise and filtered out. The remaining frequent types (highlighted in black) are taken as input to discover behavioral constraints.
Next, we define a variant matrix to represent all possible types of instances and count the frequency of each type. The variant matrix contains nine different variants (cells), in the form of three rows and three columns as shown in Figure 12. Each variant in the matrix corresponds to a constraint type. For instance, (1; 0) is a constraint type which requires that the number of target events before (after) each reference event is one (zero). (1; 0) is the same as the unary-precedence constraint type. (2+; 0) is a constraint type which requires that the number of target events before each reference event is greater than or equal to two, and there are no target events after each reference event. The formalization of these nine constraint types is shown in Table 3.
Note that the nine variants are disjoint, i.e., ∀v, v′ ∈ V CT : v ≠ v′ ⇒ v ∩ v′ = ∅. Besides, the nine variants cover all the possible relations between a reference event and its target events, i.e., each instance corresponds to precisely one variant, indicated by the relations between target events and the reference event in the instance. For example, if an instance has five target events before the reference event and precisely one target event after the reference event, i.e., (5, 1), it corresponds to variant (2+; 1) (since (5, 1) ∈ (2+; 1)). Note that (5, 1) is a pair of numbers (in which the former (latter) one represents the number of target events before (after) the reference event) while (2+; 1) is a set of pairs.
Definition 18 (Computing Variant Frequency): Let L be an XOC log and P be a correlation pattern. Function freV ∈ U L × U P × V CT → ℕ returns the frequency with which a variant is observed in a log corresponding to a pattern such that freV (L, P, v) = |{ins | ins ∈ extI (L, P) ∧ ins P ∈ v}|.
For convenience, we define the following shorthand. If an instance correlated by a correlation pattern P from a log L corresponds to a variant v, i.e., ins P ∈ v, we say that v is observed once in L. Based on this idea, Definition 18 defines a function to count how many times a variant is observed in a log corresponding to a pattern. The frequency of a variant is equal to the number of instances which satisfy the requirements of the variant.
For instance, let L be the log example in Table 1 and P = (create order, create shipment, order line). After event correlation, we get two instances {⟨co1, cs1, cs2⟩, ⟨co2, cs3⟩} where co1 and co2 are reference events as shown in Figure 14. In the first instance, there are zero target events before and two target events after the reference event, i.e., ⟨co1, cs1, cs2⟩ P = (0, 2) ∈ (0; 2+). Therefore freV (L, P, (0; 2+)) = 1 and similarly freV (L, P, (0; 1)) = 1 according to the second instance. Figure 14 shows the frequency matrix, i.e., frequencies of all variants observed in L corresponding to the pattern P. Based on the frequency matrix and a configured threshold, one can identify the frequent variants for discovery.
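Mapping (before, after) pairs to the nine variants and counting them can be sketched as follows. This is a minimal illustration, assuming each instance has already been reduced to its pair of counts; the function names and the string encoding of variants such as "(2+; 1)" are choices made here for readability, not notation defined by the paper.

```python
from collections import Counter
from typing import Iterable, Tuple


def bucket(n: int) -> str:
    """Map a non-negative count to one of the three row/column labels."""
    if n == 0:
        return "0"
    if n == 1:
        return "1"
    return "2+"


def variant_of(before: int, after: int) -> str:
    """Return the unique variant that contains the pair (before, after)."""
    return f"({bucket(before)}; {bucket(after)})"


def variant_frequencies(pairs: Iterable[Tuple[int, int]]) -> Counter:
    """Count how often each variant is observed among the given instances."""
    return Counter(variant_of(before, after) for before, after in pairs)


# The two instances of the running example: (0, 2) and (0, 1).
print(variant_of(5, 1))
print(variant_frequencies([(0, 2), (0, 1)]))
```

Because every count falls into exactly one bucket, each pair is assigned to exactly one of the nine cells, which reflects the disjointness and coverage properties stated above.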
Definition 19 (Frequent Variants): Let L be an XOC log and P be a correlation pattern. V fre ⊆ V CT is the set of frequent variants observed in L corresponding to P, such that for any v ∈ V fre , , p = freV % (L, P, v) and p′ = freV % (L, P, v′).
Definition 19 provides three solutions to identify frequent variants by filtering infrequent variants (i.e., noise) according to a threshold. In the first solution, a variant is frequent if its frequency is equal to or larger than the given threshold (i.e., an integer). Figure 15(a) shows an example using the first solution to identify frequent variants. With a threshold of 25, the variants (1; 1) and (2+; 1) are frequent (highlighted in black) since their frequencies (40 and 35) are above the threshold. The second solution employs a similar idea, replacing the absolute frequency with a relative ratio.
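The first two solutions can be sketched in Python as simple filters over a frequency matrix. The sketch does not cover the entropy-based solution. The matrix below is hypothetical: only the frequencies of (1; 1), (2+; 1) and (0; 1), namely 40, 35 and 8, and the total of 100 are taken from the text, while the remaining cells are invented so that the total matches. The function names are also assumptions.

```python
from typing import Dict, Set


def frequent_absolute(freq: Dict[str, int], threshold: int) -> Set[str]:
    """First solution: keep variants whose frequency reaches an integer threshold."""
    return {v for v, n in freq.items() if n >= threshold}


def frequent_relative(freq: Dict[str, int], ratio: float) -> Set[str]:
    """Second solution: keep variants whose share of all instances reaches a ratio."""
    total = sum(freq.values())
    if total == 0:
        return set()
    return {v for v, n in freq.items() if n / total >= ratio}


# Hypothetical frequency matrix; see the lead-in for which values are assumed.
FREQ = {
    "(0; 0)": 5, "(0; 1)": 8, "(0; 2+)": 4,
    "(1; 0)": 3, "(1; 1)": 40, "(1; 2+)": 2,
    "(2+; 0)": 2, "(2+; 1)": 35, "(2+; 2+)": 1,
}

print(sorted(frequent_absolute(FREQ, 25)))
print(sorted(frequent_relative(FREQ, 0.25)))
```

With a total of exactly 100 instances, an absolute threshold of 25 and a ratio of 0.25 select the same variants.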
The first solution is based on the frequency of an individual variant and the second solution also considers the total frequency (i.e., the fraction of the individual frequency in the total frequency). They have limitations in some cases, where the frequencies of other variants should be considered when deciding whether a variant is frequent. For instance, the variant (0; 1) has the same frequency (i.e., 8) in the frequency matrices in Figure 15(a) and Figure 15(b), and the frequency sums of all variants in these two matrices are also the same (i.e., 100). To decide whether variant (0; 1) is frequent in these two different matrices, the first and second solutions return the same result given the same threshold. However, (0; 1) is relatively more frequent in the second matrix, since the frequency differences between variants in the second matrix are smaller. In this situation, the entropy can be used to identify the frequent variants, as shown in the third solution.
Definition 20 (Discovery of Behavioral Constraints by Frequent Variants): Let L be an XOC log, P be a correlation pattern and CT be a set of constraint types. Function disCN ∈ U L × U P × P(CT ) → P(U Con ) discovers behavioral constraints from a log with noise, such that disCN (L, P, CT ) = {con | con = idCon(P, ct) ∧ ct ∈ CT fre \ CT red } where
• CT fre = {ct ∈ CT | ∀v ∈ V fre : v ⊆ ct}, where V fre ⊆ V CT is the set of frequent variants observed in the log, and
• CT red = {ct ∈ CT fre | ∃ct′ ∈ CT fre : ct′ ⊂ ct}, which is the set of redundant constraint types.
If all frequent variants are consistent with a constraint type ct (i.e., ∀v ∈ V fre : v ⊆ ct), a constraint of this type is discovered, i.e., con = idCon(P, ct). Note that constraint types may overlap. For instance, the response type contains the unary-response type, i.e., the response type is looser. If two types are contained in CT fre and one contains the other, the looser type is called a redundant constraint type. When discovering constraints, redundant types are discarded.
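The selection of constraint types from frequent variants, including the removal of redundant (looser) types, can be sketched as below. The sketch assumes that every constraint type can be written as a union of cells of the variant matrix, and it uses assumed semantics for three types (response: at least one target event after; unary-response: exactly one after; precedence: at least one before). The names `CONSTRAINT_TYPES` and `discover_types` are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Cell = Tuple[str, str]  # (bucket of "before", bucket of "after")
BUCKETS = ("0", "1", "2+")
ALL_CELLS: Set[Cell] = {(b, a) for b in BUCKETS for a in BUCKETS}

# Assumed semantics, expressed as sets of variant cells.
CONSTRAINT_TYPES: Dict[str, FrozenSet[Cell]] = {
    "response": frozenset(c for c in ALL_CELLS if c[1] != "0"),
    "unary-response": frozenset(c for c in ALL_CELLS if c[1] == "1"),
    "precedence": frozenset(c for c in ALL_CELLS if c[0] != "0"),
}


def discover_types(frequent: Set[Cell],
                   types: Dict[str, FrozenSet[Cell]] = CONSTRAINT_TYPES) -> Set[str]:
    """Return the types consistent with all frequent variants, minus looser ones."""
    # A type is a candidate if it contains every frequent variant.
    ct_fre = {name for name, cells in types.items() if frequent <= cells}
    # A candidate is redundant if another candidate is strictly contained in it.
    ct_red = {
        name for name in ct_fre
        if any(types[other] < types[name] for other in ct_fre)
    }
    return ct_fre - ct_red


print(sorted(discover_types({("1", "1"), ("2+", "1")})))
```

For the frequent variants (1; 1) and (2+; 1), response, unary-response and precedence are all candidates; response is dropped because the stricter unary-response is also a candidate.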

V. IMPLEMENTATION
Our discovery algorithm has been implemented in the ''OCBC Model Discovery'' Plugin in ProM. 10 Figure 16 shows the interface of the plugin and a discovered model (in panel ④) from the example XOC log. 11 Note that the discovered model is the same as the model shown in Figure 10 except that the combined constraints are displayed separately.
Panel ① presents the distribution of cardinalities and the instances related to one selected constraint (highlighted in red in panel ④). Panel ② shows the attributes of a selected edge or node. It is possible to zoom in and out of models using panel ③. Using the filter panel ⑤, it is possible to filter behavioral constraints based on constraint types (the plugin discovers constraints of the 8 types shown in Figure 8) and activity names, to get a simplified model. Panel ⑥ enables users to guide the model discovery approach, i.e., update (or rediscover) behavioral constraints by selecting approaches for computing metrics and configuring weights (between 0 and 1) for three metrics.
10 Access http://www.promtools.org/prom6/nightly/ to download ProM 6 Nightly builds and update the OCBC package.
The time needed for model discovery increases with the number of events in the log. Figure 17 shows the discovery time for different numbers of events. For example, our plugin takes around 35 seconds to discover a model from a log with 240,000 events. Based on the trend line through the measured points, the discovery time appears to grow linearly with the number of events.

VI. COMPARISON OF OCBC MODELS AND OTHER MODELS
Based on the motivating example, we compare OCBC models with other models such as Declare models, directly follow graphs and Petri nets, and show the advantages of our modeling language. The basic idea is to transform the motivating data into XOC logs and XES logs. Then we discover OCBC models from XOC logs and discover other models from XES logs. The models are compared by checking which model best describes the business process from which the data come.

A. BUSINESS PROCESS
First, we introduce the business process of the motivating example. Figure 18 employs an informal notation to describe the business process. In general, the arrows of edges indicate the temporal order between activities while the cardinalities on edges indicate the constraints on the instances of activities. For instance, the edge between the ''create invoice'' activity and the ''create payment'' activity means that each ''create payment'' event must occur after its corresponding ''create invoice'' event, while the cardinalities on the edge mean that each ''create invoice'' event corresponds to precisely one ''create payment'' event and each ''create payment'' event corresponds to one or more ''create invoice'' events.
According to this business process, we operate Dolibarr to generate transactions in the database. By incorporating different numbers of orders, we can extract logs of different sizes as shown in Figure 17. To facilitate understanding, we only show the comparison result based on the data (e.g., two orders) as shown in Figure 1. Based on the data, it is possible to create the XOC log shown in Table 1. Additionally, we can also transform the data into an XES log as shown in Figure 19. We consider the order id as the case id and use the method in [30] to relate events, resulting in two cases o1 and o2. Note that this generated log has convergence problems, e.g., ci2 is contained in two cases as if it were executed twice although it is performed only once in Dolibarr, and divergence problems, e.g., in case o2 ''create payment'' has two instances cp1 and cp2 which cannot be distinguished in the case although they are performed on different documents in Dolibarr (i.e., cp1 is on ci2 and cp2 is on ci3) [26], [29].
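The convergence problem caused by flattening can be reproduced with a small Python sketch. The event data are a toy assumption: the text states that ci2 belongs to both cases o1 and o2 and that ci3 belongs to case o2, while the assignment of ci1 to o1 and the restriction to order and invoice events are made up for illustration. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str, Set[str]]  # (event id, activity, related order ids)

# Toy events, listed in time order (partly assumed, see the lead-in).
EVENTS: List[Event] = [
    ("co1", "create order", {"o1"}),
    ("co2", "create order", {"o2"}),
    ("ci1", "create invoice", {"o1"}),
    ("ci2", "create invoice", {"o1", "o2"}),
    ("ci3", "create invoice", {"o2"}),
]


def flatten(events: List[Event]) -> Dict[str, List[Tuple[str, str]]]:
    """Flatten events into cases, using the order id as the case notion.

    An event related to several orders is copied into each of their cases.
    """
    cases: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for event_id, activity, orders in events:
        for order in sorted(orders):
            cases[order].append((event_id, activity))
    return dict(cases)


def activity_counts(cases: Dict[str, List[Tuple[str, str]]]) -> Counter:
    """Count activity occurrences as they appear in the flattened log."""
    return Counter(activity for trace in cases.values() for _eid, activity in trace)


cases = flatten(EVENTS)
real_invoices = sum(1 for _eid, act, _orders in EVENTS if act == "create invoice")
print(real_invoices, activity_counts(cases)["create invoice"])
```

In this toy data there are three ''create invoice'' events, but the flattened log contains four occurrences because ci2 is duplicated across both cases.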

B. COMPARISON OF DISCOVERED MODELS
Based on the generated XOC and XES logs, we compare the data Petri net, BPMN model, Declare model, and directly follow graph (DFG) discovered from the XES log and the OCBC model discovered from the XOC log. Figure 20 displays a discovered data Petri net based on the approach in [7], [15], and [16]. On the behavioral perspective, the discovered Petri net has a lot of implicit transitions and loops. It is not precise and allows behavior which is not possible in the real business process. For instance, the ''create shipment'' activity can be skipped in the discovered Petri net, which violates the real process in which each ''create order'' event should be followed by at least one ''create shipment'' event. Besides, the many-to-many relationship between ''create order'' and ''create invoice'' in the process is transformed into a one-to-many relationship. On the data perspective, some variables and guards are discovered, which present the conditions to trigger transitions. The dotted line shows the interactions between activities and variables.
Figure 21 presents a discovered BPMN model using the approach in [6]. The model is not precise, e.g., the sub-process is a ''flower'' model involving the activities ''create payment'' and ''create invoice'', which can happen in any order and any number of times. The approach in [6] only discovers the behavioral perspective with hierarchies, and we did not find any existing work that can automatically discover the data perspective. However, variables, data objects and data stores can be added manually (or possibly discovered automatically in the future) to describe the data perspective.
A discovered data-aware Declare model [20], [22] is shown in Figure 22(a). For simplicity, we limit the discovered constraint types to the set of ''response'', ''non-response'', ''precedence'' and ''non-precedence''. Declare models use a declarative way to express the constraints between pairs of activities [21]. For instance, the constraint c1 between ''create invoice'' and ''create payment'' indicates that each ''create invoice'' event should be followed by a ''create payment'' event and each ''create payment'' event should be preceded by a ''create invoice'' event. The real business process indicates that ''create payment'' events cannot occur before their corresponding ''create invoice'' events. However, a negative constraint (c2) needed to describe this rule is not discovered, since in case o2, the ''create payment'' event cp1 is wrongly related to its subsequent ''create invoice'' event ci3 (the divergence problem, cf. Section VI-A). Besides, existing constraint types in Declare models do not have a constraint type like c3 to restrict the cardinality between ''create invoice'' and ''create payment'' events, which is required in the business process. The discovered data perspective consists of guards on constraints, which specify the situations when constraints are enabled. For instance, if we add an instance o3 = ⟨co3, ci4, cp3⟩, where the attribute customer of co3 is c2, into the log shown in Figure 19, a guard customer = c1 is discovered on the ''response'' constraint between ''create order'' and ''create shipment'', since the ''create order'' events are followed by ''create shipment'' events only in o1 and o2 where customer = c1.
A DFG shows the directly-follow relations between activities and possibly also shows the corresponding frequencies. As Figure 22(b) indicates, the ''create invoice'' activity occurs 4 times, which is not consistent with reality (the convergence problem, cf. Section VI-A). Moreover, the DFG shows that a ''create payment'' event is directly followed by a ''create invoice'' event once, which also violates the rule in the real business process.
In comparison, our OCBC model shown in Figure 10 is better able to describe the real business process. On the behavioral perspective, since the XOC log has no convergence and divergence problems (multiple instances can be distinguished when we relate events, cf. Section III-C1), all constraints are correctly discovered. Besides, with the support of a class model and AOC relations, the cardinality constraints (e.g., one-to-many and many-to-many relationships) between activities can also be clearly described. The data perspective is described explicitly by a class model.

VII. CONCLUSION
In this paper, we proposed a discovery approach for a novel model, named Object-Centric Behavioral Constraint (OCBC). OCBC models graphically describe control-flow and data/objects with cardinality constraints in a truly integrated manner. In this way, we overcome the problems of existing approaches that separate the behavioral (e.g., BPMN, EPCs, or Petri nets) and data (e.g., a class model) perspectives. Moreover, OCBC models allow different types of instances to interact in a fine-grained manner and the constraints in the class model guide behavior.
Besides, some metrics such as fitness, precision and generalization are proposed to evaluate discovered models, and users can guide the model discovery algorithm by configuring metric weights. Finally, we compared OCBC models with other models such as BPMN, Declare models, DFG and Petri nets, and showed that our model can better represent object-centric business processes with one-to-many and many-to-many relations.
Moreover, this paper serves as a starting point for a new line of research. Next to model discovery and its support tools (OCBC Model Editor and OCBC Model Discovery Plugin) in ProM, we also support conformance checking. Based on OCBC models, many deviations which cannot be detected by existing approaches can be revealed.