Process activity ontology learning from event logs through gamification

The quality of event log data is a constraining factor in achieving reliable insights in process mining. Particular quality problems are posed by activity labels which are meant to be representative of organisational activities, but may take different manifestations (e.g. as a result of manual entry synonyms may be introduced). Ideally, such problems are remedied by domain experts, but they are time-poor and data cleaning is a time-consuming and tedious task. Ontologies provide a means to formalise domain knowledge and their use can provide a scalable solution to fixing activity label similarity problems, as they can be extended and reused over time. Existing approaches to activity label quality improvement use manually-generated ontologies or ontologies that are too general (e.g. WordNet). Limited attention has been paid to facilitating the development of purposeful ontologies in the field of process mining. This paper is concerned with the creation of activity ontologies by domain experts. For the first time in the field of process mining, their participation is facilitated and motivated through the application of techniques from crowdsourcing and gamification. Evaluation of our approach to the construction of activity ontologies by 35 participants shows that they found the method engaging and that its application results in high-quality ontologies.


I. INTRODUCTION
Process mining is the analysis of event logs (i.e., historical process data) to generate actionable insights that business owners can use to improve their processes [1]. The use of poor-quality event logs leads to unreliable analysis results and insights (i.e., garbage in, garbage out) [2]. Of critical importance are the activities that are performed in a process, and their correct identification requires deep insight into the domain involved. However, data quality can be compromised. For example, the same activity in a process might be referred to by different labels, a case of so-called synonymous labels [3]. Also, different abstraction levels of activity labels (i.e., too detailed vs too general) are quite common [4].
To use domain knowledge in improving the quality of activity labels, the first step is to represent that knowledge in a format that a computer can process. An ontology is one of the prominent methods to formally specify the concepts and relationships in a particular domain. Ontology learning is the task of extracting ontological elements (e.g. concepts and relations) from data [5]. The main goal of ontology learning is to share and reuse the knowledge of the domain effectively [6]. While automatic approaches to ontology learning from general data sources have been proposed (e.g. [7]-[11]), limited attention has been paid to ontology learning from process data. As argued by Mendling et al. [12], learning ontologies from process-related data sources is one of the challenges that semantic process mining is facing.
The quality of the ontology learned from data is shown to be proportional to the human intervention level [13]. Domain experts are well-positioned to create activity ontologies from process data; however, it is a challenge to engage them in the tedious task of ontology learning as they are usually in-demand, expensive, and time-poor. Thus, the research question of this paper is how can we create activity ontologies from process data in an engaging manner?
In this paper, the Process Activity Ontology (PAO) is created through a novel method that uses crowdsourcing and gamification techniques. Crowdsourcing [14] taps into the diverse knowledge of a large group of people without limitations on their time and location. In this approach, a number of strategies are used to estimate the level of domain expertise of participants. Gamification is the use of game elements in non-gaming contexts to enhance user experience [15]. To the best of our knowledge, ontology learning has not previously been conducted through the use of a gamified system. The concepts in the PAO are activity labels, and the relations captured are synonymy, hypernymy (i.e., the super-class sub-class relation), holonymy (i.e., the whole-part relation), and antonymy, based on the relations maintained in WordNet [16]. The terms activity and label are used interchangeably to refer to activity labels in the rest of this paper.
The remainder of the paper is organised as follows. Section II presents related work. Section III describes the research methods. The structure of the PAO is discussed in Section IV. Section V describes the gamified crowdsourced approach to ontology learning. The evaluation of the gamified system is presented in Section VI and Section VII summarises the approach and suggests future work.

II. RELATED WORK
This section reviews the existing literature on the topics of activity label quality improvement in process mining, ontology learning in general, ontology learning in process mining, and gamification.
The improvement of the quality of activity labels has been studied in the process mining literature [17]-[20]. Some approaches operate at the process model level [17], [18], [21], and others at the event log level [19], [20]; the latter mainly focus on detecting and repairing activity labels with the same syntax but different semantics (i.e., homonymous labels [3]). Activity label quality improvement has been conducted through the use of domain knowledge [22]-[25], which can be sourced from an existing domain ontology [18], or provided by humans (either a few individuals [22], [23], or a crowd [24], [25]). Koschmider et al. [18] propose a semi-automatic approach to revise activity labels in process models using part-of-speech tagging algorithms and existing or manually created ontologies. The final revisions of the labels are selected by human users of the designed tool. Other methods [24], [25] identify synonymous labels through crowdsourcing and argue that such a task is not as straightforward as most automatic approaches suppose. In our earlier work [26], a gamified crowdsourced method is proposed which detects and repairs synonymous activity labels in an event log. While existing approaches focus on synonymous labels, the approach proposed in this paper identifies not only synonymy but also other types of semantic relations, namely hypernymy (i.e., the super-class sub-class relation), holonymy (i.e., the whole-part relation), and antonymy, between activity labels through the use of a crowdsourced gamified system.
Ontology learning is the task of extracting ontological elements from input data [5] and has been well-studied in the literature (refer to [5], [13], [27] for example review papers). Based on a review by Hazman et al. [27], approaches to ontology learning can be classified based on the input data and the method. The input data to an ontology learning system might be unstructured (e.g. textbooks) [8], semi-structured (e.g. HTML web pages) [28], or structured (e.g. relational databases) [29]. The method used for ontology learning can be automatic [7], [8] or it can involve human intervention. Asim et al. [13] review automatic ontology learning approaches and classify the techniques they use into three main categories: (1) linguistic methods [9], based on language characteristics such as part-of-speech tagging; (2) statistical methods [10], based on probability and frequency measures; and (3) logical methods [11], based on inductive logic programming techniques to derive hypotheses.
Ontology learning has been conducted manually by a limited number of domain experts with the aid of tools like Protégé [30] or OntoEdit [31] in some works [32], [33]. Another set of approaches [34]- [36] use human input to either prepare the input for automatic algorithms or to verify and amend their outputs. The idea of using the knowledge of the crowd (i.e., crowdsourcing) for ontology learning has also been studied in the literature [37]- [39]. Ontologies have been used to improve gamification design [40], [41]. However, to the best of our knowledge, no gamified system has hitherto been designed to crowdsource ontology learning.
Within the field of process mining, there has been limited attention to the use and creation of ontologies to improve analysis outcomes. There are some works that manually or semi-automatically (with the aid of tools) integrate existing domain ontologies into their methods to, for example, extract event logs from relational databases [42], transform activities of an event log to a higher level of abstraction [43], or revise activity labels in process models [18]. Some approaches [44], [45] generate a glossary of activity labels found in a collection of process models in a domain. The glossary automatically created by Peters et al. [44] consists of activity labels as complete phrases, their element type (event or function in the Event-driven Process Chain (EPC) process modelling language), and their control flow information (e.g. co-occurrence, exclusiveness, and strict order). The glossary manually generated by Becker et al. [45] includes grammar checked activity labels as well as their breakdown terms. The semantic relations between the terms that are synonymy, antonymy, homonymy (i.e., the relation between activity labels with the same syntax but different semantics [3]), and word formation are also manually extracted from a thesaurus. The former approach [44] is automated but lacks information about the semantic relations between labels, while the latter [45] is manual and may thus require serious efforts, which threatens its applicability.
In another automatic method, Coskuncay et al. [46] create a process ontology from process models. Their ontology includes different elements of a business process, such as activities, resources, and tools. However, the created ontology leaves the question of semantic relations between activities unanswered, because such information cannot be acquired from a process model, only from domain experts. There are also a few automatic methods that use natural language processing techniques to build a process ontology from textual descriptions available in organisational archives [47], [48]. The structure of their ontologies is similar to the one created by Coskuncay et al. [46] but adds further information, for example, the whole-part relation between organisational roles. These approaches are only applicable when the relevant textual descriptions are accessible. Rebmann et al. [49] semantically augment an event log through natural language processing of its textual attributes. They extract objects, actions, actors, and passive resources involved in a process from text and add them to the event log; however, they do not model them as a separate artifact, e.g. an ontology. Our approach consists of a gamified crowdsourcing system for engaging domain experts in the construction of an ontology of the activity labels of an event log (structured data), which includes the semantic relations (synonymy, hypernymy, holonymy, and antonymy) between these labels. The resulting ontology can be used to improve activity label quality.
Gamification has been growing in popularity and been successfully applied in various domains, e.g., healthcare [50]- [54], education [55]- [58], and the environment [59]- [61]. The Cure [51] is an example of the application of gamification in the healthcare domain designed to predict the clinical outcome and drug response to breast cancer in women based on their genomic specifications. Within the education domain, gamification has led to the enhancement of student engagement [55]- [57] and improved learning [58]. The Khan Academy [55] is an example of a gamified online education environment that uses game elements, such as levels and badges, to engage students in learning complicated subjects, which otherwise they might not be interested in studying. In the environment domain, human knowledge has been deployed to explore nature. An example is Fraxinus [59], which is a game designed to cure a genomic disease in ash trees. Gamification is used in our approach to improving user engagement in the task of ontology learning.

III. METHOD
In order to answer the research question, a number of steps were taken: (1) The most common semantic relations between activity labels in event logs were identified based on our observations in real-life event logs. These relations were further framed as the relationships maintained in Word-Net [16]. (2) In the next step, the structure of the PAO was specified in the Web Ontology Language (OWL) [62] based on the available information in event logs and semantic relations between activity labels. (3) A number of motivational drives and game elements were selected from well-known and validated gamification frameworks, such as Octalysis [63] and Self-Determination Theory (SDT) [64], [65], to be incorporated in the gamified system design. (4) The gamified approach was designed and implemented to identify the semantic relations between activity label pairs in input event log(s). (5) The approach was validated using a public real-life event log and 35 participants. The quality of the created PAO was assessed by its comparison to a ground truth PAO and the user engagement was assessed through surveys.

IV. THE PROCESS ACTIVITY ONTOLOGY
The activities performed in a process and their semantic relations are modelled in the PAO, which employs the semantic web technologies. This section provides a background to the semantic web concepts (Section IV-A), and describes the structure (Section IV-B) and the semantic relations (Section IV-C) of the PAO.

A. SEMANTIC WEB BACKGROUND
The Semantic Web aims to improve the World Wide Web by representing web data in a machine-readable format [66]. Knowledge representation in the semantic web is performed in a number of steps (a.k.a. the "layer cake" of the semantic web [67]), each constructing a layer on top of another [68]. The layers, sorted upwards in the layer cake, are the Uniform Resource Identifiers (URIs; which name things uniformly across the web), the Extensible Markup Language (XML; which structures web documents in a user-defined format), the Resource Description Framework (RDF; which models objects and their semantic relations as subject-predicate-object triples), the Resource Description Framework Schema (RDFS; which provides primitives, e.g. class, property, sub-class, domain and range, to structure web objects into hierarchies), and the Web Ontology Language (OWL; which is based on the Description Logic (DL) language and has richer semantic constructs than RDFS, e.g. disjointness and cardinality of classes).
The PAO will be represented in OWL [62], which was developed by the Web Ontology Working Group of the World Wide Web Consortium (W3C). It is a standard ontology language that is supported by well-established software packages such as Protégé [30]. Inference (a.k.a. reasoning) is the task of deriving new data from existing data and is performed by an inference engine (a.k.a. reasoner or classifier). Reasoning over an ontology in a DL language, e.g. OWL-DL (an OWL sub-language [69]), can be carried out by reasoners such as HermiT [70], FaCT++ [71], and RacerPro [72].
Reasoning in DL-based languages is based on the Open World Assumption (OWA). In OWA, the reasoner does not classify the statements that are not proven to be true as false statements; they are rather assumed to be missing from the ontology [73]. For example, if the statement 'activity a and activity b are synonymous' is added to the PAO, the query 'is activity b synonymous to activity c?' will return 'unknown'. In the opposite Closed World Assumption (CWA; used in relational databases), where the available information is the whole truth [73], the answer would be 'false'. The OWA is an appropriate assumption here because the activities and relations that are missing from an incomplete PAO do exist in the real world. For example, activity b and activity c may be synonyms in reality but their relation might be simply missing from the PAO at a certain point of time.
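The OWA/CWA contrast above can be sketched in a few lines of code. This is not a DL reasoner, just a minimal illustration of the three possible answers to a synonymy query; the knowledge base and activity names are hypothetical.

```python
# Minimal sketch (not a DL reasoner) contrasting the Open World Assumption
# (OWA) with the Closed World Assumption (CWA) for a synonymy query.
# The single stated fact mirrors the example in the text.

known_synonym_pairs = {("activity_a", "activity_b")}

def is_synonym(x, y, assumption="OWA"):
    """Answer 'is x synonymous to y?' under the given assumption."""
    stated = (x, y) in known_synonym_pairs or (y, x) in known_synonym_pairs
    if stated:
        return True            # provably true under both assumptions
    if assumption == "CWA":
        return False           # absence of proof counts as falsity
    return None                # OWA: unknown, the fact may simply be missing

print(is_synonym("activity_b", "activity_c"))         # OWA: None (unknown)
print(is_synonym("activity_b", "activity_c", "CWA"))  # CWA: False
```

Returning `None` for the open-world case makes the "unknown" answer explicit, which is exactly why the OWA suits an incomplete PAO.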
An OWL ontology is a collection of basic pieces of knowledge called triples. A triple is an expression consisting of a subject, a predicate, and an object, e.g. 'Human eats vegetable', or 'garden is green'. OWL triples are built on top of RDF triples. For example, the triple 'Human rdf:type owl:class' declares an OWL class representing the concept 'Human'.
A set of triples constructs an axiom in OWL. Axioms denote concepts as classes, relationships as object properties, attributes as data properties, and objects as individuals of classes. OWL allows the definition of a hierarchy of classes (with the root class "Thing"), where the sub-classes represent more specific concepts than their super-class(es). For example, the OWL class axiom 'Class (Human)' consists of one triple, 'Human rdf:type owl:class'. Other types of axioms in OWL include disjoint, domain and range, and closure axioms.

Figure 1(a) depicts a snippet of an event log related to a loan application process. Each row describes an event (identified by an eventId), which is the record of the execution of a well-defined step in a process. The events are categorised into a number of cases (identified by a caseId), each denoting one execution of the process (e.g. a specific loan application). An event is described by an activity (e.g. 'Submit documents') performed at a particular point in time (represented by a timestamp) as part of a case (e.g. c1). There may be additional attributes for an event, such as the resource (e.g. 'User1') originating the activity, and some data attributes (e.g. 'Amt', which is the loan amount requested in an application).
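The event log structure just described can be sketched as plain records. The concrete rows below are illustrative, loosely following the loan application snippet in Figure 1(a) (event and case identifiers such as e4 and c1 appear in the text; the timestamps and amounts are invented). Grouping events by activity yields the distinct labels that later become classes, and the events that become their individuals.

```python
# Sketch of the event log structure described above. Rows are illustrative,
# modelled on the loan application snippet in Figure 1(a).
from collections import defaultdict

event_log = [
    {"eventId": "e1", "caseId": "c1", "activity": "Submit documents",
     "timestamp": "2016-01-04 09:12", "resource": "User1", "Amt": 10000},
    {"eventId": "e4", "caseId": "c1", "activity": "Check loan amount",
     "timestamp": "2016-01-05 11:40", "resource": "User2", "Amt": 10000},
    {"eventId": "e13", "caseId": "c2", "activity": "Check loan amount",
     "timestamp": "2016-01-07 14:03", "resource": "User2", "Amt": 55000},
]

# Distinct activity labels become sub-classes of Activity in the PAO; the
# events in which a label occurs become the individuals of its class.
events_by_activity = defaultdict(list)
for event in event_log:
    events_by_activity[event["activity"]].append(event["eventId"])

print(sorted(events_by_activity["Check loan amount"]))  # ['e13', 'e4']
```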

B. THE PROCESS ACTIVITY ONTOLOGY STRUCTURE
The PAO represents the activities performed in a particular process (e.g. the loan application process) and their semantic relations. Figure 1(b) shows the general structure of the PAO, consisting of two high-level classes: Activity (representing all activities that can be performed in a process) and DataAttribute. The Activity class has four data properties: timestamp, caseId, resource, and suitability (which is 0 by default and is used to identify the dominant activity label amongst a number of synonyms). The DataAttribute class has two data properties: name (denoting the name of a data attribute, e.g. 'Amt') and value (denoting the value of a data attribute, e.g. '10000'). The object property hasData is defined from the class Activity to the class DataAttribute. The lower-level classes of the PAO depend on the process it represents. The sub-classes of the Activity class are the distinct activity labels recorded in one or more event log(s) related to a process. For example, Figure 1(c) depicts the lower-level classes of the PAO related to the loan application process recorded in the event log illustrated in Figure 1(a). The OWL statement SubClassOf (Check_loan_amount Activity) defines the class Check_loan_amount representing the activity 'Check loan amount' in Figure 1(a) (note that spaces in OWL class names are replaced by underscores). Check_loan_amount is a sub-class of Activity because it is a type of activity that is performed in the loan application process. The individuals of a certain activity class (e.g. Check_loan_amount) in the PAO are the events (e.g. e4 and e13) where the activity is performed in the event log(s). In this formulation, an event is equated with an instance of the execution of an activity. The relations between activity classes in Figure 1(c) are described in Section IV-C.
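The mapping from activity labels to OWL class axioms can be sketched as simple string generation in the abstract syntax used in this section (spaces replaced by underscores, each label a sub-class of Activity). The helper names are our own; this is not an OWL serialisation library.

```python
# Sketch: turning distinct activity labels into OWL-style class axioms in
# the abstract syntax used in this section.

def to_class_name(label: str) -> str:
    """OWL class names cannot contain spaces; replace them with underscores."""
    return label.replace(" ", "_")

def subclass_axiom(label: str, parent: str = "Activity") -> str:
    """Emit a SubClassOf axiom declaring the label as a kind of Activity."""
    return f"SubClassOf ({to_class_name(label)} {parent})"

for label in ["Check loan amount", "Submit documents", "Begin check"]:
    print(subclass_axiom(label))
# SubClassOf (Check_loan_amount Activity)
# SubClassOf (Submit_documents Activity)
# SubClassOf (Begin_check Activity)
```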

C. SEMANTIC RELATIONS OF ACTIVITIES
The PAO models four possible relationships between activities of a process, although it can be extended in the future to include more relations. The relationships are synonymy, hypernymy, holonymy, and antonymy and are discussed in the following sections.

1) Synonymy relation
Two activity labels are synonyms if they are referring to the same task in a process, e.g. 'Begin check' and 'Start check' in Figure 1(a). Synonymous labels are most common when an event log stems from multiple systems that name the same task differently [3].
A synonymy relationship between two activity labels is represented by an equivalence axiom between their corresponding classes in the PAO. For instance, the OWL statement EquivalentClasses (Begin_check Start_check) declares that the two classes Begin_check and Start_check refer to the same concept (i.e., activity) in the loan application process. As a result, the DL reasoner will automatically infer that any individual (i.e., event) that is a member of the class Begin_check also belongs to the class Start_check and vice versa. The synonymy relation is depicted by the green edges in Figure 1(c).
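The inference a DL reasoner draws from equivalence axioms can be approximated with plain equivalence classes (a union-find structure): events asserted for one class are members of every class in its synonym set. The event identifiers below are illustrative; this is a hand-rolled sketch, not an actual reasoner.

```python
# Sketch of equivalence-axiom inference via union-find: members of one class
# belong to every equivalent class. Event individuals are illustrative.

parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x

def declare_equivalent(a, b):
    """Mimics EquivalentClasses(a b)."""
    parent[find(a)] = find(b)

declare_equivalent("Begin_check", "Start_check")

members = {"Begin_check": {"e2", "e9"}}   # asserted event individuals

def members_of(cls):
    """All individuals of cls, including those inherited via equivalence."""
    root = find(cls)
    return set().union(*(evts for c, evts in members.items()
                         if find(c) == root))

print(members_of("Start_check"))  # {'e2', 'e9'} inherited from Begin_check
```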

2) Hypernymy relation
Hypernymy is a semantic relation between a general activity, called the hypernym, and a more specific activity, called the hyponym. For example, 'Pay' is a hypernym and 'Pay online' is a hyponym activity. Also, two activities that share the same hypernym activity are called cohyponyms, e.g. 'Pay online' and 'Pay by cash', which share the same hypernym 'Pay'. Such relationships are more common between labels of event logs derived from systems that allow free-text data entry, as the performers of the activities record them at different levels of detail [74].
The hypernymy relationship between activities is modelled with sub-class axioms in the PAO. For example, the OWL statements:

SubClassOf (Pay Activity)
SubClassOf (Pay_online Pay)
SubClassOf (Pay_by_cash Pay)

mean that the class Pay is the super-class (i.e., hypernym) and the classes Pay_online and Pay_by_cash are its sub-classes (hyponyms). These statements also imply that the two classes Pay_online and Pay_by_cash, which share the same super-class, are cohyponyms. The DL reasoner will classify all members of a sub-class (e.g. Pay_online and Pay_by_cash) as members of its super-class (e.g. Pay). The hypernymy relation is depicted by the blue arrows (with the arrow pointing towards the hypernym) in Figure 1(c).
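The sub-class reasoning above amounts to walking a parent map: every member of a sub-class is classified into all of its transitive super-classes, and two classes with the same direct parent are cohyponyms. A sketch, using the Pay example:

```python
# Sketch of sub-class (hypernymy) reasoning over the example axioms above.

subclass_of = {           # SubClassOf axioms: child -> direct parent
    "Pay": "Activity",
    "Pay_online": "Pay",
    "Pay_by_cash": "Pay",
}

def superclasses(cls):
    """All transitive super-classes (hypernyms) of cls, nearest first."""
    out = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        out.append(cls)
    return out

def are_cohyponyms(a, b):
    """Two distinct classes sharing the same direct super-class."""
    pa, pb = subclass_of.get(a), subclass_of.get(b)
    return a != b and pa is not None and pa == pb

print(superclasses("Pay_online"))                   # ['Pay', 'Activity']
print(are_cohyponyms("Pay_online", "Pay_by_cash"))  # True
```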

3) Holonymy relation
Holonymy denotes the relation between a whole activity, called the holonym, and a partial activity, called the meronym. For example, 'Submit documents' is a holonym and 'Submit form A' is a meronym activity, where form A is one of the documents submitted as part of a loan application. Two activities that are parts of the same whole activity (i.e., holonym) are called comeronyms, e.g. 'Submit form A' and 'Submit form B'. Part-of relations, similar to kind-of relations, are more typical between activity labels of event logs recorded through free-text inputs.
The holonymy relationship between activities is defined by an object property called hasPart and a number of class restrictions that use the existential quantifier someValuesFrom and the universal quantifier allValuesFrom. For instance, assuming that forms A and B are the only documents that need to be submitted as part of an application, the OWL statements:

Class (Submit_documents Activity
  restriction (hasPart someValuesFrom Submit_form_A)
  restriction (hasPart someValuesFrom Submit_form_B)
  restriction (hasPart allValuesFrom (Submit_form_A or Submit_form_B)))

mean that Submit_documents is the class of any activity that, among other things, has some Submit_form_A parts and also some Submit_form_B parts, and has only Submit_form_A or Submit_form_B parts (Figure 1). The third restriction is a closure axiom without which, due to the OWA, it would be possible for the Submit_documents activity to have parts other than Submit_form_A and Submit_form_B. The holonymy relation is depicted by the orange arrows (with the arrow pointing towards the meronym) labelled 'hasPart' in Figure 1(c).
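The effect of the closure axiom can be sketched as a simple check: hasPart assertions are valid only if every asserted part is among the fillers permitted by the allValuesFrom restriction. The data structures are our own illustrative stand-ins, not OWL machinery.

```python
# Sketch of the whole-part modelling above: hasPart assertions plus a check
# mirroring the allValuesFrom closure axiom. Purely illustrative.

has_part = {
    "Submit_documents": ["Submit_form_A", "Submit_form_B"],
}
allowed_parts = {   # the closure axiom: only these fillers are permitted
    "Submit_documents": {"Submit_form_A", "Submit_form_B"},
}

def closure_satisfied(whole):
    """True iff every asserted part is among the allowed fillers."""
    return set(has_part.get(whole, [])) <= allowed_parts.get(whole, set())

print(closure_satisfied("Submit_documents"))   # True

# An unforeseen part violates the closure:
has_part["Submit_documents"].append("Submit_form_C")
print(closure_satisfied("Submit_documents"))   # False
```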

4) Antonymy relation
Two activity labels are antonyms if they represent opposite tasks, e.g. 'Begin check' and 'End check'. Antonymous labels can be performed at any stage of a process; for example, they are common after decision points where an application, for instance, is either accepted (e.g. 'Notify acceptance') or rejected (e.g. 'Notify rejection').
The antonymy relationship is represented by disjointness axioms in the PAO. For example, the OWL statement DisjointClasses (Begin_check End_check) means that the classes Begin_check and End_check refer to completely different concepts (Figure 1). The DL reasoner will conclude that the members of the class Begin_check can never belong to the class End_check and vice versa. OWL classes can generally overlap (i.e., an individual of a class can also belong to another class) unless they are explicitly specified to be disjoint from one another. Furthermore, because the classes Begin_check and Start_check are already declared as equivalent, the reasoner concludes that the Start_check and End_check classes are also disjoint. The antonymy relation is depicted by the red edges in Figure 1(c).
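The interaction of disjointness with equivalence, as in the Start_check/End_check inference above, can be sketched by checking declared disjoint pairs across synonym sets. The synonym sets are hard-coded here for brevity; this hand-rolled check is not a real reasoner.

```python
# Sketch: disjointness (antonymy) propagated through equivalence (synonymy).
# Start_check is inferred disjoint from End_check because it is equivalent
# to Begin_check, which is declared disjoint from End_check.

equivalent = {
    "Begin_check": {"Begin_check", "Start_check"},
    "Start_check": {"Begin_check", "Start_check"},
    "End_check":   {"End_check"},
}
disjoint_pairs = {frozenset({"Begin_check", "End_check"})}

def are_disjoint(a, b):
    """a and b are disjoint if any of their synonyms are declared disjoint."""
    return any(frozenset({x, y}) in disjoint_pairs
               for x in equivalent[a] for y in equivalent[b])

print(are_disjoint("Start_check", "End_check"))    # True (inferred)
print(are_disjoint("Begin_check", "Start_check"))  # False (they overlap)
```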

V. ONTOLOGY LEARNING APPROACH
Incorporating domain knowledge in process mining analysis leads to more reliable outcomes. Figure 2 depicts a novel gamified approach to ontology learning. The approach takes one or more event log(s) related to a process as the input and generates a PAO (or adds to an existing one) as the output. If more event logs related to the process become available, they can be used as the input for another round of ontology learning through the gamified system to enhance the existing PAO. The approach collects knowledge from multiple domain experts (i.e., through crowdsourcing), and uses a number of motivational drives [63] to engage them in the task of ontology learning. To further assess the level of domain expertise in participants, two strategies are used: the participants are asked to declare how familiar they are with a particular process and a number of control questions are embedded in the game.
As illustrated in Figure 2, the method consists of three phases: (1) the game setup phase, where candidate label pairs that might be semantically related are identified from the input event log(s); (2) the game play phase, where the participants interact with the gamified system to determine the semantic relations between candidate label pairs; and (3) the post-game phase, where the game responses are prioritised (based on the level of domain expertise of the participants) and used to build or enhance the PAO. The input event logs are also used to add individuals (events) to the PAO. The three phases are described in the following sections, where the set of all activities in the input event log(s) is referred to as the activity universe and is denoted by A.

A. GAME SETUP
The input event log(s) may contain a large number of distinct activity labels. In order to make the approach feasible, candidate label pairs that may have some sort of semantic relation are identified from the event log(s). For this purpose, the notion of context similarity as defined in previous work [75] is used. This notion is a weighted average of a number of similarities between activities: string, control flow, resource, data, and time. The idea is that activities that are semantically related (i.e., are synonyms, antonyms, or at different levels of abstraction) are performed at the same stage of the process (control flow similarity), by the same people or roles (resource similarity), at similar times, e.g. day of the week or month (time similarity), or they have similar data attributes (data similarity). Even in the case of antonyms, and this may be less intuitive, some sort of context similarity may exist. For example, 'accept' and 'reject' activities related to a particular decision are both performed at the same stage of a process.
Formally, given the activity universe A, let S_avg(a, b) be the context similarity between activities a and b (note that S_avg(a, b) = S_avg(b, a)). Assume a total order ≺ ⊆ A × A. The particular choice of order, e.g. lexicographical, does not matter, but its existence allows us to avoid examining symmetrical label pairs: if we know the relation between a and b, we know the relation between b and a as well. As candidate questions, we consider activity pairs with a sufficiently high context similarity, i.e. greater than or equal to a given threshold θ_s. This set C is defined as:

C = {(a, b) ∈ A × A | a ≺ b ∧ S_avg(a, b) ≥ θ_s}

The actual questions, recorded in Q, are selected from C. For practical reasons, their number is restricted and limited to those with the highest context similarity. The set should not contain redundant questions, i.e. questions to which the answer can be derived from other questions. Therefore, we exclude the question concerning a and b if we also have questions relating a and b, respectively, to some activity c. Formally, let n be the desired number of questions; then Q ⊆ C is chosen such that it contains at most n questions, none of them redundant in the above sense, with the highest context similarity among the candidates in C.

A control question bank is also prepared in the game setup phase. Each control question asks about the semantic relation between a pair of labels for which the correct answer is known in advance.
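The candidate selection step can be sketched as follows. The similarity function S_avg is assumed to be given (here a stub with illustrative scores); lexicographical ordering plays the role of the total order ≺, and the redundancy exclusion is omitted for brevity.

```python
# Sketch of game setup: keep label pairs whose context similarity reaches a
# threshold theta_s, use lexicographical order to avoid symmetric duplicates,
# then take the n most similar pairs as questions. S_avg is stubbed with
# illustrative scores; the redundancy exclusion described in the text is
# omitted here.
from itertools import combinations

similarity = {  # illustrative S_avg values for unordered pairs
    frozenset({"Begin check", "Start check"}): 0.92,
    frozenset({"Begin check", "End check"}): 0.85,
    frozenset({"Submit documents", "Pay online"}): 0.20,
}

def s_avg(a, b):
    return similarity.get(frozenset({a, b}), 0.0)

def candidate_questions(activities, theta_s, n):
    # a < b (lexicographical) is the total order removing symmetric pairs
    pairs = [(a, b) for a, b in combinations(sorted(activities), 2)
             if s_avg(a, b) >= theta_s]
    pairs.sort(key=lambda p: s_avg(*p), reverse=True)
    return pairs[:n]

A = ["Begin check", "Start check", "End check",
     "Submit documents", "Pay online"]
print(candidate_questions(A, theta_s=0.5, n=2))
# [('Begin check', 'Start check'), ('Begin check', 'End check')]
```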

B. GAME PLAY
A gamified system, The Quality Guardian Rosebud, is designed to motivate domain experts to participate in the ontology learning task. The game supports a number of motivational drives defined in the Octalysis gamification design framework [63]. Some drives are primarily reinforced in the design (i.e., primary drives), and some are maintained as side motivations (i.e., secondary drives). The three primary drives are as follows:
1) Epic meaning: People with this drive enjoy doing tasks for reasons beyond themselves, e.g. helping other less fortunate people. The game supports the epic meaning desire by describing the reasons for building the ontology, for example, helping to advance knowledge or to improve data quality.
2) Development and accomplishment: This is the internal desire of people to progress, achieve goals, overcome challenges, and acquire new knowledge. Collectable points and badges are designed to encourage participants to set in-game goals. Progress bars show the progress towards the goals (Figures 3 and 5). Different types of semantic relations between activities, with some examples, are presented in the game to support gaining knowledge.
3) Social influence and relatedness: This is the desire to obtain social acceptance, collaborate, and compete with others. The game supports this desire by ranking participants on a leaderboard (Figure 3) and showing a summary of the responses of other participants. These aim to give participants the feeling that they are involved in a social task.
In addition to the primary drives, two secondary drives are incorporated into the game design; they are as follows:
1) Unpredictability and curiosity: This drive motivates people to experience tasks that involve uncertainty or chance. In the game, participants are asked to spin two fortune wheels to find out what their next question is (Figure 4). This feeling of curiosity aims to encourage them to proceed to the next question.
2) Creativity and feedback: This is the desire for creativity to bring imagination to life. As participants determine the semantic relations between label pairs, a graph, where nodes are labels and edges are relations, is built for them in parallel (Figures 4 and 5). To support the creativity of participants, the game provides them with means to reshape, filter, and edit the graph.
Participants initially view the main screen of the game, which conveys the message of "Jointly building a knowledge base" to encourage them to contribute their knowledge. Next, participants see a (skippable) demo video, which describes activity label quality issues in event logs and explains that they can be resolved through the use of an activity ontology. The demo also teaches participants how to play the game and introduces the different types of semantic relations between activity labels (i.e., synonymy, hypernymy, holonymy, and antonymy). The sign-up form is presented next, requesting participants to create a username and password and to declare their expertise level with regard to the domain of the event log used in the game. After signing up, participants are directed to their profile page (Figure 3), where they can view the game status, e.g. the levels to complete, the badges to achieve, and their current score. They can return to the profile page at any time during the game.
When people are given a choice, they tend to enjoy the task more than when they are given a single option [77]. To provide participants with a sense of choice, the input label pairs are categorised into multiple topics in the game. The categorisation can be based on different context perspectives, e.g. control flow (i.e., the label pairs that are performed in the same stage of a process are in the same topic), or resource (i.e., the label pairs that are performed by similar roles are in the same topic). Topics can be viewed and selected on the user profile page (Figure 3). Each topic consists of a number of levels. Participants start with the first level, which has the lowest number of questions (the easiest level), and as they proceed to the next levels, they answer more questions. The increasing level of difficulty aims to support participants' desire for competence and accomplishment [78].
A game level consists of a number of questions, each asking about the semantic relation between two specific activity labels. On the main board of the game (Figure 4), there are two fortune wheels, which participants need to spin to reveal the next question. Participants can select one of the options synonym, antonym, a hypernymy relation (i.e., hypernym, hyponym, or cohyponym), a holonymy relation (i.e., holonym, meronym, or comeronym), or no relation. They can view descriptions of these relations along with some examples. If they select one of the three relations synonym, cohyponym, or comeronym, participants are asked to provide an extra label. If they choose the synonym relation, they are asked to suggest a repair for the identified synonyms: they can either choose one of the labels as the repair or suggest a new label through an input field. For the cohyponym and comeronym relations, they are asked for the common hypernym or holonym, respectively.
Three hints are offered for each question: (1) the absolute and relative frequency of the two labels in the input event log(s), (2) their context similarity based on their control flow, resource, time, and data information in the event log(s) using the similarity measures defined in previous work [75], and (3) the summary of the responses of other participants, that is, the semantic relations offered by other participants. The third hint aims to give participants the feeling that there are other participants, thus inspiring a sense of social presence.
During the game, the recorded responses of participants are shown to them as a graph (Figure 4). The nodes denote label IDs and the edges represent the semantic relations between labels. As illustrated in Figure 4, the different types of semantic relations are distinguished by the colours and shapes of the edges. The synonymy and antonymy relations are depicted by green and red lines, respectively. A yellow arrow shows the hypernymy relation (with the arrow pointing towards the hypernym), and a blue arrow with a diamond head shows the holonymy relation (with the diamond pointing towards the holonym). The width of an edge between two nodes is larger if more participants have identified the corresponding relation between the two labels. For example, in Figure 4, the edge between nodes 53 and 54 is thicker than the edge between nodes 54 and 55 because more participants think that the label with ID 53 is a meronym of the label with ID 54. Filtering options are also provided to allow focusing on relations of interest. The graph provides participants with an overall view of their responses, which can be revisited and changed if required.
Upon completing a level, participants earn a number of stars and diamonds. The stars measure participants' game experience and are granted based on the number of questions answered in the level. The diamonds reflect the participants' domain knowledge and are granted based on the correctness of the answers to the control questions embedded in each level, and the string score (defined in our previous work [26] as the repair score). The string score is computed for the new labels suggested by participants as the repair of synonym labels or as the shared hypernym of cohyponym labels or holonym of comeronym labels. The score considers the length of a string and whether it is free from special characters and numbers. The diamonds are used to prioritise answers accordingly in the post-game phase. To further inspire participants' feelings of competence, they are awarded different badges, e.g. Phenom Contributor (Figure 3), once they gather a certain number of stars and diamonds.
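The exact string score formula is given in [26]; the following Python sketch only illustrates the idea of rewarding clean, concise repair labels. The label names, the 0.5/0.5 weighting, the length cap, and the regular expression are assumptions, not the published definition:

```python
import re

def string_score(label: str, max_len: int = 50) -> float:
    """Illustrative stand-in for the repair score of [26]: rewards labels
    that are free from digits and special characters and not overly long.
    The weighting and max_len here are assumptions for this sketch."""
    clean = 1.0 if re.fullmatch(r"[A-Za-z ]+", label) else 0.0
    length = max(0.0, 1.0 - len(label) / max_len)
    return 0.5 * clean + 0.5 * length

# A clean, concise repair suggestion scores higher than a cryptic one.
print(string_score("send invoice") > string_score("snd_invc_#2"))  # True
```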
Providing ways for people to compare their scores to others may make them feel more competitive [79]. A leaderboard is embedded on participants' profile page (Figure 3), which enables them to compare their numbers of stars and diamonds to their peers' and aims to fulfil their need for social relatedness. As another feedback mechanism, participants receive summative feedback when they complete all levels in a topic, as depicted in Figure 5. The feedback includes the total number of stars and diamonds earned in the topic and a graph that represents the semantic relations identified between label pairs in the topic. This feedback recapitulates what participants have accomplished in a topic of the game to satisfy their need for competence.

C. POST-GAME
In the post-game phase, the PAO is built using the answers collected from participants. The method is to approve responses of higher weight, where the weight of an answer is determined by the participant's level of domain knowledge, represented by the number of diamonds gained in the game, and the participant's domain expertise level, which they declared at sign-up. In crowdsourcing studies, this idea is referred to as vote weighting [80].
Let Λ ⊆ Q × Φ × R × P be the set of all game responses, where Q ⊆ A × A is the label question bank generated in the game setup phase (Section V-A), each question consisting of a pair of labels; Φ = {synonym, antonym, hypernym, hyponym, cohyponym, holonym, meronym, comeronym, no relation} is the set of all semantic relations; R ⊆ Σ* is the set of all extra labels suggested by participants (in the case of synonym, cohyponym, and comeronym relations), where Σ is the set of all characters; and P is the set of participants. An answer λ = (λ_q, λ_ϕ, λ_r, λ_p) consists of the question, the semantic relation, the extra label, and the participant. The extra label for relations other than synonym, cohyponym, and comeronym is the empty string ε. The function W : Λ → [0, +∞) assigns a weight to an answer λ ∈ Λ and is defined as W(λ) = τ_knowl · k(λ_p) + τ_expert · e(λ_p), where k : P → N and e : P → N are the functions that assign values to the demonstrated knowledge and the declared expertise of participant λ_p, and τ_knowl, τ_expert ≥ 0 are the corresponding coefficients.
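As a minimal sketch, the weight function W could be implemented as follows; the participant record layout and the unit coefficient defaults are assumptions for illustration, not values from the paper:

```python
from dataclasses import dataclass

@dataclass
class Participant:
    knowledge: int  # k(p): demonstrated knowledge, derived in-game
    expertise: int  # e(p): expertise declared at sign-up

def weight(p: Participant, tau_knowl: float = 1.0, tau_expert: float = 1.0) -> float:
    """W(lambda) = tau_knowl * k(lambda_p) + tau_expert * e(lambda_p)."""
    return tau_knowl * p.knowledge + tau_expert * p.expertise

# With unit coefficients, a participant with k=3 and e=2 yields weight 5.0.
print(weight(Participant(knowledge=3, expertise=2)))  # 5.0
```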
For any question q ∈ Q, any semantic relation ϕ ∈ Φ, and any extra label r ∈ R, the total weight function Total : Q × Φ × R → [0, +∞) is defined as Total(q, ϕ, r) = Σ_{p∈P, (q,ϕ,r,p)∈Λ} W(q, ϕ, r, p), which is the sum of the weights of the answers that provided extra label r for the identified relation ϕ between the labels of question q. To aggregate this sum over all extra labels for relation ϕ in question q, the function Total(q, ϕ) = Σ_{r∈R} Total(q, ϕ, r) is used.
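The two Total functions are straightforward aggregations, as the following sketch shows; the answer tuples are hypothetical examples with the weight W already applied:

```python
from collections import defaultdict

# Hypothetical weighted answers: (question, relation, extra_label, weight).
answers = [
    (("send invoice", "mail invoice"), "synonym", "send invoice", 6.0),
    (("send invoice", "mail invoice"), "synonym", "mail invoice", 3.0),
    (("send invoice", "mail invoice"), "no relation", "", 2.0),
]

def total_per_label(answers):
    """Total(q, phi, r): summed weight per (question, relation, extra label)."""
    t = defaultdict(float)
    for q, phi, r, w in answers:
        t[(q, phi, r)] += w
    return dict(t)

def total_per_relation(answers):
    """Total(q, phi): summed weight over all extra labels r."""
    t = defaultdict(float)
    for q, phi, r, w in answers:
        t[(q, phi)] += w
    return dict(t)

q = ("send invoice", "mail invoice")
print(total_per_relation(answers)[(q, "synonym")])  # 9.0
```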
Π is the set of final semantic relations to be used for building the PAO (Equation 6). For every question q, the relation ϕ_max with the maximum total weight will be selected⁴ if its total weight exceeds a weight threshold θ_w. Furthermore, the extra label r_max with the maximum weight for the relation ϕ_max between the labels of question q will be approved⁵. In the case of synonymous activities, the value of the suitability data attribute in the PAO will be set to the weight of r_max.
⁴ arg max_ϕ Total(q, ϕ) returns the set of ϕ arguments that maximise the function Total(q, ϕ) for a fixed q.
⁵ arg max_r Total(q, ϕ_max, r) returns the set of r arguments that maximise the function Total(q, ϕ_max, r) for a fixed q and ϕ_max.
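The selection step can be sketched as follows. This is a simplified reading of Equation 6 that returns a single maximiser; the handling of ties, the strict threshold comparison, and the sample answers are assumptions for illustration:

```python
def select_relation(question, answers, theta_w):
    """Approve the relation phi_max with maximum Total(q, phi) if it exceeds
    theta_w, then the extra label r_max with maximum Total(q, phi_max, r)."""
    by_rel, by_label = {}, {}
    for q, phi, r, w in answers:
        if q != question:
            continue
        by_rel[phi] = by_rel.get(phi, 0.0) + w
        by_label[(phi, r)] = by_label.get((phi, r), 0.0) + w
    if not by_rel:
        return None
    phi_max = max(by_rel, key=by_rel.get)
    if by_rel[phi_max] <= theta_w:
        return None  # not enough weighted support
    candidates = [key for key in by_label if key[0] == phi_max]
    r_max = max(candidates, key=by_label.get)[1]
    return phi_max, r_max, by_label[(phi_max, r_max)]

# Hypothetical weighted answers for one question.
q = ("send invoice", "mail invoice")
answers = [
    (q, "synonym", "send invoice", 6.0),
    (q, "synonym", "mail invoice", 3.0),
    (q, "no relation", "", 2.0),
]
print(select_relation(q, answers, theta_w=5.0))  # ('synonym', 'send invoice', 6.0)
```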

VI. EVALUATION
The Quality Guardian Rosebud game has been implemented as a Java web application deployed on a Tomcat server. This gamified system is domain-independent; that is, no domain-specific considerations are made in the design of this approach. The semantic relations distinguished in this approach, e.g., synonymy, antonymy, hypernymy, and holonymy, are general and can exist between activity labels from different domains, as they are adapted from the semantic relations defined in WordNet [16]. Also, the motivational drives and game elements supported in the gamified system design apply to participants from different domains, as they are based on generic gamification frameworks, e.g., Octalysis [63] and Self-Determination Theory [64]. However, for the evaluation, a public real-life event log [81] that records the building permit application processes executed in a Dutch municipality is used as a case study. Considering the constraints of the evaluation, e.g., public vs. private event logs and recruiting sufficient participants, evaluating this gamified system with event logs from multiple domains is out of the scope of this study. Details of the evaluation are described in the following sections.

A. DATA
The public BPIC15_2 event log [81] was used for the evaluation. This event log contains data related to a building permit application process in a Dutch municipality covering applications received over a period of approximately four years. The original event log contains 832 cases, 44354 events, and 410 distinct activity labels. However, incomplete cases (identified by the data attribute 'caseStatus') and infrequent events (identified by the ProM plugin 'Filter Log using Simple Heuristics' with threshold 85%) were filtered out from the log to focus on the main process behaviour. The filtered log 6 contains 616 cases, 30813 events, and 82 distinct activity labels.
In the game setup phase, Equations 1-4 were used to generate n = 36 game questions from the filtered log. The weighted average similarity S avg (a, b) was computed based on the control flow, resource, data, time, and string similarities [75] with the weights 2, 2, 1, 1, and 1, respectively. Higher weights are assigned to the control flow and resource dimensions because they are more informative for assessing the similarity of activities [75]. A similarity threshold (θ) of 0.4 was used so that more activities could be investigated by participants.
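The weighted average similarity can be sketched as below; the per-dimension scores for the example label pair are illustrative placeholders, as the actual similarity measures are defined in [75]:

```python
def weighted_avg_similarity(sims, weights):
    """S_avg(a, b): weighted average over the (control flow, resource,
    data, time, string) similarity dimensions."""
    return sum(s * w for s, w in zip(sims, weights)) / sum(weights)

weights = [2, 2, 1, 1, 1]          # as used in the evaluation
sims = [0.8, 0.6, 0.5, 0.4, 0.3]   # hypothetical scores for one label pair
s_avg = weighted_avg_similarity(sims, weights)
print(s_avg >= 0.4)  # True: the pair passes the threshold theta = 0.4
```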
A total of 36 label pairs (i.e., questions), made up of 47 distinct activity labels, and 9 control questions were included in the game. The control question bank was generated using the available resources in the municipality domain 7. The label questions were divided into three topics representing three stages of the building permit application process: (1) application lodgement and initial checks, (2) decision making, and (3) notification and application closure activities. The 'action_code' data attribute in the log, which indicates the stage of an activity, is used to categorise questions into the three topics. Each topic consists of three levels: level 1 is made up of three label questions and one control question, level 2 consists of four label questions and one control question, and level 3 consists of five label questions and one control question.

B. PROCEDURE
The game was made publicly available online for 45 days. It was advertised to QUT staff and students, and to BPM and process mining researchers and practitioners on LinkedIn and through international academic networks. The participants played the game and were scored using stars and diamonds as described in Section V-B. The maximum numbers of stars and diamonds that could be earned in each topic were 900 and 300, respectively. The four XP badges Good Starter, Newbie Contributor, Phenom Contributor, and Chief Contributor were granted to participants when they accumulated 100, 600, 1500, and 2700 stars. Similarly, the three knowledge badges Problem Solver, Expert, and Master were awarded once 200, 500, and 900 diamonds were collected.
In the post-game phase, the answers were weighted according to the function W described in Section V-C. The function k assigns the values 1, 2, 3, or 4 to the demonstrated knowledge of a participant if they have received zero, one, two, or three knowledge badge(s), respectively. Similarly, the function e assigns the values 1, 2, 3, or 4 to the declared expertise of a participant if they have declared (in the sign-up form) to know nothing, a little, a fair bit, or a lot about the building permit application process. Equation 6 is applied to choose the best semantic relations. The value of the threshold θ_w is selected in such a way that the type of semantic relation between two activities is approved only if it has a weighted vote of at least half of the maximum possible vote. A comprehensive list of parameters is available online 8. The PAO (referred to as the game PAO) is created from the selected semantic relations by a Java program incorporating the OWL API [82] and the HermiT [70] reasoner 9.
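The two mappings can be sketched directly from this description; the clamping of badge counts in k and the exact answer strings in e are assumptions for illustration:

```python
def k(knowledge_badges: int) -> int:
    """Demonstrated knowledge: values 1-4 for zero to three knowledge badges."""
    return min(knowledge_badges, 3) + 1

EXPERTISE = {"nothing": 1, "a little": 2, "a fair bit": 3, "a lot": 4}

def e(declared: str) -> int:
    """Declared expertise from the sign-up form."""
    return EXPERTISE[declared]

# A participant with two knowledge badges who declared "a lot":
print(k(2) + e("a lot"))  # 7, i.e. W with unit coefficients
```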
A ground truth set of semantic relations between candidate label pairs of the BPIC15_2 event log [81] was created through an interview with a domain expert, who is very familiar with the process and the activity labels in the event log 10 . The expert was introduced to different types of semantic relations between activity labels: synonymy, hypernymy, holonymy, and antonymy. The candidate label pairs that were provided as the input to the gamified system were presented to the domain expert who then identified the semantic relation of each pair. The collected relations were used to create the ground truth PAO through the same Java program which created the game PAO.

C. ANALYSIS METHOD
The similarity between the game PAO and the ground truth PAO is computed. The Semantic Cotopy (SC) notion defined by Maedche et al. [83] is adapted for this purpose. The SC notion [83] is expanded to cover other types of semantic relations. Specifically, four cotopy notions are defined to evaluate the PAO: (1) Hypernymy Cotopy, which looks at super-class sub-class relations defined in Section IV-C2 (same as the SC [83]), (2) Holonymy Cotopy, which focuses on part-of relations defined in Section IV-C3, (3) Synonymy Cotopy, which looks at equivalence relations defined in Section IV-C1, and (4) Antonymy Cotopy, which focuses on disjoint relations defined in Section IV-C4. The four similarity measures Overall Hypernymy Overlap (HyO), Overall Holonymy Overlap (HoO), Overall Synonymy Overlap (SO), and Overall Antonymy Overlap (AO) are computed based on each cotopy notion, respectively.
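A simplified, non-transitive sketch of a cotopy-based overlap between two ontologies might look as follows. Note that the SC of Maedche et al. [83] also includes indirect super- and sub-concepts, and the overall measures average over all concepts; the toy relations and label names below are illustrative assumptions:

```python
def cotopy(concept, relations):
    """One-level cotopy: the concept plus its direct super- and sub-concepts,
    given relations as (child, parent) pairs. The full SC of [83] is transitive."""
    supers = {p for c, p in relations if c == concept}
    subs = {c for c, p in relations if p == concept}
    return {concept} | supers | subs

def cotopy_overlap(concept, rels_a, rels_b):
    """Jaccard-style overlap of one concept's cotopies in two ontologies."""
    a, b = cotopy(concept, rels_a), cotopy(concept, rels_b)
    return len(a & b) / len(a | b)

# Toy hypernymy relations in a learned vs. a ground truth ontology.
learned = {("send invoice", "send document"), ("send letter", "send document")}
truth = {("send invoice", "send document")}
print(cotopy_overlap("send document", learned, truth))  # 2/3
```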
While approaches to ontology learning have already been proposed in the literature [5], [13], [27], they cannot be directly compared with this approach, because the type of input data they accept is not an event log. Even the few ontology learning approaches [44]–[46] in the area of process mining take inputs other than event logs, e.g. a set of process models. Furthermore, the structure of the ontologies that these approaches create is different from the structure of the PAO; they do not formalise semantic relations between activity labels in their ontologies. Another possible setting for the evaluation could be to create an ontology of activity labels in an event log using a purely technical method and compare the quality of the acquired ontology with the ontology created through gamification. However, we did not opt for that setting in this paper; we focused solely on how gamification can be used to improve user engagement in the tedious task of ontology creation.
To measure participants' engagement with the gamified system, they were asked to fill out a survey consisting of 15 questions with 5-point Likert-scale answers (strongly agree to strongly disagree). Each question is about one desirable feature of a gamified system based on the Octalysis framework [63], the GameFlow evaluation framework [78], and Self-Determination Theory (SDT) [64], [65] or the Player Experience of Need Satisfaction (PENS) model [84]. GameFlow aims to measure people's enjoyment of a gamified system; its items are mapped from flow theory [85] to games. SDT describes psychological needs that motivate humans to act, and PENS adds further game-related concepts to this theory. Table 1 shows a comprehensive list of the items measured in each question and the relevant feature in each of the frameworks. The Cronbach's alpha coefficient [86] is computed to measure the reliability of the survey responses.
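Cronbach's alpha itself is simple to compute from the item score matrix; a minimal sketch with a toy response matrix (not the actual survey data) is:

```python
def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals),
    where rows of `scores` are respondents and columns are survey items."""
    k = len(scores[0])
    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    totals = [sum(row) for row in scores]
    return k / (k - 1) * (1 - sum(item_vars) / var(totals))

# Perfectly consistent toy data: every respondent rates both items the same.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # 1.0
```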

D. RESULTS
A total of 40 participants played the game, out of which 35 were considered for the evaluation because they completed at least one topic and the survey. More specifically, 9 participants finished one topic, 2 completed two topics, and 24 answered the questions of all three topics. Figure 6 illustrates the semantic relations identified by the game participants compared to the ground truth (GT) 11. As shown in Figure 6, the semantic relation of 30 out of 36 label pairs was identified through the use of the gamified system (i.e., |Π| = 30 in Equation 6). All semantic relations were identified in the first topic, which had the highest number of participants (i.e., 35). There were 2 missing relations in the second topic (i.e., for label pairs number 17 and 24) with 26 participants, and 4 missing relations in the third topic (i.e., for label pairs number 27, 28, 32, and 34) with 24 participants. A higher number of participants results in more approved relations because it increases the weight of responses, allowing them to pass the threshold θ_w. No antonymy relation was identified by participants, which conforms to the ground truth. Overall, the type of relation recognised in the game matches the ground truth except for 5 questions (i.e., numbers 2, 12, 16, 22, and 33). A lower weight threshold θ_w can lead to more approved answers, and a higher threshold can result in more correct answers. Table 2 summarises the similarity measures HyO, HoO, SO, and AO between the game PAO and the ground truth PAO. As reported in Table 2, there is a high overall hypernymy, holonymy, and synonymy overlap between the two ontologies. The overall antonymy overlap is 0 because there exists no antonym relation in either of the ontologies. The results of Figure 6 and Table 2 confirm that the gamified system has been successful in creating a high-quality process activity ontology.
Figure 7 shows the user engagement survey results categorised based on the frameworks (a) Octalysis [63], (b) GameFlow [78], and (c) SDT [64]. The overall Cronbach's alpha coefficient of 0.8918 for this survey shows that it is of acceptable reliability. Participants found the system engaging overall (M=3.74, SD=1.01), which implies that they had a positive user experience. Figure 7(a) shows that development and accomplishment (M=3.63, SD=1.06), epic meaning (M=3.59, SD=1.11), and creativity and feedback (M=3.53, SD=1.08) were the top motivations for participants to engage with the gamified system. However, participants were neutral about the effectiveness of the unpredictability and curiosity (M=2.86, SD=1.24) and the social influence and relatedness (M=3.1, SD=1.32) motivation drives. The former was mainly because unpredictability was not given as much prominence in the game design, it being a secondary motivation drive, and the latter can be explained by the survey being presented to participants after the first topic, with most of them not having seen the leaderboard yet. The 11 participants who completed the survey at the end of the second or the third topic were more influenced by the social motivation (M=4.05, SD=1.07); this is because participants can only choose the next topic to work on through their profile page, where the leaderboard is located (Figure 3). As depicted in Figures 7(b) and (c), the average score for all features is between 3 and 4. Overall, participants' responses show evidence that gamification is a promising method to achieve a good user experience in the task of identifying semantic relations between activities.
[Table 1: the survey items and their corresponding features in the Octalysis [63], GameFlow [78], and SDT/PENS [64], [65], [84] frameworks.]

VII. CONCLUSION
A novel approach to learning a process activity ontology from event logs is proposed. The method uses crowdsourcing and gamification techniques to enhance the user experience in the task of ontology learning. The semantic relations covered in the process activity ontology are synonymy, hypernymy, holonymy, and antonymy between activity labels. The results with 35 participants show that the semantic relations of 30 out of 36 label pairs were identified through the gamified approach. A higher number of participants leads to detecting more semantic relations. Of the 30 semantic relations identified, 25 match the ground truth. Also, the PAO created through the gamified system is highly similar to the ground truth PAO.
The survey results show that participants found the system engaging overall. After creation, a process activity ontology can be used for activity label quality improvement, i.e., to repair synonymous labels and activity labels at different levels of granularity in event logs. Furthermore, the ontology can be extended in the future to include other semantic relations, e.g. control flow relations between activities, where it is indicated, for example, that an activity must always, or can never, be performed before another activity in a process. Such relations can further contribute to the repair of data quality problems in event logs.

VIII. ACKNOWLEDGMENT
We are grateful for the contributions of Dr Joos Buijs, who acted as the domain expert. This study has been approved by the QUT Human Research Ethics Committee (approval number 2000000047).
MRS SAREH SADEGHIANASL is a PhD candidate at the Business Process Management Group of the School of Information Systems, Queensland University of Technology in Australia. Her research focuses on improving the quality of input data, that is, an event log, to prepare it for process mining, which concerns analysing organisational processes. More recently, she has been looking at using gamification techniques for data cleaning to make it more fun, as it is currently the most tedious and time-consuming task of any data analysis project. She has a Computer Science background and has skills in software engineering, algorithm design, and programming.
PROF ARTHUR H.M. TER HOFSTEDE received his PhD degree from the Katholieke Universiteit Nijmegen (since renamed to Radboud Universiteit), Nijmegen, The Netherlands, in 1993. He has worked at Queensland University of Technology, Brisbane, Australia, since 1997, and is currently a professor and the Head of the School of Information Systems. His research interests lie in the area of business process management, in particular business process automation and process mining. He was involved in the well-known workflow patterns initiative and at QUT he has managed the well-known YAWL initiative. He is a coauthor on over 200 publications, including over 100 journal publications.
PROF MOE THANDAR WYNN completed her PhD in the area of workflow management in 2007 from Queensland University of Technology, Brisbane, Australia. She leads the Business Process Management Research Group at QUT. She is Vice-Chair and one of the steering committee members of the IEEE Taskforce on Process Mining. She has published 80+ refereed papers, including 30+ journal articles. Her ongoing research focuses on process-oriented data mining (process mining), data quality and robotic process automation for the digital transformation of processes.

DR SELEN TURKAY is a lecturer in Human-Computer Interaction at Queensland University of Technology (QUT). Her research interests include the design of personalised, interactive, and collaborative immersive environments. Specifically, she studies the effects of design choices on user agency and wellbeing outcomes, as well as user experiences, including engagement and motivation, in a variety of contexts, including education and health. Prior to joining QUT, Dr Turkay worked as a research scientist and a post-doctoral research fellow at Harvard University. She earned her doctoral degree in Instructional Technology and Media at Columbia University Teachers College.
PROF TRINA MYERS is a Computer Scientist and the Associate Dean (Learning and Teaching), Faculty of Science at Queensland University of Technology. Trina is the current President of the Australian Council of Deans of ICT (ACDICT), which aims to promote and advance ICT education, research, and scholarship on behalf of Australian universities. Trina's research interests span both technology-based research and learning-and-teaching-based research. Her technology-based research interests focus predominantly on semantic technologies, data and knowledge management, collective intelligence, and the Internet of Things (IoT). She works predominantly on multi-disciplinary research projects that have included other disciplines such as Health Science, Conservation Management, and Business.