Context-Aware Process Performance Indicator Prediction

It is well known that context impacts running instances of a process. Thus, defining and using contextual information may help to improve the predictive monitoring of business processes, which is one of the main challenges in process mining. However, identifying this contextual information is not an easy task because it might change depending on the target of the prediction. In this paper, we propose a novel methodology named CAP3 (Context-aware Process Performance indicator Prediction), which involves two phases. The first phase guides process analysts in identifying the context for the predictive monitoring of process performance indicators (PPIs), which are quantifiable metrics focused on measuring the progress of strategic objectives aimed at improving the process. The second phase involves a context-aware predictive monitoring technique that incorporates the relevant context information as input for the prediction. Our methodology leverages context-oriented domain knowledge and experts’ feedback to discover the contextual information that is useful for improving the quality of PPI prediction, decreasing error rates in most cases, by adding this information as features to the datasets used as input of the predictive monitoring process. We experimentally evaluated our approach in two real-life organizations. Process experts from both organizations applied the CAP3 methodology and identified the contextual information to be used for prediction. The model learned using this information achieved lower error rates in most cases than the model learned without contextual information, confirming the benefits of CAP3.


I. INTRODUCTION
Process mining [1] allows the extraction of useful information from event logs and historical data of business processes. This information can be used to improve the performance of these business processes. One of the applications of process mining is the predictive monitoring of business processes [2], which predicts different aspects of the execution of a business process, such as the next activity [3], [4], or the value of a process performance indicator [5]-[7]. Process performance indicators are quantifiable metrics focused on measuring the progress towards a goal or strategic objective aimed at controlling and improving the business process [8]. Some examples are the remaining execution time of a process instance, the likelihood of a fault in the system or the abnormal termination of a running instance. These predictions enable the application of proactive and corrective actions to improve process performance and mitigate possible risks in real time.
The associate editor coordinating the review of this manuscript and approving it for publication was Alberto Cano.
Recently, there have been many research efforts focused on improving the quality of these predictions. One stream of work has successfully used data from the context associated with a running process to improve the predictive performance [9]-[14]. The context associated with a process is the knowledge potentially relevant to guide its execution [15]. This knowledge can be associated with an activity or with the whole process. For instance, the location where a process occurs or the level of priority of a specific activity can be considered as knowledge associated with a process or activity, respectively. Process context information plays an important role in process mining as reported in the literature [16], [17].
The use of context in these works is limited to building a predictive model using all of the available contextual information in the event log, without considering whether this data is the most relevant for process prediction. This may be a problem because including too much information might not always be beneficial for the predictive quality [18]. Moreover, predictive monitoring algorithms may consider certain features as relevant when they are not [19]. Therefore, identifying adequate context information, which can improve the forecasting of process indicators, becomes imperative. However, this is not an easy task because different context attributes can be relevant depending on the indicator. For instance, if we want to predict the state of an activity, the human resource involved can be the context to consider, whereas for predicting the remaining time of a process execution the priority variable may be the relevant context. In this paper, we address this issue by identifying the context information of a business process related to a certain indicator so that it can be used to improve its prediction.
In [20], a methodology named ORGANON is proposed for identifying business process-relevant contextual information which could impact on the process goals. Based on a set of criteria and a matrix for analyzing ontological transactions, this methodology discovers the essential activities and then their main attributes are examined. If the variation in the value of these attributes impacts the goal of the process, they will be identified as elements of the immediate/internal context. This methodology identifies existing context elements of a process, but does not focus on the context that is relevant for process performance indicators.
In this paper, we propose a methodology named CAP3 (Context-aware Process Performance indicator Prediction) that comprises two phases: (1) an extension of the ORGANON methodology for defining the context necessary for predicting process indicators, whose main goal is to inject context-oriented domain knowledge and experts' feedback to improve the effectiveness of current predictive solutions; and (2) a context-aware predictive monitoring technique that uses the relevant context as input. Experimental results on the application of our approach in two real-life organizations confirm the benefit of the approach.
This work has implications for the operational management of organizations, since it suggests a methodology to define the context information that provides informational support to decision makers about when, where and why business processes need to be adapted. In addition, an improvement in the prediction error of the performance indicators often also means savings in human and economic resources and the prevention of important losses of turnover for companies [21].
The remainder of this paper is organized as follows. Section II includes some definitions and works related to the context in BPM and introduces predictive monitoring. Section III summarizes the related works in this area. Section IV presents the contributions of our work. The experiment and the discussion of the obtained results are presented in Section V. Finally, Section VI concludes the work and presents possible future directions.

II. BACKGROUND
This introductory section provides some background on the context identification in BPM and the predictive monitoring of business processes. Specifically, Section II.A includes some definitions and works related to the concept and role of context in BPM. Then, Section II.B introduces some basic concepts of the predictive monitoring of business processes later used in this paper.

A. IDENTIFYING CONTEXT IN BPM
Generally speaking, context can be defined as the circumstances in which an event occurs. According to [22], context is an open concept, since it is not limited to the imagination of a person, while [23] explains context as "any information that can be used to characterize the situation of an entity." In turn, [24] states that context constrains a step of problem solving without intervening in it explicitly. In other words, context is useful information for the performance of activities and interactions that occur in a work process [25].
Context of business processes supports the understanding of the variations in each instance, i.e., each process execution could have a distinct set of context information associated. Moreover, it helps to explain why decisions were made. In business processes, context can be defined as the minimum set of variables that contains all the important information that impacts their design, implementation and execution [26].
In [27], the authors present a metamodel structured in three layers that, together, are able to support the representation of process context in a particular domain. The first layer is the Context Metamodel, where process context is formally defined. It refers to the elements related to the manipulation of context and their relationships. Among these elements are Contextual Element and Focus. A context is defined as a set of contextual elements, and those contextual elements are related to a focus, for instance a particular activity of the process. According to the authors, each instance of a business process is subject to changes in context, and contextual knowledge can add relevant information to support the execution of activities.
Defining the correct context is a challenge. In [20], a methodology for identifying business process-relevant contextual information called ORGANON is described. This methodology is based on a questionnaire, a set of criteria and a matrix for analyzing ontological transactions. ORGANON is divided into two steps. First, essential activities are discovered, i.e. the ones that have a direct influence on the process goal [28]. These activities are selected according to a semi-structured guide consisting of a set of questions answered by experts of the process. Then, according to [29], an ontological transaction matrix is built, registering the relationships that link the essential activities. Thus, it is necessary to consider whether these activities form complete cycles of an ontological transaction. A cycle is described by four phases (request, promise, state and accept) and comprises a set of activities, performed by an initiator (client/applicant) and an executor, that aim to achieve a certain goal [20]. Activities which form complete cycles of an ontological transaction are considered essential activities.
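The completeness check on ontological transactions can be illustrated with a minimal Python sketch. It assumes that the transaction matrix is available as a mapping from a transaction name to the activities recorded for each of the four phases; this representation, the function names and the sample transactions are hypothetical and are not taken from ORGANON [20].

```python
from typing import Dict, List, Set

# The four phases of an ontological transaction cycle named in the text.
PHASES = ("request", "promise", "state", "accept")

# Assumed layout: transaction name -> phase -> activities observed in that phase.
TransactionMatrix = Dict[str, Dict[str, List[str]]]


def is_complete(phases: Dict[str, List[str]]) -> bool:
    """A cycle is treated as complete when every phase has at least one activity."""
    return all(phases.get(p) for p in PHASES)


def essential_activities(matrix: TransactionMatrix) -> Set[str]:
    """Collect the activities that belong to complete transaction cycles."""
    found: Set[str] = set()
    for phases in matrix.values():
        if is_complete(phases):
            for p in PHASES:
                found.update(phases[p])
    return found


# Hypothetical example: the second transaction lacks an "accept" phase.
example_matrix: TransactionMatrix = {
    "resolve_request": {
        "request": ["Register request"],
        "promise": ["Assign handler"],
        "state": ["Report result"],
        "accept": ["Confirm result"],
    },
    "internal_note": {
        "request": ["Write note"],
        "promise": ["Acknowledge note"],
        "state": ["File note"],
    },
}
essential = essential_activities(example_matrix)
```

In this toy input only the activities of the first transaction are returned, because the second cycle is incomplete.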
Once these essential activities have been detected, it is necessary to elicit inner attributes from them (i.e. all the inputs and outputs in the business process activities modeled, external data, artifacts or business rules as described in [27]) and analyze the impact of each attribute on the process goal. The impact analysis verifies what may occur with the goal of the process (achieved/not achieved) if the value of an attribute varies in an unpredictable way. If the variation in the value of these attributes impacts the goal of the process, they will be identified as contextual elements.

B. PREDICTIVE MONITORING OF PROCESS INDICATORS
Predictive monitoring of business processes provides the forecast of process performance indicators of a running process instance with a predictive model and can be used as support for decision making in an organization [21]. Examples of cases where predictive monitoring can be used are: an insurance company wants to predict the remaining execution time of a process instance (e.g. complete time to resolve a claim), or an IT company wants to predict the number of incidents solved in one month to know if a certain service agreement will be satisfied.
Predictive monitoring relies on building a predictive model from an event log of the business process. An event log ($L$) is composed of a set of traces ($T$). Each trace ($T_i$) reflects an execution of a process instance. Formally, we can express a trace as an ordered list of events $T_i = [E_{i,1}, \dots, E_{i,m}]$, where $E_{i,1}$ represents the first event and $E_{i,m}$ the final event of trace $T_i$. Similarly, a log can be expressed as the set of traces for the instances that have started and finished in an interval of time, $L = [T_1, \dots, T_n]$, where $T_1$ represents the first executed trace and $T_n$ the last in the time interval. Finally, an event represents the execution of a single activity of the process. Each event contains a set of attributes ($a$), which represents all the information for the definition of such event, e.g. the timestamp, the name of the activity, the resource that executes the activity, or the value of some data used throughout the instance: $E_j = [a_{j,1}, \dots, a_{j,o}]$, where $o$ determines the total number of attributes of the event. An example of a typical event log is depicted in Table 1. Each event of this event log contains an event id, which is a unique identifier of the event; a timestamp, which indicates the time and date of the execution of an activity; the name of this activity; the resource or person who executes the activity; and finally the cost of the activity.
A process performance indicator ($I$) is a quantifiable metric focused on measuring the progress toward a goal or strategic objective. Indicators can be classified into two types: single-instance indicators or aggregated indicators. The former is computed for each trace in the log using the values of the attributes of the events that compose this trace. Therefore, it can be defined as a function of a trace $T$, i.e. $I(T)$. This function can return a binary value, e.g. whether a given condition is fulfilled by the trace, or a real value, e.g. the duration of an activity.
Instead, an aggregated indicator is computed for a set of traces by aggregating multiple values of a single-instance indicator using some aggregation function, e.g. sum or average. An example of this type of indicator could be the percentage of incidents solved in a certain period of time. In this paper, we consider both single-instance and aggregated indicators.
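The definitions above can be made concrete with a short Python sketch. It assumes an in-memory log whose events carry the five columns described for Table 1; the class and function names, the 24-hour threshold and the sample events are illustrative assumptions, not part of the paper.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    """One executed activity; fields mirror the log columns described above."""
    event_id: str
    timestamp: datetime
    activity: str
    resource: str
    cost: float


Trace = List[Event]      # ordered events of one process instance
EventLog = List[Trace]   # traces started and finished in the time interval


def duration_hours(trace: Trace) -> float:
    """Single-instance indicator I(T) that returns a real value."""
    return (trace[-1].timestamp - trace[0].timestamp).total_seconds() / 3600.0


def solved_within(trace: Trace, limit_hours: float = 24.0) -> bool:
    """Single-instance indicator I(T) that returns a binary value.

    The 24-hour limit is a hypothetical threshold chosen for the example.
    """
    return duration_hours(trace) <= limit_hours


def percentage(log: EventLog, condition: Callable[[Trace], bool]) -> float:
    """Aggregated indicator: share of traces (in %) fulfilling a condition."""
    return 100.0 * sum(1 for t in log if condition(t)) / len(log)


# Invented sample log with two traces lasting 2 hours and 30 hours.
example_log: EventLog = [
    [
        Event("e1", datetime(2020, 1, 1, 8, 0), "Open", "Ann", 10.0),
        Event("e2", datetime(2020, 1, 1, 10, 0), "Close", "Bob", 5.0),
    ],
    [
        Event("e3", datetime(2020, 1, 1, 9, 0), "Open", "Ann", 10.0),
        Event("e4", datetime(2020, 1, 2, 15, 0), "Close", "Bob", 5.0),
    ],
]
solved_share = percentage(example_log, solved_within)
```

The aggregated indicator is built on top of the single-instance one, which matches the definition of aggregation over multiple values of a single-instance indicator.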
One of the main issues addressed in predictive monitoring of business processes is the prediction of the value of an indicator before a process instance finishes by means of a predictive model. Therefore, a predictive model for an indicator $I$ is a function $P_I([E_{i,1}, \dots, E_{i,l}])$ that computes a prediction for $I$ from the partial trace $[E_{i,1}, \dots, E_{i,l}]$, where $E_{i,1}$ is the first event and $E_{i,l}$ is the last event that has occurred in trace $T_i$ at a given moment.
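A common way to obtain training data for such a function is to pair every prefix of a finished trace with the indicator value of the complete trace. The following Python sketch illustrates this idea; the helper names and the toy log are assumptions, and including the full-length prefix is a design choice of the example rather than something prescribed by the text.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

E = TypeVar("E")


def prefixes(trace: Sequence[E]) -> List[Sequence[E]]:
    """Return the partial traces of length 1..m of a trace with m events."""
    return [trace[:l] for l in range(1, len(trace) + 1)]


def training_pairs(
    log: Sequence[Sequence[E]],
    indicator: Callable[[Sequence[E]], float],
) -> List[Tuple[Sequence[E], float]]:
    """Pair each prefix with the indicator value of its complete trace."""
    return [(prefix, indicator(trace)) for trace in log for prefix in prefixes(trace)]


# Toy log in which events are reduced to activity names; the indicator used
# here (number of events in the finished trace) is purely illustrative.
toy_log = [["Open", "Update", "Close"], ["Open", "Close"]]
pairs = training_pairs(toy_log, len)
```

A learning algorithm would then fit the predictive model on the encoded prefixes, and at runtime the model would be applied to the prefix observed so far.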

III. RELATED WORK
Some methods in the literature have employed contextual information for business process predictive monitoring. In [30], a clustering-oriented method that predicts processing times and associated SLA (Service Level Agreement) violations is presented. The running instance is assigned to a reference scenario (cluster), which is used for the prediction. The predictive model is based on decision trees, called Predictive Clustering Trees (PCT). The clusters, generated by the Predictive Clustering sub-module, can be represented as a set of logical decision rules and group traces according to similar target values. The inputs of the method are an event log with data attributes and environment features, a target measure and a threshold of risk. Prediction accuracy is evaluated using Root Mean Squared Error (RMSE) and Maximum Dwell Time (MDT).
In [31], a process trace is converted into a set of context properties and process attributes. A clustering method is used to select the most significant structural patterns to make the forecast. This clustering method considers the context data and target variables derived from performance values. Three different regression algorithms (Linear regression, RepTree and IB-k) are used for the prediction. The inputs of the algorithm are the traces of an event log and a target performance measure (in this case, the remaining processing time). The tuples are constructed from the event data information of the traces. Some derived attributes and context information are also included in the encoding.
In [6], statistical techniques for the prediction of events and their correlation with contextual elements of transportation processes, such as weather conditions or road traffic, are applied. An integration platform named FInest, that incorporates the predictive monitoring module, was developed. The method receives 3 different data sources: system messages from the processes, aggregated data with additional information of the process, such as estimated time of arrival vs. actual arrival or the cause for delays, and quality indicators from the CARGO 2000 system. The system returns a prediction of the delay in the deliveries.
Although the work in [9] does not provide prediction performance measurements, it considers the process context for the analysis of key process performance indicators. The authors performed a statistical analysis to extract significant differences in performance measures for different analyzed contexts. These performance measures are calculated using the process entities labeled with different context attributes.
In [32], a method to categorize possible environmental conditions and case properties into context categories which are meaningful for the process execution was proposed. It is related to our proposal in the sense that it searches for knowledge which influences the execution of a business process, but it differs in the sense that the main goal is to group this knowledge.
In recent years, there has been an ever growing interest in the area of context-aware process predictive monitoring, with a number of works approaching this challenge from different angles. Yeshchenko et al. explore in [10] the idea of integrating the external unstructured context of business processes into prediction methods. In particular, they propose a technique to enrich event logs with sentiment information extracted from media content by means of sentiment analysis techniques. As evaluation, XGBoost is applied for the prediction of the remaining time of a process in four different event logs, comparing the results between the pure and the enriched event logs with positive results.
In [14], a technique for document-aware predictive business process monitoring is presented. In this case, the event log is enriched with structured context from documents, extracted using a text-based approach for automated information extraction. The authors plan to use a long short-term memory (LSTM) neural network to predict the next activity, but no actual evaluation is reported in the paper.
The work in [11] examines the impact and effects of incorporating discrete and continuous context data attributes on prediction quality and accuracy. The authors evaluate the application of an LSTM network with different input configurations on a real-life event log to predict the next occurring event. They show that prediction accuracy can be significantly improved by incorporating additional event data attributes in LSTM-based process prediction.
Senderovich et al. [12] argue for the important role of inter-case dependencies in predictive process monitoring. They present a method for feature encoding of process cases that relies on a bi-dimensional state space representation, including intra- and inter-case dependencies. For the inter-case encoding they propose to partition the recorded (and running) cases into case types, and use a derivation function to avoid feature space explosion. They evaluate their approach on two real event logs and show the improvement of the prediction of the remaining process time in running cases, using linear regression, Lasso, random forests and gradient tree boosting. RMSE and MAE (Mean Absolute Error) are the prediction accuracy measures selected.
Finally, Hinkka et al.'s main goal in [13] is to improve the prediction accuracy of prediction models, for any case-level prediction task, by exploiting additional event attributes that are often available in the event logs while also taking into account the scalability. The authors introduce a "method to exploit event attributes into Recurrent Neural Network (RNN) prediction models by clustering events by their event attribute values and using the cluster labels in the RNN input vectors, instead of the raw event data". In four out of the five datasets evaluated, the proposed approach outperformed having the actual attribute values in the input vector, also reducing training and prediction times.
Although some of the aforementioned works explore the idea of exploiting context information for prediction, some focus on a certain type of contextual information (e.g. sentiments), data source (e.g. documents) or prediction target (e.g. remaining process time). Others seek to identify the dependencies among cases or to improve the performance of the prediction itself. In general, most of these works do not explain how the contextual attributes were chosen to compose the log. We go a step further and aim to guide the process of identifying which contextual information is appropriate to improve the predictive monitoring, since this has proven not to be a trivial task [20]. In that sense, many of these works can complement our proposal, whose main contribution is an extended methodology based on ORGANON [20] to extract context attributes from business processes for predictive monitoring using domain knowledge of the process. This knowledge can be obtained from experts or managers of the process.

IV. PROPOSAL
Our proposal CAP3 (Context-aware Process Performance indicator Prediction) has two major parts: (i) a methodology to elicit the relevant contextual elements for the process monitoring presented in Section IV-A; (ii) a context-aware predictive monitoring technique that uses the relevant context as input, which is described in Section IV-B.

A. CONTEXT IDENTIFICATION METHODOLOGY
The ORGANON methodology presented in [20] discovers the contextual information associated with a business process that is aligned with the objectives of the process. However, we need to extend this proposal to extract the context information necessary to improve the predictive performance of PPIs. It is important to note that including too much information or adding incorrect features is not beneficial for the predictive quality [18]. Therefore, we have adapted the ORGANON methodology to exclusively identify those context attributes with a direct influence on the PPIs to be predicted. For this purpose, new tasks have been added to ORGANON. Figure 1 depicts the extension of the ORGANON methodology that we propose in this research. All the steps of the methodology are new with the exception of Step 3.
VOLUME 8, 2020
First of all, we need to analyze the process model and the PPIs we want to predict (Step 1). As inputs, we take all identified PPIs and the process model. PPIs can be obtained from the documentation of the process or can be directly requested from the process analyst. One or more PPIs of the process can be selected for prediction. Our method works for both single-instance and aggregated indicators, the latter being computed from measures defined over multiple process instances [33]. We can follow some specific criteria for the elicitation of PPIs, such as the answer to the question: which PPI is related to the highest number of activities?
Secondly, an interview with the business process analyst is carried out (Step 2). The questionnaire (detailed in Table 2) used as input of this activity collects some of the questions reflected in ORGANON to determine some information about the execution of the process and some new ones related to external context attributes or the prediction of PPIs. This questionnaire is generic (i.e. independent of the process). The output of this activity is the questionnaire filled with the answers provided by the process analysts. These answers will be useful during the rest of the procedure. Following our methodology and after answering the questionnaire, we focus on questions 3 and 7 to obtain a preliminary list of Essential Business Entity (EBE) candidates. These are the essential elements of the business process, such as items or artifacts, which should be handled by the process [34]. They are represented with the data object named EBE handled by the process.
The next steps involve identifying the attributes related to the PPIs from the information provided in the questionnaire. The ORGANON methodology only identifies the internal attributes, which are all the inputs and outputs in the business process activities modeled, external data, artifacts and business rules, among others classified in [27]. This is performed in Step 3, which is a subprocess that groups three activities previously defined in the ORGANON methodology. The goal of the first two activities (identify which activities consume EBEs and identify ontological blocks) is to identify the essential activities related to an EBE. The essential activities [28] are those which have a direct influence on the goal of the process. The third activity involves the elicitation of attributes from each essential activity. The details on how to perform these activities are provided in [27].
However, for the prediction of PPIs, we also need to identify external and process attributes that are not derived from the essential activities. External attributes are those that reflect events unrelated to the execution of the process but that can have a direct influence on the process, e.g. the weather. Process attributes are related to inherent characteristics of the process usually reflected in the event logs of the organizations, such as the name of the activity or the timestamp. The elicitation of these attributes is carried out in Steps 4 and 5, respectively. These two activities receive as inputs the questionnaire filled by process experts and the list of EBEs. Specifically, the answers to questions 10 to 14 provide us with information about the essential external and process attributes which could be considered as context attributes.
Once all the essential attributes (internal, process and external) are defined, the business process analyst assesses the impact of each attribute on the PPI (Step 6). To do this, we receive as input the list of essential attributes, the process model and the list of PPIs. The process analyst is responsible for linking the different attributes to the PPIs that we want to predict, according to their relationship. Each attribute can be related to one or more PPIs. Table 3 depicts the points that should be considered by the process analyst. The first column states the selected PPI to be predicted. The second column contains the name of the attributes obtained in the preceding steps of the methodology depicted in Figure 1 (i.e. internal, external and process essential attributes). The third column shows the possible values of the attributes (numerical or categorical). The fourth column indicates the type of attribute (internal, process or external). The fifth column reflects the answer to the yes/no question "Do different values for this attribute (i.e. changes) have a direct impact on the value of the PPI?", and the last column provides the reason for the answer in the previous column. For instance, in an incident resolution process in which a technician has to go to different places to solve the incident, the physical location in which the incident takes place may have an influence on the indicator we are predicting, such as the resolution time. Therefore, it will be a contextual element useful for the predictive monitoring process. In general, if a variation in the value of an attribute has a direct impact on the prediction of the PPI, the attribute will be identified as a contextual element.
At this point, we also analyze the granularity (i.e. the level of detail considered for each attribute) of context attributes. A finer granularity of attributes usually leads to a more precise reasoning, while a coarser granularity leads to a less precise one with the benefit of being less computationally demanding. For example, a process attribute can specify the town where an incident has occurred. This level of detail may not be valuable for the prediction, in which case we can group all the towns of the same region. In this way, we reduce the number of possible values for this attribute and the computing cost decreases. The final output of the methodology is a PPI-attribute matrix with the subset of context attributes derived from Table 3 that have a positive answer to the impact question (fifth column).
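The last two operations of this phase, selecting the attributes with a positive impact answer and coarsening their granularity, can be sketched in Python as follows. The record layout follows the six columns described for Table 3, but the class name, the helper functions, the sample rows and the town-to-region mapping are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Assessment:
    """One row of the analyst's assessment, following the columns of Table 3."""
    ppi: str
    attribute: str
    values: str        # "numerical" or "categorical"
    kind: str          # "internal", "process" or "external"
    has_impact: bool   # answer to the yes/no impact question
    reason: str


def ppi_attribute_matrix(rows: List[Assessment]) -> Dict[str, List[str]]:
    """Keep, for each PPI, only the attributes judged to have a direct impact."""
    matrix: Dict[str, List[str]] = {}
    for row in rows:
        if row.has_impact:
            matrix.setdefault(row.ppi, []).append(row.attribute)
    return matrix


def coarsen(value: str, mapping: Dict[str, str], default: str = "other") -> str:
    """Replace a fine-grained value (e.g. a town) by a coarser one (e.g. a region)."""
    return mapping.get(value, default)


# Invented rows: only the first attribute is judged relevant by the analyst.
rows = [
    Assessment("Resolution time", "Location", "categorical", "process", True,
               "The technician has to travel to the incident"),
    Assessment("Resolution time", "Ticket colour", "categorical", "process", False,
               "No relation expected"),
]
matrix = ppi_attribute_matrix(rows)

# Hypothetical mapping used to group towns of the same region.
TOWN_TO_REGION = {"Town A": "Region 1", "Town B": "Region 1", "Town C": "Region 2"}
region = coarsen("Town B", TOWN_TO_REGION)
```

Grouping values this way reduces the number of categories the predictive algorithm has to handle, at the price of some precision.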

B. CAP3: A CONTEXT-AWARE PREDICTIVE MONITORING TECHNIQUE
This section describes our Context-aware Process Performance indicator Prediction technique (CAP3). This method, depicted in Figure 2, includes a first step which represents the identification of context attributes for PPI prediction presented in the previous section as an extended version of ORGANON. The inputs of the activity are the process model and the documentation of PPIs. In this activity, we analyze the impact of context attributes on the PPIs to be predicted. As we have described, process analysts who know the details of how the process behaves analyze this impact and then decide on the appropriate context. The output of this activity is the PPI-attribute matrix, where we can find the selected PPI to be predicted and the context attributes necessary for the prediction.
Then, a second activity, named Preprocess the event log, filters the event log $L$ (formed by the set of traces $T$) to remove unnecessary information, enriches the event log with additional information by adding context attributes, and transforms some attributes of the event log. The inputs of this activity are the event log of the process, the external attributes and the PPI-attribute matrix where context attributes are found. The output of this activity is the dataset with all the attributes of the event log.
As defined in [35], Stage 1 of Figure 2 represents the learning phase. In this stage, the dataset is generally encoded in feature vectors that can be interpreted by the predictive algorithm. Any of the techniques for the encoding of data applied in the literature [7], [36] can be used. As a result of this activity we obtain a set of feature vectors that represents the set of traces $T$, where each trace is formed by a set of events $E$ and each event is composed of a set of attributes $a$.
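As an illustration of the preprocessing and encoding activities, the following Python sketch filters, enriches and one-hot encodes a tiny event log held as a list of dictionaries. The column names, the external "weather" attribute and the choice of one-hot encoding are assumptions made for the example; the text leaves the concrete encoding open.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def keep_attributes(rows: List[Row], keep: List[str]) -> List[Row]:
    """Filter step: drop every attribute that is not listed in `keep`."""
    return [{k: r[k] for k in keep if k in r} for r in rows]


def enrich(rows: List[Row], external: Dict[str, Row], key: str = "case_id") -> List[Row]:
    """Enrichment step: add external context attributes looked up by case."""
    return [{**r, **external.get(r[key], {})} for r in rows]


def one_hot(rows: List[Row], column: str) -> List[Row]:
    """Transformation step: replace a categorical column by 0/1 indicator columns."""
    categories = sorted({r[column] for r in rows if column in r})
    encoded = []
    for r in rows:
        out = {k: v for k, v in r.items() if k != column}
        for c in categories:
            out[f"{column}={c}"] = 1 if r.get(column) == c else 0
        encoded.append(out)
    return encoded


# Invented event log rows and external context (all names are hypothetical).
raw_log = [
    {"case_id": "c1", "activity": "Open", "priority": 3, "comment": "n/a"},
    {"case_id": "c2", "activity": "Open", "priority": 1, "comment": "n/a"},
]
weather_by_case = {"c1": {"weather": "rain"}, "c2": {"weather": "sun"}}

filtered = keep_attributes(raw_log, ["case_id", "activity", "priority"])
dataset = one_hot(enrich(filtered, weather_by_case), "weather")
```

In a CAP3-style pipeline, the list passed to `keep_attributes` and the external sources passed to `enrich` would be derived from the PPI-attribute matrix rather than chosen by hand.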
Then, the predictive method is executed and generates a prediction model $P$ as output data, based on the knowledge of the traces $T$ of the event log. This model is evaluated to assess its validity, using the different traces of process instances as a test set, by means of quality metrics. Stage 2 of Figure 2 represents the prediction phase for a typical predictive monitoring method. At runtime, the generated model is applied to ongoing instances at a given moment of the execution. Then, the predictive model will determine the value of the predicted outputs for this process instance, i.e. the result of the function $P_I([E_{i,1}, \dots, E_{i,l}])$ that computes a prediction for $I$ from the partial trace $[E_{i,1}, \dots, E_{i,l}]$, where $E_{i,l}$ is the last event that has occurred in trace $T_i$ at a given moment.
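The text does not fix the quality metrics at this point; two error measures that appear in the related work, MAE and RMSE, could be computed as in the following Python sketch. The function names and the numbers are illustrative only and do not correspond to any result reported in this paper.

```python
from math import sqrt
from typing import Sequence


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """MAE: average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_squared_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE: square root of the average squared difference."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Toy values (e.g. remaining time in hours) invented for the example.
observed = [2.0, 4.0, 6.0]
predicted = [3.0, 4.0, 5.0]
mae = mean_absolute_error(observed, predicted)
rmse = root_mean_squared_error(observed, predicted)
```

Computing the same metrics for a model trained with and without the context attributes, on the same test traces, is one way to compare the two configurations.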

V. EVALUATION
In order to test the validity and applicability of our approach, we applied the proposed methodology for the identification of context information in two real-life organizations. Once we have extracted context information, we apply our predictive monitoring technique and also provide an experimental analysis of the relevance of the definition and inclusion of context attributes for the predictive monitoring of business processes.
The rest of the section is organized as follows: a description of the two real-life organizations is provided in Section V-A. The details of the application of our methodology to identify context are described in Section V-B. Experiment setup for the application of our predictive monitoring technique is defined in Section V-C1 and the application of our predictive monitoring technique is described in Section V-C.

A. SCENARIO ANALYSIS
Two real-life organizations were considered in our experiments: Techmaster (TM) and the IT Department of a Spanish Healthcare Provider (SHP).
Techmaster is a Brazilian company which provides IT infrastructure and management of IT environments. The studied business process models the Techmaster IT incident management. This process stores the information of the management of incidents registered at this company. A solution should be established for each incident in order to restore the service with minimum disruption to the business. After providing a solution to the problem and verifying that the service is restored, the incident is closed. Each incident is represented as a ticket which reflects a process instance, and each ticket is composed of different articles that represent process events. A ticket can have zero or many articles. Each ticket has a Techmaster employee assigned. Each incident is classified with a priority level (1-5).
The IT Department of the Spanish Healthcare Provider under study (SHP) 2 provides IT services and support to its associated health centers. The studied business process represents the SHP IT incident management as it was performed between 2014 and 2016. The process is composed of different events from the start to the resolution of the incident. Incidents can occur in any health center or hospital associated with this provider and can be attended by phone, mail or intranet. The incidents are classified into three categories: Hardware, System and Other. Each event of the process has a certain level of priority (low, medium or high). In this scenario, a service level agreement (SLA) is established considering certain PPIs. This SLA determines the penalties derived from failing to meet a threshold for each of the PPIs. Thus, predictive monitoring is necessary to warn of a possible violation of the SLA. In this case, three PPIs are considered: K01, which determines whether an incident took longer than expected to solve (duration time > 17h); K06, which determines whether an incident has been reopened because it was not correctly solved; and K20, which indicates an abuse of the stopping time (idle time > 0). Idle time is the unproductive time on the part of employees as a result of factors beyond their control.
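As a minimal sketch, the three SLA-related PPIs can be expressed as Boolean checks over a closed incident. The `Incident` class and its field names are hypothetical and do not correspond to attributes of the SHP log; only the thresholds (17 hours and zero idle time) come from the description above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Incident:
    duration_h: float  # total resolution time in hours (hypothetical field)
    reopened: bool     # True if the incident was reopened (hypothetical field)
    idle_h: float      # accumulated stopping (idle) time in hours (hypothetical field)


K01_THRESHOLD_H = 17.0  # duration threshold stated in the text


def ppi_flags(incident: Incident) -> Dict[str, bool]:
    """Return, for each PPI, whether the incident counts as a violation."""
    return {
        "K01": incident.duration_h > K01_THRESHOLD_H,  # solved later than expected
        "K06": incident.reopened,                      # reopened after closing
        "K20": incident.idle_h > 0,                    # abuse of the stopping time
    }


flags = ppi_flags(Incident(duration_h=20.0, reopened=False, idle_h=0.0))
```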

B. IDENTIFYING CONTEXT
This section details the application of our methodology described in Section IV-A.

1) EXPERIMENTAL SCENARIO: TECHMASTER
The proposed methodology was applied to the incident resolution process of Techmaster. First, we analyzed the process model and determined the PPIs to be predicted (Figure 1, Step 1). We determined that the duration of the process and the number of incidents divided by the number of service requests (R) are good candidates to be considered as predicted PPIs. This second PPI is an aggregated indicator.
Two experts in the process participated in completing the semi-structured guideline 3 (Figure 1, Step 2). The two process analysts who collaborated with us were the process manager, who is responsible for the technical operations and has worked in this process for 15 years, and the director of the company, who is responsible for the relationships with clients and has worked in this process for 25 years. From the interview, we obtained the list of items (EBE) associated with the essential activities. This list consists of the tickets and the reports (i.e., software inventory, hardware inventory and high impact incidents). Next, we identified a set of essential activities of the process (Figure 1, Subprocess in Step 3), such as Open ticket, Update ticket, Communicate with client, Discussion about the ticket, Build reports, Send reports to the customer and Send invoice (related to the ticket). Finally, Steps 3, 4 and 5 of Figure 1 were carried out, and the context attributes identified after following the guidelines are: the human resource in charge, the technical characteristics of the equipment, the maturity level of the customers' infrastructure, the remote support and the priority. All these elements and the PPIs related to them are shown in Table 4 (Figure 1, Step 6).

2) EXPERIMENTAL SCENARIO: SHP
First, we analyzed the model and identified the different PPIs for the process (Figure 1, Step 1). In this case, the duration of an executed instance of the process has been selected as the PPI for the prediction, because several SLAs defined in the previous section for this process are time-related. An SHP process expert also completed the semi-structured guideline 4 (Figure 1, Step 2). He is responsible for the Department of IT Service Management, has 13 years of experience, and has extensive knowledge of the process, since he has been the process owner for several years and has been involved in its continuous improvement since the beginning of his work there. From the interview, we obtained the EBE list, which is formed by the ticket, the reports and all the information about the system throughout the process, such as interactions or comments. Associated with the EBE list, we identified the following essential activities of the process (Figure 1, subprocess reflected in Step 3): Registration of the incident, Determination of priority, Diagnosis and Resolution. Finally, the context attributes identified (Figure 1, Steps 3, 4 and 5) after following the guidelines are the priority level and the center related to the incident. The latter attribute defines the type of the center (health center or hospital) and its location. The context attributes extracted and their related PPIs (Figure 1, Step 6) are shown in Table 5.

C. PREDICTIVE MONITORING EXPERIMENTS
1) EXPERIMENTAL SETUP
This section details the setup of our experiment addressing the steps of our predictive monitoring technique described in Section IV-B.

a: ENCODING
We have selected a typical aggregation encoding described in [36] as one of the most used in literature to encode the process cases. Thus, all events since the beginning of the case are considered. An aggregation function is applied to the values taken by a specific attribute throughout the case lifetime. In our case, this function is the number of times that each specific attribute appears in the case (frequency encoding). We have not divided the cases in the event log into different buckets. This technique is named Zero bucketing as defined in [36]. We have also incorporated the order of the events as a new attribute in all the logs (i.e. the relative position of the event in the case), as well as the elapsed time between the event and the beginning of the case and the time between the previous event and the current one.
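The encoding step can be sketched as follows. This is an illustrative reading of the description above, not the code of our repository: it assumes that the frequency aggregation counts how often each attribute value occurs in a case, and the function names and the fields `activity`, `owner_id` and `timestamp` are hypothetical. With zero bucketing, all encoded cases feed a single model.

```python
from collections import Counter
from datetime import datetime
from typing import Any, Dict, List

Event = Dict[str, Any]


def add_order_and_time_features(case: List[Event]) -> List[Event]:
    """Add position, elapsed time since case start and time since previous event."""
    start = case[0]["timestamp"]
    previous = start
    enriched = []
    for position, event in enumerate(case, start=1):
        enriched.append({
            **event,
            "position": position,
            "elapsed_s": (event["timestamp"] - start).total_seconds(),
            "since_prev_s": (event["timestamp"] - previous).total_seconds(),
        })
        previous = event["timestamp"]
    return enriched


def frequency_encode(case: List[Event], attributes: List[str]) -> Dict[str, int]:
    """Aggregation encoding: count each attribute value over the whole case."""
    counts: Counter = Counter()
    for event in case:
        for attr in attributes:
            counts[f"{attr}={event[attr]}"] += 1
    return dict(counts)


case = [
    {"activity": "Open", "owner_id": 7, "timestamp": datetime(2016, 1, 1, 9, 0)},
    {"activity": "Update", "owner_id": 7, "timestamp": datetime(2016, 1, 1, 10, 0)},
    {"activity": "Update", "owner_id": 9, "timestamp": datetime(2016, 1, 1, 12, 30)},
]
enriched = add_order_and_time_features(case)
features = frequency_encode(case, ["activity", "owner_id"])
```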
To select relevant features from the datasets, tree-based estimators are employed. They can be used to compute impurity-based feature importance, which in turn can be used to discard irrelevant features. In our case feature importances are obtained using ExtraTreesClassifier from Scikit-learn library [37].
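A minimal sketch of impurity-based feature ranking with ExtraTreesClassifier is given below. The data are synthetic, the feature names `f0` to `f5` are hypothetical, and the class label is an assumption made so that a classifier can be fitted; the parameter values are illustrative and are not those of our experiments.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic data: six features, of which only f2 determines the class label.
rng = np.random.default_rng(0)
X = rng.random((200, 6))
y = (X[:, 2] > 0.5).astype(int)
feature_names = [f"f{i}" for i in range(6)]

# Fit the tree ensemble and read the normalized impurity-based importances.
selector = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)
importances = selector.feature_importances_

# Rank the features and keep the top k (k=3 here, purely for illustration).
ranking = sorted(zip(feature_names, importances), key=lambda p: p[1], reverse=True)
top_features = [name for name, _ in ranking[:3]]
```

Features outside the top of the ranking can then be dropped before the predictive model is trained.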

b: BUILDING THE MODEL
As predictive algorithms we have used random forest [38] and extreme gradient boosting [39], as seen in previous works in the literature [3], [36]. Random forest (RF) is a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees. Gradient boosting is based on the combination of weak learners, such as decision trees, to create a strong predictive model. The weak decision trees are generated sequentially, each tree being built so that it corrects the errors of the previous ones. One of the parameters of the algorithm is the learning rate, which scales the contribution of each new tree to the ensemble. We have employed extreme gradient boosting (XGB) [39], a particularly prominent gradient boosting implementation. In [36], the authors highlight XGBoost and RF as two of the best techniques in predictive monitoring. For the implementation, we have used the RandomForestRegressor method from the Scikit-learn [37] library for machine learning in Python and the XGBRegressor method from the Xgboost Python library. We have tuned the hyperparameters of both algorithms by means of a randomized search using RandomizedSearchCV from the Scikit-learn library, which optimizes the parameters by cross-validated search over parameter settings. The selected RF parameters after the execution of the method are: n_estimators=100, max_features=auto and max_depth=12. For XGB, the selected parameters are: colsample_bytree=1, learning_rate=0.3, max_depth=6, alpha=0 and n_estimators=100. We have split the traces of our dataset into 80% for training and 20% for testing to validate the predictive algorithms.
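The model-building step can be sketched with Scikit-learn as follows. The data are synthetic, each row is assumed to represent one case, and the search space, number of iterations, number of folds and scoring function are illustrative assumptions rather than the settings of our experiments. Since max_features=auto is no longer accepted by recent Scikit-learn releases, the sketch uses the numeric value 1.0, which considers all features at each split of a regressor.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic regression data: the target depends mainly on the first feature.
rng = np.random.default_rng(42)
X = rng.random((120, 4))
y = 10.0 * X[:, 0] + rng.normal(scale=0.1, size=120)

# 80% of the cases for training and 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical search space for the randomized search.
param_distributions = {
    "n_estimators": [20, 50, 100],
    "max_depth": [4, 8, 12],
    "max_features": [0.5, 1.0],
}
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=4,                           # number of sampled parameter settings
    cv=3,                               # cross-validation folds
    scoring="neg_mean_absolute_error",
    random_state=42,
)
search.fit(X_train, y_train)

best_model = search.best_estimator_     # refitted on the whole training set
test_predictions = best_model.predict(X_test)
```

XGBRegressor follows the Scikit-learn estimator interface, so it could be passed to RandomizedSearchCV in the same way with its own parameter grid.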

c: EVALUATION
We have used Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) as evaluation measures, since we predict a numeric value with regression models and these measures are commonly used in the literature for this purpose [12], [36]. MAE is a risk metric corresponding to the expected value of the absolute error loss, and RMSE is the square root of the expected value of the squared (quadratic) error.
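Both measures can be computed directly from their definitions, as in the following minimal sketch. The function names and the sample values are illustrative and are not taken from our experiments.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Square root of the mean of the squared differences."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


actual = [10.0, 20.0, 30.0]      # illustrative PPI values
predicted = [12.0, 18.0, 34.0]   # illustrative predictions
mae_value = mae(actual, predicted)    # (2 + 2 + 4) / 3
rmse_value = rmse(actual, predicted)  # sqrt((4 + 4 + 16) / 3)
```

Because the errors are squared before averaging, RMSE penalizes large deviations more heavily than MAE, which is why RMSE is never smaller than MAE on the same predictions.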
We also provide an experimental analysis of the relevance of the definition and inclusion of context attributes for the predictive monitoring of business processes. To do so, we have obtained six different datasets from the event logs of the studied processes, which consider the context information obtained after applying our methodology, according to the following classification:
• All (ALL): all attributes of the original log are considered (except identifier attributes).
• None (NONE): only the basic attributes of the log, such as the activity name and the timestamp (date and time when the activity was performed), are considered.
• Automatic (AUTO): attributes detected as relevant by a decision-tree-based algorithm are selected for the experimentation.
• Random (RND): a set of randomly selected attributes is taken into account.
• Context (CTXT): context attributes detected by our method are included.
• Without context (WCTXT): detected context attributes are excluded from the log.
The details and the code of the experimentation are available in our repository. 5
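The construction of the six attribute sets can be sketched as follows. The attribute names are illustrative, and two points are assumptions rather than statements of this paper: that CTXT adds the context attributes to the basic ones, and that RND samples from the non-identifier attributes.

```python
import random
from typing import Dict, List


def build_datasets(all_attrs: List[str], identifier_attrs: List[str],
                   basic_attrs: List[str], context_attrs: List[str],
                   auto_attrs: List[str], n_random: int,
                   seed: int = 0) -> Dict[str, List[str]]:
    """Return the attribute list used by each of the six dataset variants."""
    candidates = [a for a in all_attrs if a not in identifier_attrs]
    rng = random.Random(seed)
    return {
        "ALL": candidates,
        "NONE": [a for a in candidates if a in basic_attrs],
        "AUTO": [a for a in candidates if a in auto_attrs],
        "RND": sorted(rng.sample(candidates, n_random)),
        # Assumption: context attributes are added to the basic attributes.
        "CTXT": [a for a in candidates if a in basic_attrs or a in context_attrs],
        "WCTXT": [a for a in candidates if a not in context_attrs],
    }


datasets = build_datasets(
    all_attrs=["case_id", "activity", "timestamp", "priority",
               "owner_id", "customer_id", "ticket_state"],
    identifier_attrs=["case_id"],
    basic_attrs=["activity", "timestamp"],
    context_attrs=["priority", "owner_id"],
    auto_attrs=["timestamp", "owner_id"],  # hypothetical selector output
    n_random=3,
)
```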

2) EXPERIMENTAL SCENARIO: TECHMASTER
The event log of the Techmaster process consists of 6,665 process instances executed throughout the year 2016 and a total of 14 attributes (id_event, article_subject, id_case, ticket_title, ticket_user, ticket_state, customer_id, article_create_time, ticket_create_time, diff, owner_id, priority, ticket_history_type, a_from). 6 Some statistics of this log can be found in Table 6. According to Table 3, the context attributes that can be exploited are the priority and the human resource, since the rest of the context attributes are not reflected in the log and there was no way to gather them from the company. Once the encoding of the event log and the context information (case priority and owner_id) is finished, feature vectors are obtained. Then, we construct six datasets according to the definitions of the previous section. 7 After applying the one-hot encoding, each attribute of the event log is converted into several features, specifically one feature for each different value that the attribute can take. For instance, the priority attribute is converted into five different features after the encoding (i.e., the different priority levels): x1_1 very low, x1_2 low, x1_3 normal, x1_4 high and x1_5 very high. For the AUTO dataset, we need the most relevant features. Thus, we have obtained the normalized importance measures of the features by applying the selector algorithm cited in Section V-C1. We have ranked the 10 most important features and selected the first 5 features for the AUTO dataset. Table 7 shows the most relevant features. It is noticeable that context features appear in the list in 2nd position (owner_id) and in 8th and 10th positions (normal and high priority). We also highlight that the other features in the top 10 refer either to time or activity (3rd, 7th and 9th positions). After applying the predictive algorithms (RF and XGB), we assess the validity of the predictive model obtained.
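The one-hot encoding of the priority attribute can be sketched as follows. The helper `one_hot` is hypothetical; the level names and the feature names x1_1 to x1_5 follow the example given above.

```python
from typing import Dict, List

PRIORITY_LEVELS = ["very low", "low", "normal", "high", "very high"]


def one_hot(value: str, categories: List[str], prefix: str) -> Dict[str, int]:
    """Map one categorical value to a 0/1 feature per possible category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {
        f"{prefix}_{index}": int(value == category)
        for index, category in enumerate(categories, start=1)
    }


encoded = one_hot("high", PRIORITY_LEVELS, "x1")
```

Exactly one of the five features is set to 1 for each event, so the encoding introduces no artificial ordering beyond the one implied by the feature names.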
Table 8 summarizes the average error of our model for the different TM datasets. Regarding the observed cases, MAE and RMSE decrease when considering the context attributes obtained by means of our proposed methodology (CTXT achieves the best results for RF and XGB). The second best result is reached with the AUTO dataset. This suggests that not all the attributes are necessary to obtain a good prediction (the ALL dataset achieves the worst scores for RF and XGB) and that a selection of the attributes is needed to achieve good predictive models. RF and XGB performed well in terms of both computational cost and time.
5 https://github.com/alfedu/context_predictive_monitoring
6 Description of the TM attributes can be found in https://bit.ly/2W4fGWV
7 Attributes considered in each TM dataset are available for consultation in https://bit.ly/33DQvNY.

3) EXPERIMENTAL SCENARIO: SHP
The event log of the SHP process consists of 257,278 process instances, each of them with 15 attributes (Author, Node, Resolutor, Descorg, Typology, Subject, Asignee, Priority, AuthorGroup, State, Center, Source, TypeOrg, Resource and ClosingReason 8). Some statistics of this log can be found in Table 6. From Table 4, we extract two context attributes (i.e., Priority and Center) from the event log. After the one-hot encoding, we construct the different datasets 9 and apply the feature selector algorithm for the AUTO dataset. Table 9 shows the most relevant features. We have again ranked the 10 most important features and selected the first 5 features for the AUTO dataset. Two context features appear in this ranking, Priority and Node, in 7th and 10th positions respectively. This gives us an idea about the importance of context. However, the most relevant attributes are those related to time, which makes sense because we are predicting a time indicator. As shown in Table 10, using RF we have obtained the best MAE and RMSE results for the NONE dataset, and considering the context attributes (CTXT) we have achieved the third best result (with values similar to those of the AUTO dataset). In contrast, using XGB, CTXT reaches the lowest error rates. It is worth noting that the worst results are once more obtained using all attributes (ALL dataset) for RF and XGB. It is therefore strongly recommended to apply an attribute selection, since the AUTO dataset was ranked in second position for both algorithms. Moreover, we observe higher error rates if we exclude the context attributes from the dataset (WCTXT).
8 Description of SHP attributes can be found in https://bit.ly/2ObGrnZ
9 Attributes considered in each SHP dataset are available for consultation in https://bit.ly/33E7KP8.
Some general conclusions can be drawn from Figure 3, which represents the error rates of the different experiments. The figure shows the relevance of the context attributes for the prediction, since we have obtained the lowest error rates using the CTXT dataset (dark blue bubble). Comparing the results of the CTXT dataset with the rest of the datasets, we have reached the best error rates in most cases (3/4), as can be seen in charts TM-RF, TM-XGB and SHP-XGB in Figure 3, and the third best error rate for the SHP-RF experiment (similar results are obtained with the AUTO dataset, which reaches the second best rate in this case). In general, the ALL, RND and WCTXT datasets obtain the worst error rates in all cases (light blue, yellow and green respectively). This suggests the importance of choosing adequate attributes for the prediction. On the other hand, the AUTO and NONE datasets (orange and grey) obtain better error rates. Two conclusions can be drawn from this fact: 1) It is preferable to use only the essential attributes (activity and timestamp) rather than all the attributes in the log for the prediction, which also yields a considerable reduction in computational complexity. 2) The use of a feature selection method is desirable to improve the prediction results.

VI. CONCLUSION
One of the main research goals in the last 5 years in process mining has been the improvement of predictive monitoring techniques because they play a significant role in supporting not only descriptive process mining (i.e. to understand what happens), but also prescriptive process mining (i.e. to provide operational support at run-time) [35].
These proposals have focused on designing algorithms that help improve the quality of the prediction, either by using a more efficient learning technique or by providing mechanisms to include context information other than timestamps and activity names. However, none of the previous work in the literature on predictive monitoring provides a methodology that guides the user in identifying the relevant context information for the prediction of a process performance indicator. This is relevant because defining the context is not an easy task [20] and, as we have shown in this paper, providing too much irrelevant context information might not always be beneficial for the predictive quality. There are other proposals, like ORGANON, that focus on identifying context information that has an influence on process goals but, as we have discussed, they need to be adapted to the particularities of process performance indicators.
To address these limitations, in this paper we propose a methodology for the context-aware prediction of PPIs, named CAP3. This methodology is divided into two main phases. First, we have extended the ORGANON methodology [20] for the discovery of the contextual information of a business process that is necessary for predicting process indicators. Context attributes for predictive monitoring were determined following the methodology, which includes interviews with process analysts to elicit the relevant context for the process at hand. Second, we have proposed a predictive monitoring technique for PPI prediction which includes the identified context information to improve the prediction. Our methodology was validated in two real-life organizations: Techmaster, a Brazilian company which provides IT infrastructure and management of IT environments, and the IT Department of a Spanish Healthcare Provider. The benefits of the methodology were shown, since the context attributes discovered by our proposal improve the PPI predictions for two different machine learning algorithms (random forest and XGBoost), as can be checked in Tables 8 and 10 for the CTXT dataset. Furthermore, we have performed a comparative analysis to determine the influence of the context information on the predictive monitoring of business process indicators using different datasets. According to the obtained results, the datasets that include context attributes (CTXT) reached the best error rates in most cases (3/4). In addition, the feature selection revealed some interesting findings. First, some of the features selected by the algorithm were context attributes, giving us an idea about the importance of context for the prediction. Moreover, low error rates were achieved with the attribute selection datasets (AUTO), which supports our proposal as a good technique for predicting PPIs.
In this context, the application of our CAP3 methodology has several implications for practice and theory. Concerning the former, as shown in the experiments, we have been able to achieve better predictive monitoring models if the context attributes identified in CAP3 are used for training these models. Therefore, our technique can be used in organizations to improve the quality of the predictions that can be applied to take corrective actions in case a deviation from the desired goal is detected. An additional implication is that, even if the prediction quality does not improve significantly, the use of meaningful attributes for building the model can lead to more realistic prediction explanations that can be used by experts to understand the behaviour of the business process. This is relevant because, as shown in previous work like [19], not only the quality of the prediction but also the reliability of the model should be taken into account when assessing predictive monitoring techniques.
From a theoretical perspective, this paper shows how expert knowledge about the process context can be used together with machine learning techniques to improve the performance of predictive monitoring models. This opens a path for using domain knowledge to enhance predictive or prescriptive process monitoring, which is something that has barely been explored in the literature. Finally, the experiments performed in this paper provide more evidence that including all possible attributes does not necessarily lead to better predictive performance. This can be used to adapt feature selection techniques that are widely used in machine learning [18] to predictive monitoring.
Several limitations of our work can be considered. First, we have seen in our empirical study that there are situations in which using the contextual attributes identified by CAP3 does not yield the best predictive monitoring performance. Further research would be necessary to characterize the scenarios in which its application brings more benefits. Another limitation is that, while CAP3 is very useful for making explicit the knowledge that is shared by the experts, it is more limited in identifying unexpected relationships that could exist in the data. To address this issue, as future work we plan to include machine learning techniques such as clustering methods to support the expert in the context identification process.

VERIFIABILITY
For the sake of verifiability, all the information for the replication of the experiments is available online. For each case study, the documentation for a better understanding of the logs, the interviews with the process experts, the list of selected attributes for each dataset and the code of the CAP3 project can be found in our GitHub repository. 10 The TM [40] and SHP [41] datasets can be found in the Zenodo repository.
MANUEL RESINAS (Member, IEEE) received the Ph.D. degree (Hons.) in software engineering and technology from the University of Seville, Spain, in 2008. Since 2018, he has been an Associate Professor with the University of Seville. He is also a member of the ISA Research Group, where he has led the research line on Business Process Management since 2010. Since 2019, he has also led the Information Systems Line of the SCORE Laboratory, University of Sevilla. He worked on automated negotiation of service level agreements. He has also cooperated with several IT companies as a consultant and researcher. His current research interests include process compliance and performance analytics, predictive monitoring, collaboration systems and technologies, and human-resource management.
ADELA DEL-RIO-ORTEGA received the international Ph.D. degree (Hons.) in software engineering and technology from the University of Seville, Spain, in 2012. She is currently an Associate Professor with the University of Seville. Her research interests include business process management and process performance improvement. Further research interests pertain to the modeling of SLAs, robotic process automation, knowledge-intensive processes, and decision management. She has contributed to more than 20 scientific publications in prestigious journals and conferences and has running collaborations with various international scholars. She developed two registered software tools, which generated an industrial value of more than 60k €. She took part in more than ten R&D&I projects and has cooperated with several IT companies as a consultant and researcher.
From 1990 to 1998, he was with the Computer Industry. From 1996 to 1998, he was a Part Time Lecturer with the University of Huelva. In 1998, he joined the University of Sevilla, as a Full Time Lecturer. In 2004, he founded the Applied Software Engineering Group, University of Sevilla. Since 2016, he has been a Full Professor of software and service engineering, and since 2019, he has also headed the SCORE Laboratory, University of Sevilla. His current research interests include service-oriented computing, business process management, testing, and software product lines.
Dr. Ruiz-Cortés is an elected member of the Academy of Europe and since 2018, he has been the President of the Spanish Society on Software Engineering (SISTEDES). He was a recipient of the Most Influential Paper of SPLC in 2017 and the VAMOS Award in 2020. He is also an Associate Editor of Computing (Springer). VOLUME 8, 2020