ProST-re: Simulation-Based Analytics for Finding Appropriate Substitutes and Task Reallocation

Employee absences often harm an organization's performance and profitability. To address this problem, various studies have sought ways to identify an appropriate substitute for an absent employee. However, previous studies have not reflected changes in employee work relationships over time. Furthermore, they have not considered how reallocating an absent employee's tasks affects the organization. To address these limitations, this paper proposes an approach to process-oriented substitute-task reallocation when an employee is absent. The approach consists of three steps. First, a cooperation network, a social network that demonstrates employee work relationships, is built using the attributes of the employees obtained from event logs. Then, the node2vec algorithm is used to predict links in the cooperation network to select appropriate candidates for substitute employees. Finally, a simulation model evaluates the probable impact of the preliminary reallocation of the candidates' tasks on the business-process cycle time and identifies the process-oriented substitute-task reallocation that takes the shortest time to complete process instances. This research validates the approach using a real-world event log. The validation results show that reflecting changes in employee work relationships is critical in selecting substitute employees. By adopting the proposed approach, organizations could maintain an appropriate number of employees and reallocate tasks optimally in a rapidly changing environment.


I. INTRODUCTION
An organization often encounters problems when its employees are absent for reasons such as maternity or paternity leave, resignation, and sick leave [1], [33], [40], [61], [67], [71]. In particular, the absence of many employees who tested positive for COVID-19 and their subsequent quarantines have brought significant difficulties to organizations worldwide. In addition, a number of organizations have faced multiple challenges arising from unexpectedly increased workloads [25], [27], [36], [38], [46], particularly organizations that had to lay off employees. For instance, organizations laid off employees to reduce payroll expenses during the COVID-19 pandemic. Those organizations have had difficulty coping with increased workloads as the pandemic has become endemic.
The associate editor coordinating the review of this manuscript and approving it for publication was Barbara Guidi.
To overcome the aforementioned challenges, organizations should promptly identify and assign suitable substitutes for absent employees [13], [22], [28], [30], [66], [67]. In addition, organizations should determine whether the increased workloads for present employees could be managed and establish optimal task-reallocation plans to mitigate the negative effects of a shortage of employees.
Previous studies have been conducted to find ways to identify appropriate substitute employees to resolve the problems. The studies assume the most appropriate way is to substitute with another employee who performs similar tasks or forms a work relationship with the absent employee. The authors of [33] proposed a systematic approach that extracted employee attributes from event logs using social network analysis and processes to identify appropriate substitutes for absent employees. In [1], the authors derived employee attributes from event logs and used the attributes to define an integer linear programming model. The model aims to select the set of candidate employees to minimize the total substitution cost.
However, these previous studies have not reflected changes in employee work relationships. Employee work relationships are likely to change as the nature of work and the way employees work change. Changes in work relationships over time can influence how well a team functions, which, in turn, may affect organizational outcomes. The studies also have not considered the availability of substitute employees. Substitute employees are required to perform the reallocated tasks of absent employees in addition to their primary tasks, and they may have difficulty performing both simultaneously. Consequently, the previous studies did not account for the effects of task reallocation on the business-process cycle time when an employee is absent.
To overcome these limitations, this paper proposes an approach to process-oriented substitute-task reallocation (ProST-re) when an employee is absent. First, a cooperation network, a social network that demonstrates employee work relationships, is built using a "working-together" matrix and a "performer-by-activity" matrix derived from an event log. Next, the node2vec algorithm and logistic regression are used to predict links in the cooperation network, so that changing employee work relationships are considered when selecting the most appropriate substitute candidates for absent employees. Then, a simulation model evaluates the probable impact of preliminary reallocation of the candidates' tasks on the business-process cycle time. This research defines ProST-re as the final executive reallocation that takes the shortest time to complete the process instances. Finally, the candidate who attains ProST-re is designated as the final substitute employee.
This research helps organizations identify appropriate substitute employees and reallocate tasks optimally so that they can respond promptly to a sudden absence of employees. Furthermore, by adopting the proposed approach, organizations could maintain an appropriate number of employees and reallocate tasks optimally in a rapidly changing environment.
The remainder of this paper is organized as follows. Section II describes the existing literature related to this research. Section III introduces an approach to ProST-re using link prediction and a simulation model. Section IV describes the validation result of the approach. Section V presents the conclusions.

II. RELATED WORK
A. METHODS TO SELECT SUBSTITUTES FOR ABSENT EMPLOYEES
Absenteeism is one of the most critical factors that affect an organization's performance and profitability [19], [43], [49], [54], [56], [70]. Traditionally, absenteeism is viewed as a problem related to human resource management (HRM) [20], [48], with absenteeism management being one of the main strategic HRM measures that organizations undertake to ensure their profitability and success [37], [57].
Absenteeism is an employee's absence, particularly for regular work [24], [53]. Employees may be absent from work for various reasons, not all of which can be controlled, such as unexpected illnesses [6], [7], [12], emergencies, and accidents. From an HRM perspective, absenteeism can be defined as the percentage of absent days [47]. To maintain normal operations, HRM must respond appropriately to an employee's absence [53]; one of the methods to cope with absenteeism is to have potential substitute employees.
Most firms rely heavily on managers' subjective evaluations and subordinates' knowledge to find appropriate substitute employees. They often identify suitable substitutes by considering the formal organizational hierarchy or by superficial observations. However, firms tend to ignore informal organizational structures, such as networks that employees build across functions and departments to complete assigned tasks. These informal organizational structures identify the actual work of individuals in the organization and represent work relationships that link and divide the individuals and the patterns and implications of those relationships. These relationships are not aligned with the formal organizational structure.
A few studies have been conducted to find ways to identify appropriate substitute employees using social networks [1], [33]. The studies assume the most appropriate way is to substitute with another employee who performs similar tasks or forms a work relationship with an absent employee. However, these previous studies have not considered that employee work relationships could change over time. In particular, due to the COVID-19 pandemic, working from home expands employee work relationships beyond the existing departments or teams. These changes should be included when selecting substitute employees for absent employees.

B. BUSINESS PROCESS SIMULATION FOR EVALUATING TASK REALLOCATION
Business process optimization uses pre-specified performance goals to improve business processes. Continuous improvement of business processes leads to market-share maintenance, cost savings, time savings, quality improvement, and competitive advantages [29], [58], [59]. The importance of business process optimization relies on the ability to (re)design the processes in response to quantitative evaluation criteria. Business process simulation applies profitability metrics to test methods of evaluating a system's optimality [17], [45], [50], [60], [64]. The methods are used in several business fields and contribute to the continuous improvement in business process management [52].
The last optimization approach is to rearrange tasks among employees, reallocating a task from one resource to another; for this, two mechanisms have been used. The first is a workload balancing mechanism that addresses the limitations of resource capacity [14], [15], [62]. The second is a task reallocation mechanism that reduces the business-process cycle time [9], [63], [69]. However, previous studies have not considered the availability of substitute employees who perform the additionally reallocated tasks while simultaneously performing their primary tasks. In particular, the studies excluded the effects of task reallocation on the business-process cycle time due to absent employees.
To address these limitations, this research proposes an approach for selecting substitute candidates for absent employees using process mining and link prediction. The approach uses a simulation model that evaluates the probable impact of preliminary reallocation of the candidates' tasks on the business-process cycle time and thereby finds ProST-re.

III. APPROACH TO PROCESS-ORIENTED SUBSTITUTE-TASK REALLOCATION
This section presents the approach for finding ProST-re that mitigates the negative effect of absent employees. The approach consists of three essential procedures (Fig. 1). First (Fig. 1(A)), a cooperation network is built; it represents employee work relationships by extracting the employees' attributes from event logs. Then (Fig. 1(B)), link prediction and a similarity measure are used to select substitute candidates in the cooperation network. Lastly (Fig. 1(C)), a simulation model evaluates the candidates and analyzes the probable impacts of preliminary task reallocation on the business-process cycle time. Among the candidates, the approach identifies ProST-re, the actual and executive reallocation, which takes the shortest time to complete process instances.
The following assumptions are made when selecting substitutes for absent employees: (1) Only employees with experience performing an absent employee's primary tasks are selected as candidates. (2) Among the candidates, employees with work relationships with absent employees are likely to be selected as the final substitute. (3) Among the narrowed candidates in (2), employees who are most experienced with the absent employees' primary tasks are likely to be selected as the final substitute.
The core limitation of previous studies is that they only considered substitute employees who worked with absent employees. However, the approach in this research includes substitute employees who are likely to build future work relationships with the absent employees, although they previously did not have any work relationships with absent employees. Two elements are considered to determine substitute employees who are likely to have work relationships with absent employees. The first is their co-workers. Employees who share more mutual co-workers with absent employees are more likely to be selected as candidates. The second is their structural roles (e.g., subordinates or managerial roles). Employees with similar structural roles to absent employees are more likely to be selected as candidates. Therefore, even when there is no direct work relationship between the probable candidates and the absent employee, they may be connected and considered substitutes for each other. Thus, the elements are critical for overcoming the limitations of previous studies.
VOLUME 10, 2022

A. COOPERATION NETWORK
A cooperation network is a social network that demonstrates employee work relationships. The network is built using the employees' attributes, as extracted from event logs. Information systems record event logs that can be used to track and monitor all events in the business process.
This research assumes that event logs include attributes such as case IDs, activity names, resource names, and timestamps. Let $A$ be a set of activities (also referred to as tasks), $P$ be a set of performers (also referred to as employees), and $T$ be a set of timestamps recorded when a performer starts to execute an activity. $E = A \times P \times T$ is the set of possible events, i.e., combinations of activity, performer, and timestamp. $C = E^{*}$ is the set of possible event sequences (i.e., traces that describe a case). $B(C)$ is the set of all bags (multi-sets) over $C$. Then, $L \in B(C)$ is an event log. For convenience, two operations, $f_a$ and $f_p$, are defined on events: $f_a(e) = a$ and $f_p(e) = p$ for an event $e = (a, p, t)$.
The proposed approach derives a "working-together" matrix and a "performer-by-activity" matrix from event logs. A "working-together" matrix records how often two performers execute activities within the same case. The matrix excludes the situation where the same performer executes activities multiple times within the same case, because a performer cannot substitute for themselves. For a given $L$, $x, y \in P$, and $c = (c_1, c_2, \ldots) \in L$, $x \nabla_c y$ (Eq. (1)) denotes the number of times two performers (e.g., $x$ and $y$) work together within a case (e.g., $c$).
$WT_{xy}$, a "working-together" metric, is the sum of $x \nabla_c y$ over all cases in $L$ divided by the total number of cases. $WT_{xy}$ is defined in Eq. (2):
$$WT_{xy} = \frac{\sum_{c \in L} x \nabla_c y}{\sum_{c \in L} 1} \tag{2}$$
A working-together metric can be represented as an $N \times N$ "working-together" matrix, where $N$ is the number of performers (i.e., performers that executed at least one activity in $L$). Cell $(x, y)$ of the "working-together" matrix (Table 1) indicates the number of times $x$ works with $y$ in $L$. For example, performer P1 works with P2 six times.
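The computation of the working-together metric can be illustrated with a minimal Python sketch. Because the definition of $x \nabla_c y$ is not reproduced here, the sketch assumes that $x \nabla_c y = 1$ when two distinct performers both appear in case $c$ and 0 otherwise; the toy log and the function name `working_together` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Toy event log: case id -> ordered list of (activity, performer) events.
# Timestamps are omitted because this metric does not use them.
log = {
    "case1": [("A", "P1"), ("B", "P2"), ("C", "P1")],
    "case2": [("A", "P1"), ("B", "P3")],
    "case3": [("A", "P2"), ("B", "P3"), ("C", "P2")],
    "case4": [("A", "P1"), ("B", "P2")],
}

def working_together(log):
    """Return {(x, y): WT_xy} for every unordered pair of distinct performers."""
    counts = defaultdict(int)
    for events in log.values():
        # Distinct performers in the case; a performer is never paired with itself.
        performers = sorted({p for _, p in events})
        for x, y in combinations(performers, 2):
            counts[(x, y)] += 1  # assumption: x nabla_c y = 1 if both appear in c
    n_cases = len(log)  # denominator of Eq. (2)
    return {pair: c / n_cases for pair, c in counts.items()}

wt = working_together(log)
```

Here P1 and P2 share two of the four cases, so `wt[("P1", "P2")]` is 0.5. Multiplying by the number of cases recovers raw co-occurrence counts if a count matrix is preferred.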
The second is a "performer-by-activity" matrix. It records the number of times each performer executes specific activities. For a given $L$ and $x \in P$, $z \in A$, $c = (c_1, c_2, \ldots) \in L$, $PA_{xz}$, a "performer-by-activity" metric, denotes the number of times $x$ executes $z$ in $L$; it is given in Eq. (3). A "performer-by-activity" metric can be represented as a "performer-by-activity" matrix with dimensions $N \times M$, where $N$ is the number of performers (i.e., performers that executed at least one activity in the event log), and $M$ is the number of activities. Cell $(x, z)$ of the "performer-by-activity" matrix (Table 2) represents the number of times performer $x$ executes activity $z$. For example, P1 performs Task D three times.
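The count described above can be written with the operations $f_a$ and $f_p$ introduced earlier. The form below is a sketch that follows the verbal definition; the indicator notation $\mathbb{1}[\cdot]$ and the index range are assumptions and may differ from the notation of the original equation.

```latex
PA_{xz} \;=\; \sum_{c \in L} \; \sum_{1 \le i \le |c|}
  \mathbb{1}\!\left[\, f_p(c_i) = x \;\wedge\; f_a(c_i) = z \,\right]
```

Each event $c_i$ contributes 1 when it was executed by performer $x$ and belongs to activity $z$, and 0 otherwise.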
The cooperation network is built from the two matrices: the "working-together" matrix and the "performer-by-activity" matrix. The cooperation network $G$ is an undirected graph represented as a 2-tuple $G = (V, E)$, where $v_i \in V$ denotes a node that represents an employee, and $e_{ij} = (v_i, v_j) \in E$ indicates an edge connecting $v_i$ and $v_j$ if they work together within the same case. Node $v_i$ has a node attribute $x_{v_i}$, which is a vector corresponding to row $v_i$ of the "performer-by-activity" matrix. Edge $e_{ij}$ has a weight corresponding to the value in row $i$ and column $j$ of the "working-together" matrix. An example of the cooperation network (Fig. 2) is built using the two matrices (Table 1 and Table 2).
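A minimal sketch of this construction is given below, assuming the two matrices are available as nested lists. The matrix values are hypothetical (they are not the contents of Table 1 or Table 2), and the dictionary-based graph representation is one possible choice among many.

```python
# Hypothetical inputs; rows and columns of wt_matrix follow the order of `performers`.
performers = ["P1", "P2", "P3"]
wt_matrix = [
    [0, 6, 2],
    [6, 0, 0],
    [2, 0, 0],
]
# Rows follow `performers`; columns are activities (e.g., Task A, Task B, Task C).
pa_matrix = [
    [3, 0, 1],
    [0, 4, 0],
    [1, 1, 0],
]

def build_cooperation_network(performers, wt_matrix, pa_matrix):
    """Return (nodes, edges) of an undirected, weighted, attributed graph."""
    # Node attribute x_v: the performer's row of the performer-by-activity matrix.
    nodes = {p: list(pa_matrix[i]) for i, p in enumerate(performers)}
    edges = {}
    for i, x in enumerate(performers):
        for j in range(i + 1, len(performers)):
            weight = wt_matrix[i][j]
            if weight > 0:  # an edge exists only if the two worked together
                edges[frozenset((x, performers[j]))] = weight
    return nodes, edges

nodes, edges = build_cooperation_network(performers, wt_matrix, pa_matrix)
```

Using `frozenset` keys keeps the edges undirected: the pair (P1, P2) and the pair (P2, P1) map to the same entry.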

B. METHOD TO FIND SUBSTITUTE CANDIDATES FOR ABSENT EMPLOYEES USING LINK PREDICTION
The cooperation network derived from event logs is used to find substitute candidates for absent employees. The method (Fig. 3) to select the substitute candidates entails five steps. The final output of the method is the substitute candidates and their substitution scores (SSs), which measure the degree of substitution.
First, the method uses the node2vec algorithm and logistic regression to predict links in the cooperation network. Edges are added to the cooperation network (Fig. 3, two broken lines), so the cooperation network is updated.
Link prediction generally refers to the problem of using node embeddings to predict whether a link exists between two nodes [23]. For this purpose, the method applies the node2vec algorithm [5] to the cooperation network. This algorithm determines node similarities by taking random walks on the input graph and generating node embeddings from the random walks by using the Skip-Gram model [65].
node2vec finds likely future employee work relationships by exploring both the global and local structures of the cooperation network. It embeds nodes according to two principles: homophily and structural equivalence. Under homophily, nodes that belong to the same network community are treated as similar. Structural equivalence refers to the extent to which two nodes are connected to the same nodes, i.e., they share the same neighborhood while not necessarily being directly connected. node2vec is therefore an appropriate algorithm for capturing the two elements considered in selecting substitute employees: co-workers and structural roles.
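The random walks that node2vec takes are second-order walks biased by a return parameter `p` and an in-out parameter `q`, which is how the algorithm trades off local and global exploration. The sketch below illustrates one such walk on a toy weighted graph; the graph, the parameter values, and the function name `biased_walk` are hypothetical, and the Skip-Gram training step that turns walks into embeddings is omitted.

```python
import random

# Toy weighted, undirected cooperation network as an adjacency dict (hypothetical).
graph = {
    "P1": {"P2": 6.0, "P5": 1.0, "P6": 2.0},
    "P2": {"P1": 6.0, "P5": 3.0},
    "P5": {"P1": 1.0, "P2": 3.0, "P6": 4.0},
    "P6": {"P1": 2.0, "P5": 4.0, "P7": 1.0},
    "P7": {"P6": 1.0},
}

def biased_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """Generate one node2vec-style walk containing at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = list(graph[cur])
        if not nbrs:
            break
        if len(walk) == 1:
            # First step: no previous node, so only edge weights matter.
            weights = [graph[cur][x] for x in nbrs]
        else:
            prev = walk[-2]
            weights = []
            for x in nbrs:
                w = graph[cur][x]
                if x == prev:            # returning to the previous node
                    weights.append(w / p)
                elif x in graph[prev]:   # staying at distance 1 from prev
                    weights.append(w)
                else:                    # moving farther away from prev
                    weights.append(w / q)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

rng = random.Random(0)  # fixed seed so the sketch is reproducible
walks = [biased_walk(graph, node, 6, p=1.0, q=0.5, rng=rng) for node in graph]
```

A small `q` encourages the walk to move outward, while a large `q` keeps it near the start node; a large `p` discourages immediate backtracking.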
After obtaining embeddings from the node2vec algorithm, the method extracts positive and negative edge samples from the cooperation network. Positive samples (labeled "1") are node pairs that are connected by an edge; negative samples (labeled "0") are node pairs that are not connected. Then, the method computes edge embeddings for the positive and negative edge samples by applying a binary operator (e.g., Hadamard, average, etc.) to the embeddings of the source and target nodes of each sampled edge. Edge embeddings have the same dimensionality as the input node embeddings. Given the edge embeddings for positive and negative samples, the method trains a logistic regression classifier to predict a binary value that indicates whether an edge between two nodes should exist. The method evaluates the performance of the classifier for each of the operators and selects the best classifier. Given the edge embedding of any two unconnected nodes, the best classifier indicates whether an edge should exist between them.
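The binary operators can be sketched in a few lines of Python. The four operators below are the element-wise operators commonly used with node2vec; the toy embeddings and the helper `edge_samples` are hypothetical, and the logistic regression step itself is omitted.

```python
def average(u, v):
    return [(a + b) / 2 for a, b in zip(u, v)]

def hadamard(u, v):
    return [a * b for a, b in zip(u, v)]

def weighted_l1(u, v):
    return [abs(a - b) for a, b in zip(u, v)]

def weighted_l2(u, v):
    return [(a - b) ** 2 for a, b in zip(u, v)]

OPERATORS = {
    "average": average,
    "hadamard": hadamard,
    "weighted_l1": weighted_l1,
    "weighted_l2": weighted_l2,
}

# Hypothetical 2-dimensional node embeddings (real ones come from node2vec).
emb = {"P1": [0.5, -1.0], "P2": [1.5, 2.0], "P3": [0.0, 1.0]}

def edge_samples(emb, edges, non_edges, op):
    """Build a feature matrix X and label vector y for a link classifier."""
    pairs = list(edges) + list(non_edges)
    X = [op(emb[u], emb[v]) for u, v in pairs]
    y = [1] * len(edges) + [0] * len(non_edges)  # 1 = connected, 0 = not connected
    return X, y

X, y = edge_samples(emb, [("P1", "P2")], [("P1", "P3")], hadamard)
```

Every operator returns a vector with the same length as the node embeddings, so the resulting `X` and `y` can be passed to any binary classifier, one operator at a time, to compare their performance.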
Second, extract nodes that connect with the absent node representing an absent employee (assumption (2)). For example, if P6 is an absent employee, the nodes connected to P6 are P1, P5, and P7. Third, extract attributes (i.e., rows of the "performer-by-activity" matrix) of the absent node and its connected nodes. Fourth, remove nodes that have not performed tasks that the absent node has performed (assumption (1)). For example, node P7 is removed from the candidates because P7 has not performed Task A. The remaining connected nodes are substitute candidates. Fifth, calculate the SSs between absent employees and substitute candidates (assumption (3)).
The SS (Eq. (4)) extends the modified Jaccard coefficient [48] to represent assumption (3). Let $p = \{p_1, p_2, \ldots, p_n\}$ and $q = \{q_1, q_2, \ldots, q_n\}$ respectively denote rows $p$ and $q$ of the "performer-by-activity" matrix, where $p$ is an employee to be substituted, and $q$ is an employee to substitute $p$. Let $A_i$ denote how well $q$ can perform $p$'s tasks compared to $p$, and let $B_i$ be the number of tasks $p$ can perform. The SS of $q$ relative to $p$ is given in Eq. (4).
Finally, substitute candidates are arranged in descending order according to their SSs. The candidate with the highest SS is the most suitable substitute for an absent employee.
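Steps two to five can be sketched as follows. Two points are assumptions rather than the paper's definitions: the filter keeps a candidate only if it has performed every task the absent employee has performed, and `stand_in_score` is a hypothetical overlap measure used in place of the SS, whose exact formula is not reproduced here. The task counts are invented to mirror the P6 example.

```python
# Hypothetical performer-by-activity rows for Task A, Task B, Task C.
pa = {
    "P6": [4, 2, 0],  # absent employee
    "P1": [2, 2, 1],
    "P5": [5, 1, 0],
    "P7": [0, 3, 2],  # has never performed Task A
}
# Neighbors of the absent node after the link-prediction update (step 2).
neighbors = {"P6": ["P1", "P5", "P7"]}

def has_required_experience(p_row, q_row):
    # Assumed reading of assumption (1): q performed every task that p performed.
    return all(q > 0 for p, q in zip(p_row, q_row) if p > 0)

def stand_in_score(p_row, q_row):
    # Hypothetical stand-in for the SS: shared task volume relative to p, in [0, 1].
    return sum(min(p, q) for p, q in zip(p_row, q_row)) / sum(p_row)

def rank_candidates(absent, neighbors, pa, score=stand_in_score):
    p_row = pa[absent]                                    # step 3: attributes
    candidates = [q for q in neighbors[absent]
                  if has_required_experience(p_row, pa[q])]  # step 4: filter
    scored = [(q, score(p_row, pa[q])) for q in candidates]  # step 5: score
    return sorted(scored, key=lambda item: item[1], reverse=True)

ranking = rank_candidates("P6", neighbors, pa)
```

With these toy values P7 is filtered out, and P5 ranks above P1. Passing a different `score` function lets the same pipeline be reused with another similarity measure.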

C. SIMULATION MODEL FOR FINDING PROCESS-ORIENTED SUBSTITUTE-TASK REALLOCATION
This section presents a simulation model that evaluates the impact of substitute-task reallocation on the business-process cycle time. Experimenting with reallocating substitute tasks to all employees is nearly impossible. This research reallocates substitute tasks to the substitute candidates from the method presented in the previous subsection to identify ProST-re efficiently.
A data model is introduced to simulate substitute-task reallocation to the candidates. The data model contains four entities (Table 3). The "Business process" entity represents business process models that contain the types of tasks in the process (e.g., AND-join, XOR-join), the control-flow information, and the generation time required for the process to generate a new instance.
The "Performer" entity represents the information on task execution capabilities and the task execution time of each performer. It shows the tasks each performer can complete and the time required to perform them.
The "Performer-Substitute relationship" entity represents the information on the relationships between absent performers and substitute candidates.
Finally, the "Substitute employee" entity represents the substitute employees' task execution capabilities, the time required to perform the tasks, and their availability. Availability indicates whether a substitute employee can actually perform the absent employee's primary tasks in addition to their own tasks. Availability is inversely proportional to the number of tasks that a substitute employee can perform. For example, if a substitute employee can perform three tasks, the availability is 33.33%.
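The availability example can be written as a short worked equation. The symbol $n_s$ (the number of tasks substitute employee $s$ can perform) is introduced here for illustration, and a proportionality constant of 1 is assumed because it reproduces the 33.33% figure.

```latex
\mathrm{availability}(s) = \frac{1}{n_s} \times 100\%,
\qquad
n_s = 3 \;\Rightarrow\; \mathrm{availability}(s) = \frac{100\%}{3} \approx 33.33\%
```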
Although task reallocation simulation procedures exist in the previous research, those models do not apply to this research. Thus, this research proposes a new task reallocation simulation procedure. An example of the simulation procedure is shown in Fig. 4.
The Process_1 model described in the "Business process" entity generates a new process instance according to the process generation time (Fig. 4, step 1). P1 is chosen among the performers who can perform Task A (step 2), and Task A takes the execution time defined in the "Performer" entity to complete (step 3). Then, Task B is assigned to P2 after P1 performs Task A (step 4), but P2 is absent and cannot perform Task B. As defined in the "Performer-Substitute relationship," P12 is a substitute employee for P2 (step 5). Once P12 becomes available, P12 performs Task B for the duration of the execution time defined in the "Substitute employee" entity (step 6).
With the data model and procedure, the architecture of a substitute-task reallocation system is depicted in Fig. 5. The four inputs are obtained from users and loaded into the system. Process instances and their tasks are created periodically according to the process generation time. Performers with the capability to execute specific tasks are allocated to the tasks. However, if the performer with allocated tasks is absent, its substitute is assigned as an alternative and executes the tasks. The log files are extracted and saved in .csv format after all tasks are completed.
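The procedure illustrated with Process_1 can be sketched as a small deterministic simulation. This is a simplified illustration, not the system of Fig. 5: tasks are strictly sequential, execution times are fixed, a worker handles one task at a time, and the substitute's own primary tasks are not modeled. All times and dictionary names are hypothetical.

```python
# Hypothetical instances of the four data-model entities.
process = {"name": "Process_1", "tasks": ["Task A", "Task B"], "generation_time": 5}
performers = {"P1": {"Task A": 3}, "P2": {"Task B": 4}}   # execution times
substitute_of = {"P2": "P12"}                              # performer -> substitute
substitutes = {"P12": {"Task B": 6}}                       # substitute execution times
assignment = {"Task A": "P1", "Task B": "P2"}              # default task allocation

def simulate(n_instances, absent):
    """Return a log of (instance, task, worker, start, end) tuples."""
    free_at = {}  # worker -> time at which the worker becomes available
    log = []
    for i in range(n_instances):
        t = i * process["generation_time"]  # a new instance is generated
        for task in process["tasks"]:
            worker = assignment[task]
            if worker in absent:            # reallocate to the substitute
                worker = substitute_of[worker]
                duration = substitutes[worker][task]
            else:
                duration = performers[worker][task]
            start = max(t, free_at.get(worker, 0))  # wait until worker is free
            end = start + duration
            free_at[worker] = end
            log.append((i, task, worker, start, end))
            t = end
    return log

def total_cycle_time(log):
    """Sum over instances of (completion time - generation time)."""
    last_end = {}
    for inst, _task, _worker, _start, end in log:
        last_end[inst] = max(end, last_end.get(inst, 0))
    return sum(end - inst * process["generation_time"]
               for inst, end in last_end.items())

baseline = total_cycle_time(simulate(3, absent=set()))
with_absence = total_cycle_time(simulate(3, absent={"P2"}))
```

In this toy setting the substitute is slower than the absent performer, so later instances queue behind P12 and the total cycle time grows. Repeating the run for each candidate and keeping the candidate with the smallest total mirrors the selection of ProST-re.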
The most appropriate task reallocation scheme can be derived and tailored for an organization's specific situation and objectives. This research assumes efficiency as an essential asset for an organization's performance and profitability. Thus, this research chooses the mechanism that reduces the total cycle time of the business process. Therefore, the simulation finds the reallocation that takes the shortest time to complete process instances, ProST-re. The candidate who attains ProST-re is designated as the final substitute for an absent employee.
This research utilizes the simulation results (i.e., the total cycle time) in two ways. The first is the average working time, which is the average time an employee spends working. The average working time is the total cycle time divided by the number of employees. It could serve as a standard for determining whether the tasks additionally reallocated to present employees because of absent employees are manageable and feasible. The second is the average business-process cycle time, which is the average amount of time it takes to complete a process instance from start to finish. The average business-process cycle time is the total cycle time divided by the number of completed instances. It could be used to determine whether organizations can meet process deadlines when employees are absent.
This research helps organizations rapidly identify appropriate substitute employees and optimally reallocate tasks. Furthermore, by adopting the proposed approach, organizations could determine the optimal number of employees and task reallocation in a rapidly changing environment. For instance, when an organization needs to reduce its expenses, it could find the maximum number of employees it can let go without imposing a sudden workload increase that exceeds what the remaining employees can complete. Therefore, organizations could resolve the problems arising from unexpected workload increases.
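Both measures are simple ratios of the total cycle time, as the following sketch shows. The function names and the numbers in the example are hypothetical.

```python
def average_working_time(total_cycle_time, n_employees):
    """Total cycle time divided by the number of employees."""
    return total_cycle_time / n_employees

def average_cycle_time(total_cycle_time, n_completed_instances):
    """Total cycle time divided by the number of completed instances."""
    return total_cycle_time / n_completed_instances

# Hypothetical example: 120 hours in total, 8 employees, 30 completed instances.
awt = average_working_time(120, 8)
act = average_cycle_time(120, 30)
```

In this example each employee works 15 hours on average and each instance takes 4 hours on average, which can then be compared against workload limits and process deadlines.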

IV. VALIDATION OF PROCESS-ORIENTED SUBSTITUTE-TASK REALLOCATION APPROACH
To assess the proposed approach's effectiveness, this research uses the Business Process Intelligence (BPI) Challenge 2019 event log, a real-world log. The event log was collected from a large multinational company operating from The Netherlands in the area of coatings and paints. The event log shows the payment process after purchasing, a critical part of a company's procurement. This research chose this log because the payment process exposes a company to numerous business risks, such as extended delivery times, reduced production efficiency, and increased costs, but the most critical risks arise from an absence of employees [39]. In addition, employee work relationships, particularly with suppliers, occur frequently in the purchase-to-pay process.
There are four purchase-to-pay process types according to how invoices are matched before payment: (1) three-way matching of invoices after goods receipt, (2) three-way matching of invoices before goods receipt, (3) two-way matching, and (4) consignment. Each purchase-to-pay process type has its own workflow. This research uses (4) consignment. Unlike the other process types, the consignment process does not contain information about invoices. This research chooses the consignment process because the invoice-related activities are handled mainly by computers, not human resources. Filtering out incomplete cases and infrequently occurring cases yields the remaining log for the consignment process (Table 4). The "event concept:name" attribute is the name of the activity that the event relates to, and the "event time:timestamp" attribute represents the time point at which the activity began.
The cooperation network (Fig. 6) is built using the two matrices. The boxed area of Fig. 6 is enlarged and depicted in detail on the right. The detailed figure shows five employees and their attributes, which represent the number of times each employee executes six activities. In addition, the detailed figure illustrates employee work relationships. For example, user_068 and user_035 worked together 13 times, and user_078 and user_057 worked together eight times.
This research applies the node2vec algorithm to predict links in the cooperation network. As an experiment to demonstrate the effectiveness of node2vec, this research compares node2vec to DeepWalk [16]. DeepWalk is the first method to determine node similarities by performing random walks on the input graph. In addition, this research evaluates node2vec against some popular heuristic approaches as baselines. The heuristic approaches include common neighbors [42], Jaccard's coefficient [51], Adamic-Adar [41], and preferential attachment [10]. The hyperparameters for the DeepWalk and node2vec algorithms are tuned (Table 7).
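The four heuristic baselines score a node pair directly from neighborhood information. The sketch below uses their standard unweighted definitions on a hypothetical toy graph; whether the experiments used weighted variants is not stated, so this is an illustration only.

```python
import math

# Toy undirected graph as neighbor sets (hypothetical).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}

def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])

def jaccard(adj, u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar(adj, u, v):
    # Common neighbors with few links contribute more; degree-1 nodes are skipped
    # because log(1) = 0 would cause a division by zero.
    return sum(1 / math.log(len(adj[z])) for z in adj[u] & adj[v] if len(adj[z]) > 1)

def preferential_attachment(adj, u, v):
    return len(adj[u]) * len(adj[v])

# Scores for the unconnected pair (b, d), which shares neighbors a and c.
scores = {
    "common_neighbors": common_neighbors(adj, "b", "d"),
    "jaccard": jaccard(adj, "b", "d"),
    "adamic_adar": adamic_adar(adj, "b", "d"),
    "preferential_attachment": preferential_attachment(adj, "b", "d"),
}
```

A higher score suggests that a link between the two nodes is more likely, so ranking unconnected pairs by any of these scores yields a simple link predictor.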
The different binary operations (Table 8) [5] are used to learn edge features for DeepWalk and node2vec. The link prediction results (Table 9) indicate that the learned feature representations (i.e., DeepWalk and node2vec) for node pairs produce significantly better scores than the heuristic approaches. Additionally, node2vec achieves higher area under the receiver operating characteristic curve (AUROC) scores than DeepWalk across the binary operators. Overall, the Weighted-L2 operator, when used with node2vec, is highly stable and produces the best AUROC score.
As an experiment to demonstrate the effectiveness of the proposed approach, this research conducts simulations in three cases. The first case is when there are no absent employees, and thus no substitute employees are needed. The second case is when there are three absent employees, and substitute employees are obtained by the approach proposed in previous studies. The third case is when the same three employees are absent, and substitute employees are obtained by the approach proposed in this research. The final substitute employees obtained by the proposed approach are (1) user_121 for user_178; (2) user_085 for user_171; and (3) user_088 for user_087. These final substitute employees are not the employees with the highest substitution score. This means that substitute candidates with the highest substitution score are not necessarily selected as the final substitute employees. This implies that the effects of task reallocation on the business-process cycle time and the availability of employees may change the selection of the final substitute employees.
Simulations are conducted 1,000 times to ensure reliable evaluations. The values in Table 10 represent the average total cycle time of the simulation results. The results show that the proposed approach is more effective than the approach proposed in previous studies. The proposed approach increases the average cycle time by only 3%. In contrast, the previous approach increases the average cycle time by 26%. This result indicates that less time is required to complete process instances when substitute employees are selected considering changing employee work relationships instead of only considering the current employee work relationships.
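AUROC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the labels and scores are hypothetical and are not taken from Table 9.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0   # positive ranked above negative
            elif sp == sn:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical classifier outputs for two existing and two non-existing edges.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
result = auroc(labels, scores)
```

Three of the four positive-negative pairs are ordered correctly here, so `result` is 0.75. The pairwise form is quadratic in the number of samples, which is acceptable for small networks; rank-based formulas are preferable for large ones.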

V. CONCLUSION
An organization encounters significant problems when its employees are absent and workloads increase due to the shortage of employees. Finding appropriate substitute employees and optimally reallocating their tasks is imperative to resolve the problems.
This research proposes an approach for finding ProST-re with a focus on changes in employee work relationships. The approach constructs the cooperation network and applies link prediction to the network to select substitute candidates for absent employees. After obtaining the candidates, the approach simulates substitute-task reallocation to the candidates and then finds the final substitute employee. The validation results demonstrate that considering changes in employee work relationships is critical in selecting substitutes for absent employees.
This research, however, has two limitations. The first is the period of time spent collecting the data for validation. The amount of data may be insufficient for predicting future employee work relationships because the data collection period is only one year. The second is the lack of information on employees' individual characteristics, such as their skills and abilities. Although individual characteristics influence the standard for selecting substitute employees, obtaining this information is onerous due to privacy concerns. If this limitation is resolved and more precise information can be incorporated, more suitable substitute employees could be assigned.
This research helps organizations respond promptly to the sudden absence of employees. Furthermore, the proposed approach provides insights for organizations to maintain the optimal number of employees and reallocate tasks in a rapidly changing environment. Therefore, this research could help resolve problems arising from unexpected workload increases.