Measure of Semantic Likeness Among Business Process Activities in a Telecommunication Company

The study proposes a measure of semantic similarity to address the wide variety of business process retrieval problems; such measures make it possible to compare business process models for reuse, redesign or reengineering. A metric that quantifies the similarity between a reference model, or a fragment of it, and other process models is proposed. The metric focuses on the activities that make up a business process and is built from three measures: syntactic, linguistic and semantic similarity. The results show that the proposed measure is suitable for reuse purposes in manufacturing processes, making it a useful tool for knowledge management and process administration.

The semantic similarity measures were established on the basis of the eTOM ontology, by applying three algorithms: syntactic similarity (Sim_Sin), linguistic similarity (Sim_Lin) and semantic similarity (Sim_Sem).
We propose four application phases necessary for the operation of the model: (1) recovery of the business process activities, (2) evaluation of the processes according to WordNet activities, (3) evaluation of the processes according to ontology activities and (4) computation of the similarity between business models. The model delivers a measure of semantic similarity, based on a domain-specific ontology, that produces a result in every comparison case and improves the discovery of similarity.

B. PROBLEM STATEMENT
To improve their competitiveness in the various industrial markets, firms invest heavily in the creation of new products and services, with the aim of attracting new customers and retaining current ones [14].
According to [15], in a telecommunications company each activity is based on one or more processes, which in turn are broken down into management, maintenance or support tasks. For [16], it is customary for these companies to reuse certain tasks in order to optimize resources. This reuse has limitations regarding which task to choose and how long it takes to select it, and therefore requires the intervention of a specialist technician who manages the reuse. To automate this process, we propose a mechanism that determines the semantic similarity between the activities that shape the processes of a business, focusing on the identification and functionality of each activity.
As a real application, we consider an organization in the telecommunications sector that organizes its tasks through business process models. The company constantly renews its processes according to market demands and, as a consequence, needs some form of automatic process retrieval system [17].
Throughout this study, various parameters aimed at measuring the similarity between business process models were evaluated. These comparison mechanisms use a variety of resources to ensure the correct evaluation of each process. According to [18], similarity measures can be classified by the metric they apply. The parameters reviewed in the literature are the following: similarity between the nodes and edges of a business process, graph edit distance, causal dependencies between activities, and approaches based on sets of traces.

C. SIMILARITY BETWEEN NODES AND EDGES OF A BUSINESS PROCESS
This approach analyzes the main measures proposed between the elements of a business process, grouped into four categories: (1) syntactic, linguistic and structural similarity of labels, (2) similarity estimation based on characteristics, (3) similarity based on the names of common activities and (4) similarity by matching labels.
According to [13], with respect to measure (1), syntactic, linguistic and structural similarity of labels, measuring the similarity of processes relies on transforming the models into an ontological representation: the activities that make up the compared models are identified, and their labels are compared through three similarity measures.
In contrast, the study carried out by [19] addresses the problem of searching for a group of processes similar to a query model, and considers it inefficient to compare each model with the query model one by one. Instead, it proposes defining an index to find models that have many characteristics in common with the reference (query) model. To build this index, the activity label is evaluated together with the position of the node within the graph structure, which is referred to as (2) similarity estimation based on characteristics. In the same line of research as [19], [20] points out that, for (3) the score based on the names of common activities, the similarity measure relies only on activities with identical names, that is, the activities that occur in both compared processes. Later, [12] defines several measures; the first is (4) similarity according to node labels, which is computed with the string edit distance proposed by [21].
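The score based on common activity names (measure 3) can be illustrated with a short sketch. The exact formula of [20] is not reproduced here, so the Dice-style ratio below (twice the number of shared names over the total number of names) and the lowercase/whitespace normalization are assumptions; `common_name_similarity` is a hypothetical helper.

```python
from typing import Iterable


def common_name_similarity(activities_a: Iterable[str], activities_b: Iterable[str]) -> float:
    """Dice-style share of activity names that occur in both processes (assumed form)."""
    # Assumption: names are compared after trimming and lowercasing.
    names_a = {a.strip().lower() for a in activities_a}
    names_b = {b.strip().lower() for b in activities_b}
    if not names_a and not names_b:
        return 1.0  # two empty processes are treated as identical
    common = names_a & names_b
    return 2 * len(common) / (len(names_a) + len(names_b))
```

For example, a process with "Receive request", "Check stock" and "Ship order" compared with one containing "Receive request", "Ship order", "Send invoice" and "Archive" shares two names, giving 2·2/(3+4) ≈ 0.57. Note that any activity whose name differs by even one character contributes nothing, which is the limitation that motivates the edit-distance and dictionary-based measures discussed later.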

D. GRAPH EDIT DISTANCE
Since every business process model can be represented as a graph, some metrics apply graph algorithms to compute similarity. Five measures are analyzed in this approach: (1) tree edit distance [22], (2) graph edit distance with high-level operations [23], (3) graph edit distance similarity [24], (4) combination of activity comparison and graph edit distance [25] and (5) combination of label edit distance and graph edit distance [26].
Authors such as [22] propose transforming a business process model into a tree (1), in which all activities become edges or leaves of the tree and each split or gateway in the process becomes a node that divides the branches; after this transformation, tree comparison algorithms are applied. This approach is limited because it does not resolve the information loss caused by the transformation.
Authors such as [23] propose a methodology for calculating the similarity between business process models based on so-called high-level change operations (2). These authors define high-level change operations as structural changes that can be applied to a model, such as inserting an activity between existing activities, removing an activity, or moving activities so that the logical order of the process changes.
The research proposed by [24] attempts to capture the structural similarity of the graphs of business models (3). To this end, it determines which nodes and edges must be deleted, inserted or substituted to transform the query graph into the compared graph. To decide which nodes should be substituted, the Levenshtein distance [21] between node labels is used.
Authors such as [25] compare business process models by integrating two of the measures explained above: first, a similarity between nodes is obtained by comparing the labels of the activities and the connectors of the business process; then, the graph edit distance proposed in [24] is applied; finally, both measures are combined into a single similarity measure (4).
Like the previous measure, the approach proposed by [26] combines two measures (5). The labels of the activities are compared with the edit distance of [21], and the graph edit distance proposed by [24] is then applied. Besides presenting the similarity measure, the authors describe a way of representing the content of business process models by indexing them in the model repository, which speeds up the search for similar models.
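A minimal sketch of a graph-edit-distance similarity for process graphs, assuming a node mapping between the two graphs is already known. It follows the form commonly used in the process-model literature (fractions of skipped nodes, skipped edges and substituted-node dissimilarity, combined by weights), but the function name, the equal default weights and the pluggable `label_sim` parameter are illustrative assumptions rather than the exact formulation of [24]. In practice `label_sim` would be a Levenshtein-based label similarity and the mapping would come from a search procedure such as a greedy matcher.

```python
from typing import Callable, Dict, Set, Tuple

Edge = Tuple[str, str]


def ged_similarity(labels1: Dict[str, str], edges1: Set[Edge],
                   labels2: Dict[str, str], edges2: Set[Edge],
                   mapping: Dict[str, str],
                   label_sim: Callable[[str, str], float],
                   w_skipn: float = 1.0, w_skipe: float = 1.0,
                   w_subn: float = 1.0) -> float:
    """Graph-edit similarity of two process graphs under a given node mapping.

    `mapping` is assumed to be partial and injective: node ids of graph 1 -> node ids of graph 2.
    """
    n_nodes = len(labels1) + len(labels2)
    n_edges = len(edges1) + len(edges2)

    # Unmapped nodes count as deleted (graph 1) or inserted (graph 2).
    skipped_nodes = n_nodes - 2 * len(mapping)

    # An edge of graph 1 survives only if both ends are mapped and its image is an edge of graph 2.
    images = {(mapping[a], mapping[b]) for a, b in edges1 if a in mapping and b in mapping}
    skipped_edges = sum(1 for a, b in edges1 if (mapping.get(a), mapping.get(b)) not in edges2)
    # Edges of graph 2 that are not the image of any edge of graph 1.
    skipped_edges += sum(1 for e in edges2 if e not in images)

    f_skipn = skipped_nodes / n_nodes if n_nodes else 0.0
    f_skipe = skipped_edges / n_edges if n_edges else 0.0
    # Substitution cost: mean label dissimilarity of the mapped node pairs.
    f_subn = (sum(1.0 - label_sim(labels1[a], labels2[b]) for a, b in mapping.items())
              / len(mapping)) if mapping else 0.0

    weighted = w_skipn * f_skipn + w_skipe * f_skipe + w_subn * f_subn
    return 1.0 - weighted / (w_skipn + w_skipe + w_subn)
```

As a usage example, take a three-activity chain "Receive order" → "Check stock" → "Ship order" and a two-activity chain "Receive order" → "Ship order", mapping the two shared activities onto each other with an exact-match `label_sim`. One node and all three edges are skipped while the substitution cost is zero, so the similarity is 1 − (0.2 + 1.0 + 0)/3 ≈ 0.6.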

E. CAUSAL DEPENDENCIES BETWEEN ACTIVITIES
These similarity measures between process models consider the dependencies that exist between activities, such as the logical order imposed by the processes. Table 1 lists the main measures proposed by several authors, whose approach departs from the main idea of the comparison mechanism proposed in this study.

F. APPROACHES BASED ON SETS OF TRACES
In these measures, the main element of study is the set of traces of a business process model, that is, the logical sequences of activities and events that form the process. Like the previous measures, this approach falls outside the scope of this research and does not form a theoretical basis for the proposal; only the main metrics found in the current literature are mentioned.

II. SIMILARITY MEASURES
Based on the bibliographic analysis of similarity measures between business process models, the proposed mechanism is aimed at comparing the activities of a business process. This investigation therefore uses measures defined on the nodes and edges of a process graph; specifically, the mechanism uses the similarity between activities as its main element, computed by comparing the labels of the activities that make up the processes [12], [13], [27].
This section explains in detail the measures and resources that form the proposed comparison mechanism, namely three measures for comparing activity names: Sim_Sin, Sim_Lin and Sim_Sem.

A. SYNTACTIC SIMILARITY MEASURE
The syntactic similarity measure is a type of similarity measure between words; the syntactic concept corresponds to the analysis of the relationships between the different symbols or signs of a language [35]. Under this definition, the measure is evaluated according to the letters that make up the words being compared. In this proposal, syntactic similarity is used to compare the words that make up the labels of the activities in a business process model. When two concepts are measured, in this case activity labels, syntactic similarity quantifies how much one string has to be edited to obtain the other, which is called the distance. One of the most common measures is the string edit distance, under which the similarity between two words or labels is determined by the number of operations (insertion, deletion or substitution of a character) needed to convert one string into the other: the more operations are needed, the less similar the strings are.
To obtain the similarity value, we relate the number of edits needed to go from one concept to the other to the length of the longer string. For this we define the function Sim_Sin, proposed by [21] (equation 1):

$$\mathrm{Sim}_{Sin}(A_1, A_2) = 1 - \frac{ed(A_1, A_2)}{\max\big(L(A_1), L(A_2)\big)} \qquad (1)$$

where A_n, with n ∈ {1, 2}, is a concept (string), L(A_n) is the length of the concept and ed(A_1, A_2) is the number of edit operations needed to transform A_1 into A_2.
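Equation 1 can be checked with a minimal Python sketch. The Levenshtein distance is computed with the standard dynamic-programming recurrence; the function names `edit_distance` and `sim_sin` are illustrative, and returning 1.0 for two empty labels is an assumption the text does not cover.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: minimum insertions, deletions or substitutions turning s into t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute (or keep) the character
        prev = curr
    return prev[-1]


def sim_sin(a1: str, a2: str) -> float:
    """Syntactic similarity, equation (1): 1 - ed(A1, A2) / max(L(A1), L(A2))."""
    longest = max(len(a1), len(a2))
    if longest == 0:
        return 1.0  # assumption: two empty labels are identical
    return 1.0 - edit_distance(a1, a2) / longest
```

For instance, "Check stock" and "Check stocks" differ by one insertion over a maximum length of 12, so Sim_Sin = 1 − 1/12 ≈ 0.92, while identical labels give exactly 1. The comparison is case-sensitive; lowercasing labels beforehand is a design choice the text does not specify.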

B. LINGUISTIC SIMILARITY MEASURE
Using only syntactic similarity to compare concepts is insufficient, since the activities of a process can be described with completely different words that denote the same procedure, owing to synonyms, antonyms and other lexical phenomena of the language. A measure is therefore needed that solves this lexical problem and that, besides evaluating the concept, compares the sense of the activity. According to [36], a measure of linguistic similarity assesses concepts according to the meaning that defines them; linguistic similarity is a type of semantic similarity, since the comparison is projected onto a semantic space, which in the linguistic case is a lexical dictionary, and a specific comparison algorithm is used. In this research, Sim_Lin (equation 2) is defined using the WordNet dictionary as its semantic base and WordNet::Similarity:

$$\mathrm{Sim}_{Lin}(A_1, A_2) = \frac{2 \cdot depth\big(lcs(A_1, A_2)\big)}{depth(A_1) + depth(A_2)} \qquad (2)$$

where A_1 and A_2 are the concepts to compare, depth is the depth of a node in the WordNet hierarchy and lcs(A_1, A_2) is the nearest common ancestor of both concepts.
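Equation 2 can be illustrated on a toy hierarchy. The sketch below implements the depth/lcs formula over a small hypothetical taxonomy (the concepts in `PARENT` are invented for illustration) instead of WordNet itself, so it runs without downloading any corpus. In practice a WordNet interface would supply the hierarchy; NLTK, for example, exposes a measure of this form as `Synset.wup_similarity`, although depth conventions differ slightly between implementations.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (child -> parent); a real system would query WordNet instead.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "act": "entity",
    "transfer": "act",
    "buy": "transfer",
    "sell": "transfer",
    "communicate": "act",
    "notify": "communicate",
}


def path_to_root(concept: str) -> List[str]:
    """The concept followed by all its ancestors up to the root."""
    path = [concept]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def depth(concept: str) -> int:
    """Depth of a concept, counting the root as depth 1."""
    return len(path_to_root(concept))


def lcs(c1: str, c2: str) -> str:
    """Nearest common ancestor: first ancestor of c1 that is also an ancestor of c2."""
    ancestors2 = set(path_to_root(c2))
    for node in path_to_root(c1):
        if node in ancestors2:
            return node
    raise ValueError("concepts share no common ancestor")


def sim_lin(c1: str, c2: str) -> float:
    """Equation (2): 2 * depth(lcs) / (depth(c1) + depth(c2))."""
    return 2 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))
```

Here "buy" and "sell" share the parent "transfer" (depth 3), so Sim_Lin = 2·3/(4+4) = 0.75, while "buy" and "notify" only meet at "act" (depth 2) and score 0.5, even though neither pair shares many characters. Multi-word activity labels would additionally need a rule for combining word-level scores, which this sketch leaves out.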

C. MEASUREMENT OF SEMANTIC SIMILARITY
Measures of semantic similarity refer to the proximity between objects or concepts of the same ontology; as explained above, the more similar the objects are, the smaller the distance between them. A measure of semantic similarity takes two concepts as input, in this case activities of a business model, and returns a numerical value that quantifies how similar they are. This measure builds on the proposal of [27], used to measure similarity between tasks in business processes, and can be applied to any domain ontology. Based on the semantic similarity measure proposed in [37], the authors of [27] use a hierarchical ontology of business processes to find similarities between them, dividing the procedure into two phases: classification and similarity.

1) PROPOSED MEASURE OF SIMILARITY BETWEEN THE ACTIVITIES OF THE BUSINESS PROCESS
The proposal is built on the algorithms explained above, Sim_Sin, Sim_Lin and Sim_Sem, and is divided into four main phases, described below.

D. RECOVERING THE ACTIVITIES OF A BUSINESS PROCESS
Regardless of the notation used, every business process is made up of different elements, of which activities are the most important. In the first phase of this proposal, all the labels (names) of the activities of the compared models are extracted and then used in the comparison measures. To implement the proposal, it is useful to convert each model into a business process graph and extract the values of its labeling function λ.
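The first phase can be sketched with a minimal business process graph in which the labeling function λ is a dictionary from node identifiers to labels. The class name `ProcessGraph` and the node types "activity", "gateway" and "event" are assumptions chosen for illustration; only activity nodes are passed on to the comparison phases.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ProcessGraph:
    """Minimal business process graph: typed nodes, directed edges and a labeling function."""
    node_type: Dict[str, str]                                # node id -> "activity", "gateway" or "event"
    edges: Set[Tuple[str, str]] = field(default_factory=set)
    labels: Dict[str, str] = field(default_factory=dict)    # the labeling function lambda

    def activity_labels(self) -> List[str]:
        """Apply lambda to every activity node, ignoring gateways and events."""
        return [self.labels[n] for n, kind in self.node_type.items() if kind == "activity"]
```

For a model with a start event, the activities "Receive request" and "Approve request" and an XOR gateway between them, `activity_labels()` returns only the two activity labels, in the order the nodes were declared.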

E. EVALUATION OF THE PROCESSES ACCORDING TO WORDNET ACTIVITIES
Once all the activities of the compared business process models have been obtained, pairs are formed between the activities of the reference model and those of the queried model. Each pair is first evaluated with Sim_Sin, and of these values only those equal to 1 (identical labels) are kept. Pairs of activities with 0 < Sim_Sin < 1 are then evaluated with Sim_Lin. When the proposal is implemented in a programming language, the syntactic similarity measure should be evaluated first, which saves execution time and computational resources.
The similarity results between the pairs are arranged in a similarity matrix, in which the activities of the reference model are the rows and the activities of the compared model are the columns. Analyzing this matrix makes it possible to determine the maximum similarities between pairs. A matrix S of dimension m × n is thus defined, where m is the number of activities of the reference model and n the number of activities of the compared model, containing for each pair of activities the measure obtained with Sim_Sin or Sim_Lin.
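The cascade of this phase and the construction of the matrix S can be sketched as follows. The similarity functions are passed in as parameters, so any implementation of Sim_Sin and Sim_Lin can be plugged in; the helper names are hypothetical, and sending pairs with Sim_Sin = 0 to Sim_Lin as well is an assumption, since the text only mentions 0 < Sim_Sin < 1.

```python
from typing import Callable, List

SimFn = Callable[[str, str], float]


def pair_similarity(a: str, b: str, sim_sin: SimFn, sim_lin: SimFn) -> float:
    """Keep Sim_Sin only when it reports identical labels; otherwise fall back to Sim_Lin."""
    if sim_sin(a, b) == 1.0:
        return 1.0
    # Assumption: pairs with Sim_Sin == 0 are also evaluated with Sim_Lin.
    return sim_lin(a, b)


def similarity_matrix(reference: List[str], compared: List[str],
                      sim_sin: SimFn, sim_lin: SimFn) -> List[List[float]]:
    """Matrix S (m x n): rows are reference activities, columns are compared activities."""
    return [[pair_similarity(r, c, sim_sin, sim_lin) for c in compared] for r in reference]
```

Because the cheap syntactic check runs first, the dictionary lookup behind Sim_Lin is only performed for pairs whose labels are not identical, which is the time saving the text refers to.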
After obtaining the similarity matrix, a matching algorithm suited to this type of matrix is chosen. The studies carried out by [24] and [27] use a heuristic greedy algorithm, which selects the locally optimal option in each comparison. Equation 3 is used to determine the similarity between the models:

$$\mathrm{Sim}_{Fase2} = \frac{1}{m} \sum_{i=1}^{m} \max_{1 \le j \le n} S_{i,j} \qquad (3)$$

where max_j S_{i,j} is the maximum value in row i of the similarity matrix, that is, the optimal similarity found for activity i of the reference model, and m is the number of optimal pairs, one per activity of the reference model, given the dimensions of matrix S.
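Equation 3 reduces to averaging the best value in each row of S. The sketch below, with the hypothetical name `model_similarity`, takes the row maximum for each reference activity, following the row-maximum reading of equation 3. This lets one compared activity be the best match for several reference activities; a strict one-to-one greedy matcher would instead remove a column once it has been used.

```python
from typing import List


def model_similarity(S: List[List[float]]) -> float:
    """Equation (3): mean over reference activities of their best match in S."""
    if not S or not S[0]:
        return 0.0  # no activities to compare
    best_per_row = [max(row) for row in S]  # locally optimal choice for each reference activity
    return sum(best_per_row) / len(best_per_row)
```

For S = [[1.0, 0.75, 0.5], [0.5, 0.25, 0.5]], the row maxima are 1.0 and 0.5, so the model similarity is (1.0 + 0.5)/2 = 0.75.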

F. EVALUATION OF THE PROCESSES ACCORDING TO ACTIVITIES WITH ONTOLOGY
As in the previous phase, the activities present in the models are used to perform the comparison: pairs are formed between the activities of the reference model and those of the queried model, and Sim_Sem is applied to each pair of concepts. A matrix identical in form to that of the previous phase is built, but with similarities computed according to the designed ontology. The greedy algorithm is again used to determine the optimal pairs, and the same equation yields the similarity between the activities of the models according to the experimental ontology, Sim_Fase3.

G. SIMILARITIES BETWEEN BUSINESS PROCESS MODELS
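Once the similarity measures of the previous phases are complete, they must be combined. To this end, both measures are combined in a weighted average, where the weights ω_1 and ω_2 can be set by the user of the similarity measure. Equation 4 represents the global similarity:

$$\mathrm{Sim}_{Global} = \frac{\omega_1 \, \mathrm{Sim}_{Fase2} + \omega_2 \, \mathrm{Sim}_{Fase3}}{\omega_1 + \omega_2} \qquad (4)$$

where Sim_Fase2 and Sim_Fase3 are the model similarities obtained in the WordNet phase and in the ontology phase, respectively.
For experimental purposes, this paper computes the similarity between the activities of business process models as a simple average, that is, each weight is equal to 0.5.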
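Equation 4 with user-defined weights can be sketched in a few lines. Normalizing by ω_1 + ω_2 is an assumption that keeps the result in [0, 1] even when the weights do not sum to 1; with the weights of 0.5 used in the experiments it reduces to the simple average. The function name and the input values in the usage example are hypothetical.

```python
def global_similarity(sim_phase2: float, sim_phase3: float,
                      w1: float = 0.5, w2: float = 0.5) -> float:
    """Equation (4): weighted average of the WordNet-phase and ontology-phase similarities."""
    if w1 < 0 or w2 < 0 or w1 + w2 == 0:
        raise ValueError("weights must be non-negative and not both zero")
    return (w1 * sim_phase2 + w2 * sim_phase3) / (w1 + w2)
```

With hypothetical phase results of 0.75 (WordNet) and 0.5 (ontology), the default weights give 0.625; a user who trusts the ontology phase less could pass w1=3, w2=1 and obtain 0.6875.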

III. EXPERIMENTATION AND RESULTS
The business models provided by the telecommunications company are used to compare activities. In total, ten models were used (Figures 6 to 15 in the appendix), three of which are reference models (1, 4 and 7); each reference model was evaluated against the whole set of models, obtaining a similarity ranking for each evaluated process. The repository models are: model 1, recruitment of operational staff; model 2, recruitment of administrative staff; model 3, recruitment process; model 4, resource request process; model 5, resource entry process; model 6, customer invoicing process; model 7, tender process; model 8, work strategy process; model 9, initial production process; and model 10, production of an electrical cabinet.
The results of the experiment with model 1 and its comparison chart are shown in Table 2 and Fig. 1. The three evaluated models (1, 4 and 7) showed semantic similarities of 100% with respect to the other models, which indicates that the proposed measure is suitable for reuse purposes in manufacturing processes.
The results of the experiment with model 4 and its comparison chart are shown in Table 3 and Fig. 2. The results of the experiment with model 7 and its comparison chart are shown in Table 4.

IV. VALIDATION OF THE MODEL
To validate the proposal, the results obtained were compared with those of published research [12], [13]. Two examples of comparison between business process models are used: (1) the first comes from the study of [12], in which three similarity values from different measures are obtained, and (2) the second comes from the investigation of [13], with a single similarity value. Figures 4 and 5 show the two examples, respectively. The results obtained in each comparison and the error range are presented in Table 5. The similarity measure of Ehrig et al. [13] considers the attributes and the structure of the process, while Dijkman et al. [12] use a similarity measure based on graph edit distance. These authors focus mainly on the structure of the process, whereas our research focuses on the activities of the models. The results show a difference of 50% in all cases, since the results obtained by these authors consider only the structure of the graph.

V. CONCLUSION
The measure of semantic similarity between business process activities presented in this work adequately captures the similarity that may exist between two business models with respect to their reusability in companies. In fact, the results of the experiments on real models of a telecommunications company make it possible to obtain implicit knowledge arising from the company's experience, or explicit knowledge that can easily be used for process improvement or reengineering.
The results show that syntactic similarity measures do not guarantee the correct discovery of similarity, since they omit components as important as the definition of the activity. However, using this measure reduces the execution time of the mechanism; it is therefore used only when its value indicates identical labels, a case in which it has no margin of error.
The use of a linguistic similarity measure based on a lexical dictionary, in this case WordNet, corrects the inaccuracy of the syntactic similarity measure when labels are not identical. The linguistic measure obtains values from the definitions of the activity labels, thereby taking into account the similarity between synonyms or homonyms present in the tasks of the business models. The limitation of using this measure alone appears when activities are very specific to the domain: for example, when technical terms or concepts absent from the thesaurus taxonomy are used, the dictionary cannot deliver accurate values.
The hybrid model proposes a metric that quantifies the similarity with a reference model. The metric focuses on the activities of a business process and is built from three measures: syntactic, linguistic and semantic similarity. The technique was validated by comparing its performance with previous research, which supports its application to business processes. A direction for future work is to design indexing structures for new business processes in order to scale the proposed similarity metrics to larger datasets.
Finally, the effectiveness of the proposed integrated measure lies in its relative values, which make it possible to rank reuse priorities, with a higher reuse rate than previous models.

CONFLICTS OF INTEREST
The authors declare that there is no conflict of interests regarding the publication of this paper.