Knowledge-Oriented Models Based on Developer-Artifact and Developer-Developer Interactions

INTRODUCTION: Software development is organized around developers working collaboratively promoting two types of interactions for knowledge sharing. Developer-Artifact interactions indicate developers define or access pieces of information within artifacts. Developer-Developer interactions indicate the exchange of information among developers using a collaboration platform to clarify an issue, promote an idea, or expose any thoughtful comment. PROBLEM: The number of such interactions grows over time and makes it difficult to capture and assess the evolution of the developers’ knowledge about specific software project artifacts and tasks. Further, this knowledge decreases over time due to the natural limitations of human cognition that restrict our capabilities to cope with information overload. Besides, who has more knowledge about specific project elements are important to promote collaboration. AIMS: The <inline-formula> <tex-math notation="LaTeX">$K_{a}$ </tex-math></inline-formula>, <inline-formula> <tex-math notation="LaTeX">$K_{s}$ </tex-math></inline-formula>, <inline-formula> <tex-math notation="LaTeX">$K_{c}$ </tex-math></inline-formula>, and <inline-formula> <tex-math notation="LaTeX">$K_{p}$ </tex-math></inline-formula> models capture the evolution of the developers’ knowledge about software project elements such as artifacts, tasks, similar tasks, and the whole software project. These models represent not only the knowledge developers have about these elements but also capture how this knowledge decreases over time based on forgetting and relearning functions. EVALUATION: An experimental study analyzed some developers’ interactions on artifacts for the purpose of predicting the evolution of developers’ knowledge in six software projects. The results show that the developers’ rankings by performed tasks and by our models have 72% or more of similarity. CONCLUSION: Our models can capture and assess the evolution of the developers’ knowledge and help to identify which developers have more knowledge about specific elements of software projects.


I. INTRODUCTION
The modern software development workflow is peoplecentered and most of the time conducted collaboratively [1]. Typically, this workflow suggests developers follow daily routines involving executing a set of tasks, interacting with other developers and manipulating artifacts such as source code, documentation, and configuration files. As a result, The associate editor coordinating the review of this manuscript and approving it for publication was Porfirio Tramontana . developers continuously and routinely interact with each other (Developer-Developer interactions) and with artifacts (Developers-Artifact interactions). Developer-Developer interactions occur face-to-face or through communication tools, when developers need to clarify some questions about performing a task. Developer-Artifact interactions happen when a developer reads or writes an artifact. Reading is used to gain knowledge about the artifact, typically to identify the parts of the artifact that may be changed to complete a task's goal. Writing occurs by adding or removing information from artifacts that need to evolve, sometimes to accommodate a new feature, fix a bug or improve code.
The development workflow is typically orchestrated by a project plan (or sprint plan) describing the tasks developers agree on executing [2]. When a developer is assigned to a task, he or she usually tries to leverage on his/her past experience to remember similar performed tasks and manipulated artifacts. He/she may also try to figure out who is the most knowledgeable developer in the task's subject for a future call for help [3]. Remembering being in a similar development context, with similar developers and artifacts, is useful as it may help developers to avoid recurring problems and adopt best practices. In order to leverage on experience, developers intuitively browse their memory and the project's history in a cycle of discovery and remembrance of a task context. When a project history has a small number of tasks and artifacts, remembering task contexts is feasible. However, a typical software projects may last several months or years, accumulating a vast amount of tasks, artifacts and developers. As a result, intuitively browsing such vast amount of tasks, artifacts, developers and their relationships might be difficult, if not impossible. Moreover, the developers' knowledge about the history of tasks is limited as they tend to forget the past because of the natural limitations of human cognition that restricts our capabilities to cope with information overload [4], [5]. In this scenario, a model and the associated tool support, to measure the developers' knowledge of a given project element (task or artifact) can help other developers when dealing with a similar context along the project.
Our literature review has showed that Fritz et al. [6] proposed a model to infer the degree of developers' knowledge about software artifacts. The proposed measure assigns a degree of knowledge to a developer about an artifact over time, a combination of interest and disinterest of the developer over the artifact considering the authoring information. Rigby et al. [7] defined ''knowledge loss of a project'' as the number of files that are abandoned in turnover at quarterly intervals. Line-of-Code ownership means that the last developer changing a line of code has the ownership of a file and a file is abandoned when 90% or more of the lines are abandoned, possibly by a developer who has left the project. Other proposals use different words related to Knowledge such as Expertise, Expert, Affinity, Experience, Compatibility, and Skill to measure the relationship between developers and artifacts [8]- [13]. The result of this review shows that no work has so far been done concerning the fact that humans tend to forget bits of information [14]- [19], although there is evidence that developers tend to forget information about artifacts they have not manipulated for a while [20]- [22].
In this article, we present four knowledge-oriented models to represent the developer's knowledge about the elements of a software project. The K a , K s , K c , and K p models capture the evolution of the developers' knowledge about artifacts, tasks, similar tasks, and the whole software project, respectively. The proposed models combine information from Developer-Developer and Developer-Artifact interactions with a state-of-the-art way to capture the developers' forgetting and relearning. Given that the term measurement is defined as a quantitatively expressed reduction of uncertainty based on one or more observations [23], we claim that the proposed models measure developer's knowledge based on Developer-Artifact and Developer-Developer interactions taking into account knowledge depreciation over time, i.e., interactions from a distant past contribute less to the knowledge measure than the most recent interactions. According to our literature review, there is no research addressing explicitly the human limitations related to forgetting and relearning, and inferring knowledge about artifacts, tasks, similar tasks, and the whole software project.
An evaluation of the proposed models was performed using data from six open-source projects. The official ranking of the project based on the number of performed tasks was compared with the ranking produced with our K p model for measuring developers' knowledge about the whole project. The results presented an average correlation greater than or equal to 72% between the two rankings. The model for the whole project depends on the models of similar tasks, tasks, and artifacts. There is also evidence that these models can capture developers' knowledge about artifacts, tasks, and similar tasks and use this knowledge to identify experts, i.e., the persons who have the most knowledge about specific project elements. These models, which measure knowledge about similar tasks and the whole software project, use a new metaheuristic for clustering based on software modules [24].
The article is organized as follows. Section II presents an overview of our approach. Section III describes the knowledge-oriented models. Section IV presents an evaluation of the proposed model. A discussion about the results of the evaluation is presented in Section V and Section VI describes related work. Finally, Section VII wraps up this article with our conclusion and future work.

II. APPROACH OVERVIEW
An essential procedure in software development is the manipulation of software artifacts, such as models, source code and configuration files, to implement new software functionalities or amend existing ones [25]. Such manipulation is typically supported by an Integrated Development Environment (IDE) such as Eclipse or VSCode, through which developers can easily browse or create project-related artifacts. As illustrated in Figure 1, a developer d1 can use the IDE to read, write or update artifacts, establishing a series of Developer-Artifact interactions that can be used as a proxy to the developers' knowledge on a single artifact, i.e., if the developer d1 manipulates the artifact a1 several times during the artifact lifetime, d1 must be knowledgeable of a1.
Software development is also collaborative in the sense that developers exchange information regarding topics of interest, sometimes to clarify a reasoning path or to simply ask for a specific information. Note that in Figure 1 the developer d1 interacts with the developer d2 (and with other developers) establishing a series of Developer-Developer interactions. These interactions can also be used as a proxy to the developers' knowledge about a specific topic, if developer-developer communications are properly harnessed. Developer-Artifact and Developer-Developer interactions typically occur in the context of a task, where a task describes the piece of work to be performed in the context of a software project. Examples of a task are ''fixing an issue'' or ''adding a new feature''. When a task is handled by a developer, he or she often manipulates artifacts and communicates with other developers.
Over the lifetime of a software project, the combination of Developer-Artifact interactions, Developer-Developer interactions and Developer-Task executions, enables creating a project history that can be analyzed to help discovering the most knowledgeable developers, similar tasks and specific task contexts. This article introduces Developer-Artifact (I da ) and Developer-Developer (I dd ) interaction degrees to define knowledge-oriented models that can be used to measure the amount of knowledge developers have about artifacts and tasks. We claim that using our models to find the most knowledgeable developers for a given task or artifact may help developers when new similar tasks need to be performed. The rationale supporting our claim is that when a new task s1 is under responsibility of developer d1, he or she may need to define s1 s context including: a) the tasks that are similar to s1 that contain information he or she may leverage on; b) the developers that may help him on clarifying issues regarding s1, i.e. developers that are knowledgeable on tasks similar to s1; c) the artifacts that were manipulated by developers when executing tasks similar to s1, i.e. developers that are knowledgeable on artifacts that might be required to handle s1. Building this context for a new task is infeasible for realworld projects as d1 would need to skim through very large artifacts and developers. In addition, developers tend to forget previous work and their knowledge about specific artifacts fade over time. Figure 2 illustrates our approach to discovering knowledgeable developers. First we need to record the Developer-Artifact and Developer-Developer interactions that happen when developers are handling their tasks into a Project Interaction History (Step 1 in Figure 2). After recording these interactions, we apply our K a and K s models to measure the developer's knowledge on artifacts and tasks (Step 2 in Figure  2), and then apply a clustering technique to gather tasks that are similar based on the artifacts such tasks interact with (Step 3 in Figure 2). Finally, we measure the developer's knowledge about similar tasks and the whole project history using our K c and K p models (Step 4 in Figure 2). More specifically, the K a model measures a developer's knowledge about an artifact over time based on all edit interactions that the developer had with an artifact (see Section III-A). The K s model measures the developer's knowledge about a task over time considering the result of the developer's knowledge about every artifact associated with a task, i.e. K a applied to every artifact associated with the task based on edit interactions (see Section III-B). The K c model measures the developer's knowledge about a cluster of similar tasks considering the developer's knowledge about every task belonging to a cluster (see Section III-C). Finally, the K p model measures the developer's knowledge about the entire history of project tasks based on the developer's knowledge about similar task clusters (see Section III-D).

III. KNOWLEDGE-ORIENTED MODELS TO MEASURE DEVELOPER's KNOWLEDGE IN SOFTWARE PROJECTS
Our models use degrees of interaction between developers and artifacts (I da ), and among developers (I dd ) to measure developers' knowledge about a software development project. Figure 3 shows the representation of I da and I dd interactions based on the W3C provenance model [26]. We define Artifact, Task, and Developer types. Artifact is an Entity specialization for representing digital files such as source code and documentation. The Developer type represents an Agent type, a person who has knowledge and authorization to perform specific tasks. The Task is an Activity specialization and may represent an error or the addition of new functionality to the software. Depending on the tool used, the task may be called by different names. In Bugzilla, a task is called Bug and in GitHub it is named an Issue.
The I da interaction is depicted by the WasChangedBy relationship between Artifact and Task in Figure 3. This relationship ensures that an artifact can be changed by one or more tasks. For each change in an artifact, the degree and final time of the change are recorded. The degree in this case is a measure of the size of a change (e.g., in WasChangedBy). This information depends on the specific development environment. For example, the degree may be expressed by the number of rows changed, the number of characters changed or the result of an interaction model (e.g., the one supported by Mylyn [27]). Figure 4 shows an example of I da interaction.
The record of this interaction shows that developer Maria changed the main.java artifact while performing the task with id equal to 345. The change took place on 10/10/2017 at 1:31:10 pm with degree 5.
The I dd interaction represents developers' conversations about performing each task. The class model in Figure 3 expresses that a developer can talk to one or more developers. A conversation is composed of communications between the participants, and includes a record of the degree, the time and the reply attribute. The degree in this case is a measure of how much information was exchanged in an interaction (e.g., in TalksWith). This information depends on the development environment in which the project is developed. In GitHub, for example, conversations are recorded in a text format based on message exchanges where the degree can be the number of words or the number of lines of each message. Figure 5 shows an example of I dd interaction, the record of the conversation between Mary and John about performing task 345. Mary sends a textual message to John on 10/10/2017 at 1:33:17 PM and John answers her on 10/11/2017 at 10:13:10 AM.
The class model in Figure 3 also asserts that one task may have similarities to other tasks. In this article, project history tasks are organized into similar task groups based on the I da interactions. Figure 6 illustrates the similarity between two tasks considering these interactions. In this figure a task is depicted by a triangle and an artifact by a square. An I da interaction is depicted by an arrow from the task to the  artifact. Thus, tasks S239875 and S245189 are defined to be similar with a weight equal to 3 because the developers performed I da interactions on three artifacts (A419, A425, and A1243) while working on these tasks.
In what follows we will present our models in detail and describe how they are used to measure developers' knowledge about software project elements such as artifacts, tasks, similar tasks, and the whole software project.

A. DEVELOPER-ARTIFACT KNOWLEDGE MEASURE
The K a model measures the developer's knowledge about an artifact using I da interactions. We define I da (d, a, f endtime ) as the degree of interaction between a developer (d) and an  artifact (a) during a specific time interval, recorded at the interaction end time (f endtime ). Because each artifact has a degree of production and maintenance difficulty, each interaction needs to be normalized by the current maximum interaction recorded for the artifact.
Equation (1) defines II da as the normalized interaction of an I da interaction. Let D be the set of all developers who have interacted by I da with artifact (a) until time t. So II da (d, a, t) is the result of dividing I da (d, a, t) by the maximum I da interaction among all developers for the artifact (a) until t.  how intense the developer's interaction is with respect to the other interactions with the same artifact.
The K a model also considers the developers' forgetting and relearning. These aspects are modeled using a hyperbolic function. This function was chosen because it is a simple functional form that can model forgetting consistently [28], [29]. Equation (2) represents how much developers remember their interactions considering relearning, R d . The forgetting constant b was set to 0.025 to calibrate knowledge depreciation based on an existing study [30]. This study indicates that developers, on average, forget about 50% of a file in 40 days if they don't revisit the file within that period. In addition, we used the information that developers tend to forget artifacts that were not manipulated by them for a while [20]- [22]. Let t be the number of days that have passed since the recording of an I da interaction.
The function that models a developer's interaction with an artifact and its neighbors is defined as G d (n) = 7 e n +6 · b, where n is the number of days. We have chosen to set n from 0 to 7 because previous studies (e.g., experiments by Fritz et al. [20], [21]) have shown that seven working days have produced the highest correlation between developers' knowledge and their interactions. In these days, the developer d produces I da interaction: 1-with the artifact a; or 2-with artifacts related to tasks that are related through the I da interaction to the artifact a (neighbors). The smaller G d , the lower the depreciation of the developer's knowledge about artifact (a), and we consider that there was a relearning of the artifactrelated tasks in that period. Equation (3) defines G d and Fig. 8 shows its graph.  G d (n) = 7 e n + 6 · b n is an integer such that 0 ≤ n ≤ 7 (3) Figure 9 shows the values of R d with a highlight for the 10th, 40th, 65th, and 365th days. Relearning has the same behavior for artifacts that have had early and late interactions over time. For n equal to zero, there was no relearning, and the curve has the smaller values for R d . On the other hand, when the developer produces, in all the last seven days, I da interaction with an artifact a or with task-related artifacts through I da with artifact a, for n equal to 7, the values of the curve R d will have larger values. The graph in Figure 9 also shows that the higher the value of n, the higher the values of R d . Table 1 shows the influences of the values of n and t on R d . For n equal to zero, without relearning in the last 7 days, the developer remembers 97.6% of the I da interaction performed in the previous day ( t = 1). On the fortieth day ( t = 40), the developer remembers 50% of the interaction and only 10% after one year. For n equal to 7, with relearning, the developer fully remembers the I da interactions performed in the last 3 days. On the fortieth day, developer remembers 99.4% of the interaction and in one year he or she remembers 94.5%. Table 1 presents the R d values for all values of n and some values of t ( t = 0, 3, 5, 7, 39, 40, 41, 63, 65, 67, and 365).
After briefly analyzing R d , we define K a (d, a, t) to represent the knowledge measure of the developer d about artifact a at date-time t. The K a model is composed of two factors. The first one calculates the normalized interaction (II da ) for each I da interaction produced up to t. The second factor represents the forgetting of the interactions registered in the first factor, R d . In this way, the developer's knowledge about an artifact is distributed over time, so that newer interactions contribute more than older ones. This is reasonable because developers tend to forget artifacts that have not been manipulated by them for a while [20]- [22].
Equation (4) formalizes K a (d, a, t) as the sum of the product of II da by R d for all I da interactions produced by a developer d with the artifact a up to date-time t. The t i calculates the number of days that have passed from the time the I da interaction took place and the time t, where t is the date-time on which the developer's knowledge need to be estimated. According to K a , the more I da interactions a developer has with an artifact, the more knowledge he or she can have about it. This is true because II da is always a positive number.
During a software development project, developers create and change artifacts through I da interactions. Besides, developers interact with other developers to perform tasks, and this is captured by I dd interactions. The K s model measures the developer's knowledge of a task by making use of these interactions. Figure 10 illustrates how developers interact with artifacts through I da interactions to perform a task. The developer d5 interacted with artifact a1188 twice: the first interaction occurred at time 45 with degree 15, and the second interaction was in time 54 with degree 11. The developer d5 also interacted with artifact a1475 at times 60 and 80 with degrees equal to 26 and 18, respectively. According to data from the Bugzilla attachment field of the Mylyn project, more than one developer has contributed with I da interactions to perform tasks, such as in the case of tasks IDs 166406, 210686, 234065 and 248490. Therefore, there is evidence that the knowledge measure about tasks should consider I da interactions. Figure 11 illustrates I dd interactions among developers. The task was assigned to d14 and d10 has written about how the task needed to be done before the task was assigned to d14. This case shows that developer d10 has knowledge about this task, but he has not produced an interaction about artifacts related to this task through I da interactions. This is an evidence that the knowledge measure about tasks should also consider I dd interactions.   In this scenario, the measure of the developer's knowledge about a task s is defined based on the measure of his knowledge of the artifacts related to s through I da interactions combined with his contribution through I dd interactions produced while performing s. Because each task has a degree of production and maintenance difficulty, each II dd interaction needs to be normalized by the current maximum interaction recorded while developers perform the task. The maximum value of I dd interaction is defined as the maximum degree value among all I dd interactions concerning task s until time t. Equation (5) defines II dd as the normalized interaction of an I dd interaction. Let D be the set of all developers who have interacted through I dd while performing task (s) until time t. Additionally, II dd (d, s, t) is the result of dividing I dd (d, s, t) by the maximum value of I dd interaction. Equation (6) formalizes K s (d, s, t) as the sum of the developer's knowledge d about all task-related artifacts ( a∈s K a ) and all normalized interaction (II dd ) for each I dd interaction produced by the developer d in the execution of the task s up to the time t. The developer's knowledge about a task also undergoes depreciation over time -the first depreciation is inherited from (4), and the second one is caused by the fact that the second added of K s uses the forgetting function R d . In R d , n is the number of days in the last seven days in which the developer d expressed knowledge about the task s through I dd interactions. An example of this type of interactions is sending a message about the task s to every developer involved in performing it.
The K c (d, c, t) model is defined as the measure of the developer's knowledge about a similar-task group (c) at date-time t. K c has two factors: the first is h m , where h is the number of tasks related to the group that the developer belongs to, and m is the number of tasks belonging to the group. The h m factor quantifies the developer's involvement in group tasks (i.e., this factor penalizes the developer who only has knowledge about a few tasks). In the best case, when h equals to m, the developer has knowledge about all tasks in the group, h m = 1. The second factor is the sum of the K s (d, s, t), the developer knowledge measure about each group task. Equation (7) shows K c (d, c, t) and Table 2 shows the values of K c for groups that have up to 10 tasks, with K s set to 1 for all tasks. Table 3 shows an example of how the h m adjustment factor can influence the K c measure. For example, in a group with five similar tasks, the developer d1 has knowledge of two tasks with K s = {80, 35} and developer d2 has knowledge about all five tasks with K s = {7, 9, 6, 13, 8}. According to K c , considering the adjustment factor, developers have knowledge with close values: K c (d1) = 46 and K c (d2) = 43. However, when the h m factor is not considered, the developer d1 has a K c value that is much higher than d2, even though he or she has no knowledge about 3 tasks in a group of 5 tasks.

D. DEVELOPER-PROJECT KNOWLEDGE MEASURE
The contribution of developers in software projects can be measured in different ways: by producing lines of code, by counting function points [31], and for open source project, by using measures based on the performed tasks and the number of commits. 1 Our K p model measures the developers' knowledge about a software project considering the measure of developer's knowledge about similar-task groups, i.e., clusters of similar tasks (K c ). More specifically, K p (d, p, t) represents the developer's knowledge measure of a developer d about a software project p at a date-time t as shown in equation (8).
The first part of equation (8), the r q adjustment factor, determines the developer's participation in similar-task groups, where r is the number of groups that the developer knows about the software project and q is the total number of groups into which the software project was organized. If the developer participates in all groups, the factor is 1. This adjustment factor penalizes the developers who only know about a few similar-task groups. The influence of h m factor on the K c measure, which is analyzed in Tables 2 and 3, is similar to the influence of r q factor on the K p measure. The difference is that in K c measure, the factor acts on K s , whereas in the K p , the factor acts on K c . The second factor in equation (8) represents the sum of the developer's knowledge about similar-task groups (K c ).

IV. EVALUATION
The proposed evaluation method follows the guidelines of GQM (Goals Questions Metrics) approach [32]: Analyze some developers' edit interactions on software artifacts for the purpose of predicting developer's knowledge with respect to software projects from point of view of the project manager.
The general question: Q1 (Who knows about the software project?) can be answered using four measures. The first measure -the number of performed tasks -is a wellaccepted measure to infer the production or participation of the developers in software projects, and is widely used in open source projects such as those that use Eclipse as an administration tool. 2 The second measure takes into account the number of edit interactions performed by developers to perform tasks. We note that in the study with Mylyn Docs project, both edit and commit interactions are considered and commit is assumed to be an inference of edit because of the lack of edit data. The third measure is the number of groups of similar tasks in which the developer has knowledge based on K c model, i.e., the number of clusters that the developer has knowledge considering the whole software project. For example, a project with 100 tasks can be clustered in 5 groups of similar tasks. In this scenario, a developer that has done 30 tasks belonging to 3 similar groups can have knowledge about these 3 groups. The fourth measure, which is based on K p model and illustrated in Section III-D, infers the developer's knowledge about software projects considering similar tasks captured by I da and the developer's interaction degree with artifacts that takes into account forgetting. Analysis of these four measures led us to define three research questions to compare the similarities between the rankings, which are ordered lists from highest to lowest values, produced by these four measures: RQ1: Is the ranking of developers by performed tasks (R pt ) similar to the ranking by their interactions with artifacts?
RQ2: Is the ranking R pt similar to the ranking based on the K c model?
RQ3: Is the ranking R pt similar to the ranking based on the K p model?
Our expectation is that these research questions show that the K p model presents more similarity than the K c model when compared to the R pt . On the other hand, the K p model also presents a result close to the comparison of the number of interactions with the R pt . Thus, the K p model can rank the developers of a software project in the same direction as by the number of interactions or the number of performed tasks.
In addition to these research questions, a sensitivity analysis of K p is also performed to assess the influence of the parameters R d , G d , h m and r q . The following subsections are structured as follows. Next, Subsection IV-A describes how the data was collected. Subsection IV-B describes the steps for carrying out this evaluation study. Subsection IV-C presents the results that support the answers to questions RQ1, RQ2, RQ3 and describe the influence of parameters on K p using data from the Mylyn Docs project. Finally, in Subsection IV-D, threats to the validity of this study are reported.

A. DATA COLLECTION
Lethbridge et al. [33] presented a taxonomy for the data collection techniques based on the degree of human contact. In Lethbridge and colleagues' taxonomy, our study is classified in the third degree because it requires access only to work artifacts using analysis techniques based on tool logs. The data was collected from the Mylyn Docs project , hosted at https://www.eclipse.org/. Initially, randomly, eight projects were selected from GitHub (https://github.com). The Homebrew/homebrewformula-analytics project was discarded for having only six developers, and the brewsci/homebrew-science and kerl/kerl projects were discarded because the measures did not vary between trials. The selected projects from GitHub were: apache/commons-lang, brewsci/homebrewbio, google/EarlGrey, Homebrew/homebrew-cask-drivers and iodide-project/iodide. The Mylyn Docs project is a subproject of the Mylyn project involving an Eclipse IDE plugin that reduces the number of available artifacts to facilitate browsing artifacts relevant to a given task. The collected data was generated by the Git, Bugzilla and Mylyn tools organized in three obser-vation perspectives: the first collects data from the Mylyn Docs project to generate the baseline of this evaluation study; the second collects the data recorded by the Mylyn plugin of the developer's interactions with artifacts; and the third perspective was produced by the Git tool that records the artifact submissions in the project repository. Commit actions were collected to infer edit actions given that, usually, a commit action submits edited files to a code repository. In addition, we have identified some performed tasks in the Mylyn Docs project that do not have Mylyn logging. Next, we will detail the data collection performed according to each perspective.
The first perspective, Performed Tasks, was generated from querying the Mylyn Docs data recorded by Eclipse Bugzilla. 3 The parameters for the query were: Classification = Mylyn and Product = ''Mylyn Docs'' and Component = EPUB or Framework or HtmlText or Wikitext and Status = RESOLVED and Resolution = FIXED. This query ensures that tasks marked as solved and tested by developers in the EPUB, Framework, HtmlText, and Wikitext components of the Mylyn Docs project that belong to Mylyn are returned after execution. The query was run on September 18, 2017, and generated an XML file from which the bug_id, assigned_to, and endtask information of 609 performed tasks (i.e., bug fixes and new features) were extracted.
The second observation perspective, Edit Data, was produced from the log generated by the tool Mylyn based on the execution of the tasks available in the attached files in Eclipse Bugzilla. The parameters for the query were the ones described in the query from the first perspective and additionally: and (Match ALL of the following separately→Attachment Description→contains the string→mylyn/context/zip). This search criterion returns only the tasks that have the interaction log registered by the Mylyn plugin. The query was run on April 05, 2017, and returned 345 tasks. After the query execution, all edition interactions (Kind=''edit'') were extracted from the attached files (mylyn/context/zip) in the tasks. The returned data are I da interactions that use Mylyn's DOI as the degree of I da . The DOI is defined as the degree of interest of a developer about an artifact in a specific time range. We use DOI inspired by the work of Fritz et al. that proposed the concept of a degree of knowledge [21]. For example, an event registered by Mylyn for bug 219939 shows that the developer associated with this bug performed an edit interaction that started on ''2008-07- 16  The third and last observation perspective, Commit Data, collected the data recorded by Git in the Mylyn Docs project code repository. The Git commands were used to extract the commits until April 05, 2017. First, the git clone command downloaded the data of the Mylyn Docs project. After the download, the git log command was used to extract name, email, commit_date, commit_hash, and commit_msg. Commits that identified the task at the beginning of commit_msg were selected to record the commit interaction. The git difftree command was used to extract the artifacts sent out by the commit. The degree of interest of a commit interaction was set to 1, and the start and end dates of the interaction were recorded with the commit date. The names of the developers registered in Git, in some cases, were different from those registered by Bugzilla, as we have adopted the names registered in Bugzilla. The extraction resulted in 918 commits performed by 33 developers on 5407 artifacts. Overall, the evaluation study for Mylyn Docs project used data produced according to the three aforementioned perspectives until April 05, 2017.
The data were collected until December 2019 according to the first and third observation perspectives using a library for accessing GitHub projects. PullRequest represents the set of updates that were necessary for the realization of an Issue (task), returning information about performing each task (issue = closed). Due to lack of data in the assigned_to field, we consider that if a developer has altered some artifact to assist in performing a task, then we attribute to him the accomplishment of the task. The interactions of nonhuman developers, known as bots, such as testing software and updating libraries were not considered. The degree of Developer-Artifact interaction was defined as the number of lines inserted and deleted in the artifact at each commit.

B. PLANNING AND EXECUTION
The experiment is organized in steps to generate rankings of developers who have knowledge about the projects selected for this evaluation. The rankings are generated in four different ways. The baseline is the ranking of the number of tasks performed by the developer using the data from the Performed Tasks observation perspective. The ranking by interactions is constructed from observation data from Edit Data and Commit Data based on the number of interactions the developer has made. The K c ranking orders the developers by the number of groups (i.e., clusters) in which they have knowledge according to the K c model. The K p ranking uses the K c model. Both models, K c and K p , use data from the Edit Data and Commit Data perspectives. This study uses a clustering technique to build K c and K p rankings.
Clustering is a computational technique for organizing data objects (elements) into clusters (groups). By providing data clusters, this technique facilitates information understanding. Similar elements tend to be in the same group, whereas less similar or non-similar elements tend to be in different groups [34], [35]. There are many algorithms for data clustering [36]. One of the most best-known and used clustering algorithms is K-means [37]. However, no algorithm gives optimal results for all data sets. In this article, the LNS-SMC algorithm [24] is used because it has presented a good efficiency when the modularization quality (MQ) measure was applied to the software module clustering problem. The MQ measure is used to measure the clustering quality of software modules (e.g., classes and source code) according to the cohesion and coupling between modules. Cohesion is the number of internal dependencies a source file has in relation to its package, whereas coupling is the number of external dependencies. For example, a dependency is described when importing a class in Java. The goal of clustering is to reorganize the software into packages that are highly cohesive and loosely coupled.
Equation (9) explains the MQ measure from the perspective of the clustering problem of similar tasks by edit interaction. Let C represent the result of a clustering algorithm with n clusters, that is, C = C 1 , C 2 , C 3 , .., C n . The MQ quality measure of a cluster C is defined as the sum of the quality of each cluster C k , i.e., MF C k . For this experiment, we define an association between two tasks when exists at least one artifact that was edited by developers who performed these two tasks. The value of MF is a function of the number of internal (i) and external (j) associations of C k tasks. For example, when all tasks in a C k cluster do not have common artifacts with tasks in the same C k , i is equal zero and MF is also zero, and this is the worst case. In the best case, all tasks in a C k cluster are related to the same artifacts and do not have artifacts in common with tasks of other clusters. Thus, i> 0, j is zero, and MF has a maximum value equal to 1.

MQ
The LNS-SMC algorithm is classified as a heuristic to find an optimal solution by exploring a large neighborhood seach (LNS) method that has been applied to the software module clustering (SMC) problem. The MQ measure is used by LNS-SMC to compare solutions and choose the one with the highest MQ. This algorithm initially evaluates the MQ of a solution, choosing it as optimal, and then makes use of two operators to choose the neighbor of the optimal solution. The destroy operator is applied to the optimal solution to generate an incomplete solution, for example to cause the removal of some elements. The repair operator is then applied to the incomplete solution, generating a valid solution by, for example, inserting the removed nodes into other incomplete solution clusters. Then, it is checked if the MQ of the generated valid solution is higher than the current optimal solution. If so, the valid solution becomes the current optimal solution. These steps are performed until a stop condition is satisfied. In this study, the maximum number of trials to find a new optimal solution was set to 1000. For each trial, 10 trials of the LNS were performed for Mylyn Docs project, and the largest cluster MQ was chosen to be used in the K c and K p rankings calculations. We observed that MQ values are so close for each trial with Mylyn Docs, so only one trial of LNS was performed for each trial of the projects selected from GitHub.
The experiment was performed with a reference date for each first day of the month, and involved Mylyn Docs project VOLUME 8, 2020 from 2015-05-01 to 2016-05-01, with a total of 13 trials , and GitHub projects from 2019-02-01 to 2020-01-01, with a total of 12 trials. Each trial produced four ranking lists in descending order of developer scores based on: 1 -the performed tasks; 2 -the number of edit interactions with artifacts; 3 -the K c model, and 4 -the K p model. The rankings by interactions, K c and K p , were evaluated according to the degree of correlation with the performed task ranking using the Spearman correlation model [38] with a confidence level defined in 95%. The Spearman correlation is used to evaluate the similarity between rankings. The Spearman correlation coefficient is close to one when the rankings are very similar and close to zero when they are very different [39]. Table 4 presents the results of the first trial of the experiment Mylyn Docs project for 2015-05-01. The ranking by ''performed tasks'' orders the 10 developers who performed the greater number of tasks until 2015-05-01. The developer in the first place performed 378 tasks and the one in the tenth performed 3. The developer in the first place in this ranking was also the first in the other rankings. The developer dev1039 was ranked tenth in the ranking of performed tasks, 22nd in the interaction ranking, 13th in the ranking based on K c and 13th in ranking based on K p . It is expected that to accomplish a task, a developer should perform one or more interactions, but that's not what happened to the developers dev1512 and dev1039. Dev1512 performed 6 tasks and 1 interaction. Dev1039 performed 3 tasks and 1 interaction. We conclude that these developers did not record all their interactions while performing their tasks. We opted to keep this information in the assessment of similarities among the rankings.

C. RESULTS AND ANALYSIS
The only interaction of the developer dev1512 happened on 2009-05-26 with artifact 6135 and had degree 1, and it is only part of task 260483. This interaction applied to the K a model results in a value very close to zero (0.02). Consequently, the models K c and K p deprecated this interaction by inferring that the developer completely forgot the task he or she performed. On the other hand, the only interaction of the developer dev1039 with artifact 382 was registered on 2015-03-10 with degree 1, and it is also part of 29 tasks. According to K a , the knowledge of dev1039 about this artifact on 2015-05-01 results in 0.44. In this case, the model K c inferred the knowledge of the developer based on 13 out of 124 clusters, with K p equal to 0.13. For the developer dev1039 the K p measure indicates a value closer to reality than the K c value. Table 5 shows the results of the similarity analysis between the performed tasks ranking (i.e., baseline) with ranking based on the number of interactions, and using the K c and K p models. The study with Mylyn Docs project was organized in 13 trials, starting on May, 2015, and ending on May, 2016, and was always performed on the first day of each month. The 13th trial was ruled out because it had a p.value greater than 5% for K p . All other trials had a p.value smaller than 5%, indicating a 95% confidence in the rho values of the Spearman tests. In general, the ranking based on the K p model presents on average a ''strong similarity'' with the ranking of performed tasks (72%).
Based on these results, we provide the answers for three research questions raised in this experiment in the following paragraphs. Schober et al. [40] discuss the interpretations of Spearman test results. In this study, the correlation between rankings can also be understood as similarity between rankings, and to simplify the interpretation of rho values, the following scale will be used: 0.00-0.10 as a ''negligible similarity''; 0.10-0.39 as a ''weak similarity''; 0.40-0.69 as a ''moderate similarity''; 0.70-0.89 as a ''strong similarity'' and 0.90-1.0 as a ''very strong similarity''. Table 6 shows the results of the similarity analysis between the performed tasks ranking (i.e., baseline) with ranking based on the number of interactions, and using the K c and K p models for five projects selected from GitHub. These studies were organized in 12 trials, starting on February, 2019, and ending on January, 2020, and was always performed on the first day of each month. Trials with p.value greater than 0.05 were removed. In general, the rankings based on the number of interactions, the K c and K p models present on average a ''strong similarity'' with the ranking of performed tasks.

1) RQ1 ANALYSIS
The research question RQ1 was defined to evaluate the similarity between the number of performed task rankings and the number of interaction rankings performed per developer. Table 5 shows that the degree of similarity tends to increase with time: on 2015-05-01 it was 66% (rho) and on 2015-08-01 it became 69% in the 4th trial. These rankings are similar in 68% on average. According to the Likert scale for similarity evaluation adopted in this study plan, the performed task ranking and the interaction ranking have a ''moderate similarity''. The study with Mylyn Docs project indicates that the number of performed task rankings (R pt ) and the number of interactions ranking have a moderate similarity of 68% on average. Table 6 shows the results of five studies with projects selected from GitHub. According to the Likert scale for similarity evaluation, the performed task ranking and the interaction ranking have a ''strong similarity'' for three projects and a ''very strong similarity'' for two projects.

2) RQ2 ANALYSIS
The research question RQ2 evaluates the similarity between the performed task rankings and the ranking based on the K c model. Table 5 shows that the degree of similarity tends to increase over time: on 2015-05-01 it was 70% (rho) and on 2016-03-01 it was to 71% in the 11th trial. These rankings are similar in 70% on average, a better result compared to RQ1 (68%). The p.value was smaller than those obtained for RQ1. According to the Likert scale for similarity evaluation, the performed task ranking and the ranking based on K c model reveal a strong similarity, but very close to the range considered as moderate (i.e., rho value from 0.40 to 0.69). The study with Mylyn Docs project indicates that the R pt ranking and the ranking generated by K c model have a strong similarity of 70% on average. Table 6 shows that the performed task ranking and the ranking based on the K c model have a ''strong similarity'' for the five projects.

3) RQ3 ANALYSIS
The research question RQ3 was defined to evaluate the similarity between the performed task ranking and the ranking based on the K p model. Table 5 shows that these rankings are similar in 72% of the cases on average, a better result in comparison with the results of RQ1 and RQ2, which were 68% and 70% of the cases on average, respectively. The p.value values also decreased significantly with respect to RQ1 and RQ2 for the 6th-10th trials, indicating a higher degree of confidence for similarity values (rho). According to the Likert scale for similarity evaluation, the performed task ranking and the ranking based on the K p model revealed a strong similarity, which is higher than the upper limit of 0.69 of a moderate similarity (i.e., with value from 0.40 to 0.69). The study indicates that the R pt ranking and the ranking generated by K p model have a strong similarity of 72% on average. Table 6 shows that the performed task ranking and the ranking based on the K p model have a ''strong similarity'' for four projects and a ''very strong similarity'' for one project. Table 7 shows the influence of R d , G d , h m and r q parameters on K p , calculated on 11/11/2008 with data from the Mylyn Docs project. The better project organization with respect to similar-task groups resulted in 22 groups (q = 22). The second column presents the K p measurements according to equation (8). The K p measurements for the three developers in the third column, when calculated without considering forgetting (R d = 1), have values greater than twice those in the second column, showing that modeling forgetting has a significant influence on the K p measurements. The fourth column, with K p values for G d = 1, has lower values for the first two developers and equal values for the third developer. This indicates that the first two developers revisited artifacts in the last seven days and the third did not. The fourth column, with K p measurements for h, m, r and q values equal to 1, shows values that are very close to the values in second column for the first developer, indicating that the first developer has knowledge about most tasks and most similar task groups (i.e., r = 22 for this developer). Finally, the third developer has a K p value equal to 36 in the second column and 121 in the fourth column, which indicates an influence of these parameters on the K p measurements. This developer has knowledge of only 7 (r = 7) project task groups within a total of 22 (r = 22). Table 8 shows the influence of R d , G d , h m and r q parameters on K p , calculated on 11/11/2011 with data from the Mylyn Docs project. We note that in this table the K p measurements with R d = 1 change the developers' ranking list when compared with K p : the 3rd developer, according to the K p model, appears in the 5th place (for K p with R d = 1); the 4th developer appears in 3th place; and the 5th developer appears in 4rd place. The third column, in which we have K p without relearning, shows that only the first developer has a smaller measure in comparison with the first column, because this developer has interacted with artifacts in the last seven days to accomplish his or her tasks. Finally, the last column, in which K p has the parameters h, m, r and q equal to 1, shows that all developers are negatively affected with respect to K p when they do not have knowledge about all tasks and task groups in the project.

D. THREATS TO VALIDITY
According to [41], threats to validity can affect an experimental study in conclusion, construct, internal, and external threats. Conclusion refer to the relationship between treatment and outcome. The baseline of this study is the number of resolved tasks by developer and the other rankings have as their main input the developers' interactions with artifacts while performing tasks. Some developers of the Mylyn Docs project did not record the edit interactions, having recorded only the commit interactions. Therefore, we assign a degree equal to 1 to each commit interaction for all artifacts involved in a commit. Even so, among the 10 most accomplished developers, we have found out that two of them performed more tasks than artifact interactions. As a result, the correlations among rankings may be underestimated. Future work plans include the implementation of models and evaluations in additional software development projects in academia and industry.
Internal threats evaluate whether the relationship between treatment and outcome is causal or results from factors that the researcher cannot control. First, in a typical software development scenario where developers are working in the same room and participating in face-to-face interactions, knowledge exchanged about tasks and artifacts at first might not be captured or transmitted using an IT tool. As a consequence, interactions could occur without leaving a trace. However, we claim that these interactions are usually followed by interactions by those same developers using development and project management tools, which will update the information related to the artifacts and tasks, and reflect the decisions made during the face-to-face interactions. Second, interaction-based rankings cannot evaluate whether an interaction was positive, i.e., whether it contributed to the task or was dropped via undo-type commands. Third, there may be a distinction between the number of interactions of more experienced and less experienced developers in performing tasks with the same degree of difficulty. In the same way that personal questions (e.g., involving profiles or emotions) can influence how a developer accomplishes the task, it can also influence how he or she interacts with artifacts and with collaborators. One must investigate how the presented models can filter the noise associated with these questions.
Construct threats are associated with the relation between theory and observation, and are defined as to ensure that the treatment reflects the construct of the cause well and the outcome reflects the construct of the effect well. The K c and K p -based rankings did not use I dd -type interactions, those observed among developers and predicted by the K s model, because such interactions were not measured in the Mylyn Docs project. These interactions need to be modeled and considered in future studies.
Finally, external threats relate to the generalization of the observed results for other software projects. Our model is based on Developer-Artifact interactions (I da ), and Developer-Developer interactions (I dd ). These interactions are present in all software development projects. However, it is necessary to provide the projects with tools to support obtaining appropriate information about these interactions in other domains.

V. DISCUSSION
The proposed models for measuring developer's knowledge were evaluated to identify developers who are most knowledgeable about a software development project. The K p model measures developer's knowledge about the entire project based on the K c model, which measures developers' knowledge about a group of similar tasks using Developer- Artifact interactions, and this is done based on the K s model. The K s model measures the developers' knowledge about a task, which in turn uses the K a model, the model that measures the developers' knowledge about a artifact. Thus, the results of the K p evaluation also represent an evaluation of the K c , K s and K a models.
The study evaluates K p based on the assumption that a developer who performed a task, also acquired knowledge about it. This premise is stated by Fritz et al. [6], who argue that authorship is a relevant information for measuring knowledge. The K p evaluation used K s with Developer-Artifact interactions and did not consider Developer-Developer interactions because of the lack of data. According to the experiment performed by Moraes et al. [9], Developer-Developer interactions can also contribute to the identification of experts. Thus, the inclusion of data from Developer-Developer interactions may improve the results of the evaluation models.
We should stress that the structure of our proposed models is based on fine-grained information. For example, in the K a model an interaction between a developer and an artifact creates a link between these elements. Then, after performing a task, a developer tends to have multiple associations with the same artifact. These associations are also part of the other models (K s , K c and K p ) because they use K a . A previous study conducted by the lead author of this present article suggested that these associations may contribute to the improvement of information quality [42]. In that study, the developer who most knew about the Mylyn project was the same in three observation perspectives: (i) the number of performed tasks according to the Mylyn project's official list; (ii) the number of performed tasks according to the data collected for a preliminary evaluation of the K p model; and (iii) the K p model applied to the collected data. In all three perspectives, the developer who ranked first performed 2,053 tasks in the official list, with 765 performed tasks in the collected data, and with the K p model producing a value of 479,285. Specifically, according to this article, in the first trial of the experiment, which is presented in Table 4, the developer in first place was also the same, with 378 performed tasks, a total of 50,984 interactions (i.e., edition and commits) and a final value of K p of 1802.02.
In this study, the baseline -the number of performed tasks -represents the knowledge of what was done because it indicates who did which task, assigning 1 to the developer for each performed task. The K p model represents the knowledge of how it was done. Each performed task is associated with the artifacts by the Developer-Artifact interactions, and the value of knowledge attributed to the developer varies according to the interaction degree on artifacts over time. The baseline was compared to the number of interactions performed in the project. The average correlation degree with Mylyn Docs project was 68% with p.value values greater than 3.5% and less than 4.5% in 12 trials. However, the interaction rankings also do not take into account the developers' forgetting and relearning. The comparison of the baseline with the ranking generated by the K c model presented, on average, a correlation of 70% with p.value values less than 3.2% in 12 trials. The K c ranking presented a higher correlation than the interaction ranking besides inheriting from K s and K a the model of forgetting and relearning of developers. The comparison of the baseline with K p showed a better result in 12 trials, the average correlation was 72% with most values of p.value below 2% for the 6th to the 10th trials. Furthermore, because the K p model makes use of the K c model, it also inherits the forgetting model from K s and K a .
The results of five studies with projects selected from GitHub presented an average similarity degree next to 90% with performed tasks. For these projects, the accomplishment of a task was attributed to every developer who performed at least one commit on the task's artifacts. This decision may have contributed to the fact that the rankings by interactions, K c , and K p had similar average values. In addition, commits capture the results of various interactions between developers and artifacts. This indicates that considering data from software such as Mylyn, which works by capturing Developer-Artifact interactions, can contribute to measure developers' knowledge.
The results show that the K p model has presented more similarity than the K c model when compared to the R pt . On the other hand, K p model has also presented results close to the comparison of the number of interactions with the R pt . So, these results are evidence that the K p model is suitable to assess the evolution of the developers' knowledge for finding who knows more about specific elements of a software project and how this knowledge decreases over time.
There is no evidence that only the number of performed tasks can accurately express the developer's knowledge about a software project. As previously stated, Developer-Developer interactions are also relevant in this aspect. However, the measured number of performed tasks can be taken as a useful measure of knowledge because those who perform more tasks also tend to participate in the creation and evolution of software artifacts, when the explicit knowledge of developers is recorded. Thus, the correlation of K p with the number of performed tasks provides some evidence that the presented knowledge-oriented models contribute to the research direction that aims at measuring developers' knowledge appropriately.
DOK is the closest work related to our knowledge measurement models [6]. Therefore, it could be a candidate to be a basis for comparison in the evaluations of the presented models. However, Fritz et al. showed that the DOK model is not suitable for measuring knowledge about APIs [6], [21]. For this reason, in this study we adopted performed tasks as the most appropriate measure to measure knowledge about a software development project. Besides, assigning knowledge about a task as a result of its execution is a well-accepted practice adopted by the software community, and, as already mentioned, it is also a form of authorship, an information that contributes to the measurement of knowledge.
Our models do not use authoring information directly as the DOK model does, but it does so in an indirect way. The author of an artifact is anyone who contributed to its creation. The models punctuate these contributions, thus also indicating a degree of developer authorship related to artifacts, tasks, similar tasks, and the whole project. On the other hand, software often grows in size over time and, as a result, the artifacts also grow in size with changes caused by error correction and the addition of new features. Therefore, the author of an artifact is not only the developer who created it, but also all the developers who contributed to its current version. In this sense, our models consider the authorship information related to the developers and use this information in an appropriate way to measure developers' knowledge.
Overall, knowledge-oriented models are a complement to information about the number of performed tasks to identify who knows more about the whole or parts of the software development project.

VI. RELATED WORK
Several works extract knowledge from developers' interactions with software artifacts. Table 9 shows some related works from three observation perspectives: the source, the model and the target of the selected approaches. The first column shows how the approach captures information about the developers' interactions with artifacts. The modeling perspective, depicted in the second column, identifies whether the approach explicitly modeled the human forgetting and relearning. Finally, the target observation perspective, in the third column, shows whether the approach infers knowledge about artifacts, tasks, similar tasks, and software projects.
Robbes and Röthlisberger [43] infer the developers' knowledge from their editing, selection, and commit interactions with artifacts over time. They define two models that take into account the developers' forgetting. One model deals with knowledge about artifacts and the other with knowledge about tasks. Robbes & Röthlisberger's models neither con-sider developers' relearning nor infer developers' knowledge about similar tasks and software projects.
McDonald and Ackerman [44] defined the ''Line 10 Rule'' heuristic. According to the result of their study, the specialist in a software module is assumed to be the developer who made the last change in the module. The Expertise Recommender (ER) is based on this heuristic and can recommend an expert on an artifact or software module and also help to identify a module associated with a reported error. ER does not model the developer's relearning or infer the developers' knowledge about similar tasks and software projects.
Mockus and Herbsleb [45] defined expertise atoms as elementary units of expertise used to measure the degree of expertise of a developer. They have defined expertise in an object as the set of these elementary units associated with this object. In software development, the elementary unit of expertise is the available difference among versions of a file, which in many code repositories are the updated rows. The Expertise Browser (EA) is an implementation of this expertise measurement model: it lists the experts on an object (file) based on the number of changes (i.e., expertise atoms) made by the developers. EA can list the experts by code, documentation, functionality or product. According to Table 9, EA infers knowledge from the editions in the artifacts, does not consider the developers' forgetting and relearning, acts on artifacts, tasks and software projects, and does not infer the knowledge about similar tasks.
Girba et al. [46] defined a model to determine the owner of an artifact in function of the current owners of the artifact lines. They stated that the developer who last modified an artifact line is the current owner of the line. In this way, it is possible to infer that the current owner of an artifact is the developer who owns the most artifact lines. Therefore, Girba et al. can infer the owners of the software artifacts, and also produce a preview of the software project from the perspective of the artifact owners. This approach infers knowledge based on artifact commits and does not take into account the developers' forgetting and relearning. The model developed by Girba and colleagues can not be applied to tasks or similar tasks.
Vivacqua and Lieberman [47] have explored the use of artifacts by developers working in the Java language domain to infer knowledge according to a Likert scale (novice -beginner -intermediate -advanced -expert). The use of an artifact is measured using the parser on the Java files by analyzing which libraries, classes and methods were used and how often they were used. It is thus possible to infer the developers' knowledge about the artifacts from information about their use. This approach neither takes into account developers' forgetting or relearning nor it can be applied to similar tasks and software projects.
Mylyn captures a developer's interest in specific artifacts based on the developer's interaction with the environment [27]. In Mylyn, when a new task is started, the initial context is the same as that of the set of artifacts available in the project, and as the developer interacts with the environment, Mylyn will reduce the number of artifacts that can be viewed to minimize the developer's effort in searching for artifacts. Mylyn is based on the Degree Of Interest (DOI) model. DOI infers the developer's interest in an artifact over time based on visualization, editing, selection, command, propagation, and prediction events. The DOI model does not represent explicitly developers' forgetting and relearning about artifacts. It also does not predict interest levels directly based on tasks, similar tasks, and software projects.
Fritz et al. [21] proposed the Degree Of Knowledge (DOK), a model for inferring developers' knowledge about artifacts. This model is based on the authorship of an artifact combined with the DOI model. DOK does not explicitly model developers' forgetting and relearning, but inherits from the DOI model the ability to represent developer's interest or disinterest in specific artifacts. Therefore, DOK models can implicitly be used to represent forgetting and relearning. Besides, DOK does not act on a task, similar tasks, and software projects.
Hattori et al. [48] defined a successful change of an artifact as any error-free compilation event of the artifact executed after at least one edit. They assume that whoever has made the most changes to an artifact becomes the developer who knows the most about the artifact. Hattori and colleagues model the concept of forgetting by considering that more recent changes in time contribute more to knowledge than older changes, and similarity to Girba et al.'s work [46], they propose a visualization map of the artifacts and their respective owners. In this way, through the visualization map, it is possible to visualize the whole software project. This approach does not consider the concept of relearning of past tasks and the knowledge model does not apply to tasks.
The approach proposed by da Silva et al. [49] infers the developer's knowledge about a project, package, file, class or method based on the analysis of the editions made by him or her over time. The analysis is done by extracting commit information from code repositories, and associating each commit to a developer, package, file, class, and method in a project. Da Silva and colleagues do not consider forgetting and relearning, and also do not infer knowledge about task and similar tasks.
Finally, our knowledge-oriented models infer knowledge from the developer's edit interactions about software artifacts. In contrast to other models, our models consider the developers' forgetting to infer knowledge about artifacts, tasks, similar tasks, and software projects over time. So far, we could not identify any related work that models both forgetting and relearning, and acts based on the targets mentioned in Table 9.

VII. CONCLUSION AND FUTURE WORK
The main goal of this article was to present and evaluate a set of knowledge-oriented models based on forgetting and relearning to measure developer knowledge about artifacts, tasks, similar tasks and the whole software projects by analyzing developers' interactions while they perform tasks. Our models take into account the complexity of the tasks, the human trait of forgetting and relearning, the task similarity based on interactions, and the distribution of developers' knowledge in a software project. The knowledge-oriented models can also be seen as a novel solution to describe knowledge about how software is produced by linking developers, artifacts and tasks over time.
The evaluation of the models was performed based on information related to the interaction degree provided by the Mylyn Eclipse Plugin. In the literature, there are other ways of measuring interaction, such as the one described in the work by Omoronyia [50]. The models proposed in this article depend on measures of the interaction degree and, for this reason, the choice of the specific form of measurement may interfere with the results produced by the models. Besides, models are restricted to a particular software project. For example, knowledge that is recorded in an artifact of the same name in another software project is not considered. Future work will further investigate II dd in the K s model.
Personal issues (e.g., profile, emotions) can interfere with how a developer performs the tasks and also how he or she interacts with the artifacts and their collaborators. We need to investigate the influence of these issues on the presented models that aim at measuring developers' knowledge. Moreover, the evaluation needs to be extended by interviews or surveys to compare results from the knowledge-oriented models with the perception of developers about the experts.
A general limitation of our models is that the computation (especially the normalized computations) of the values in the model need to be done with respect to a specific time. We understand the proposed approach can work in general in two ways: (i) a static way, in which, the information is generated in a batch mode, and computations are performed with respect to a specific time based on this data; and (ii) a dynamic way, which takes into account the streaming nature of the information that is generated as a function of the interactions of developers with artifacts and other developers, and provides recommendations in real time. The proposed approach currently supports solution (i) and, because of this limitation, we provide an overview in this paragraph on how this approach can be extended to support solution (ii). A possible way to address this limitation is by using regression and correlation [51]. We suggest replacing t (time) by a time period constant in our knowledge models to associate developers with elements directly, i.e., K a (d, a), K s (d, s), K c (d, c), and K p (d, p). These extended models must be executed according to the time period chosen, adding or removing a degree of knowledge to the developer. This strategy was used by Fritz et al. [6].
We have performed a preliminary survey involving some GitHub projects to evaluate how well developers know: 1the content of the source code (i.e., artifact) about which they have made commits; and 2-closed issues (i.e., tasks) associated with these artifacts. Currently, we are preparing a new survey with project data related to a specific software organization. In addition, our relearning modeling only considers the last 7 days, but we can not ascertain that the eighth, the ninth, or the subsequent days are not relevant. Future studies also need to address this issue.
Further, the proposed knowledge-oriented models can also be used in other domains with some adaptations. For example, in the Web domain users interact with online sites to achieve a goal. In this case, the equations of the presented knowledge-oriented models could be adapted to estimate the users' interest in products, services or subjects. In the scenario of distance education, where students interact with online systems and tutors to fulfill tasks, in order to acquire knowledge, there is also an opportunity to apply the proposed models. One possible application could be to estimate the degree of student involvement in terms of effort per course and associate this degree with student grades.
Finally, these models contribute to the burgeoning literature on developers' interactions as a source for new and significant discoveries related to the software development process.
TOACY C. OLIVEIRA received the bachelor's degree in electrical engineering and the M.Sc. and Ph.D. degrees from the Pontifical Catholic University of Rio de Janeiro, Brazil, in 1992, 1997, and 2001, respectively. He worked as a Postdoctoral Researcher with the University of Waterloo, Canada, for three years. He is currently an Associate Professor with the Computing and Systems Engineering Program (PESC/COPPE), Federal University of Rio de Janeiro, Brazil. He is also an Adjunct Professor with the David R. Cheriton School of Computer Science, University of Waterloo. He has authored or coauthored more than 70 refereed publications. Besides being a leading investigator in Brazilian research projects supported by CAPES and CNPq, he has also exercised entrepreneurship by founding several companies in Brazil. His current research interests include understanding the complexities of knowledge intensive processes and developing new tools and techniques to help domain experts, a.k.a knowledge workers, achieving their goals. He has been a member of program committees of numerous conferences and workshops. From 1999 to 2011, he worked as a Software Consultant and also as the Leader of the Development of a Data Warehouse in the Brazilian Navy, through the Coppetec Foundation, Rio de Janeiro, before joining the Federal Rural University of Rio de Janeiro and acting as a Professor since 2011. Since 2017, he has been an Assistant Professor at the Federal University of Rio de Janeiro (UFRJ). He has been working at Tércio Pacitti Institute of Computer Applications and Research (NCE) and the Postgraduate Program in Informatics (PPGI/UFRJ). He is also an inventor and the TI Leader of the Acropolis Platform, a social curation environment specially designed to engage citizens in complex or long-running news stories. He has experience in databases, CSCW, and social and crowd computing systems. He has published more than 40 peer-reviewed articles in conferences and journals in the area of computer supported cooperative work and human-computer interaction. His current research interests include designing and implementing crowdsourcing systems to solve problems in a myriad of application areas.
PAULO S. C. ALENCAR (Member, IEEE) is currently a Research Professor with the David R. Cheriton School of Computer Science, University of Waterloo, and also an Associate Director of the Computer Systems Group. He has authored or coauthored more than 200 refereed publications and has been a member of program committees of numerous highly regarded conferences and workshops. His recent research in information technology and software engineering has focused on high-level software architectures, design, components, and their interfaces, software frameworks and application families, software processes, automated workflows, work graphs, and evolution, Web-based approaches and applications, open and big data applications, context-aware and event-based systems, software agents, machine learning, cognitive chatbots, artificial intelligence, and formal methods. He has received international research awards from organizations, such as Compaq and IBM. He has been a Principal or a Co-Principal Investigator in projects supported by NSERC, ORF-RE, IBM, SAP, CITO, CSER, Bell, and funding agencies in Canada, U.S., Brazil, Germany, and Argentina. He is a member of the Association of Computing Machinery, the Association for the Advancement of Artificial Intelligence, the Waterloo Water Institute (part of the Global Water Futures initiative), and the Waterloo Artificial Intelligence Institute. VOLUME 8, 2020