An Enriched Information-Theoretic Definition of Semantic Similarity in a Taxonomy

This paper addresses the notion of semantic similarity between concepts organized according to a taxonomy, based on the well-known information content approach. This approach has been widely experimented in the literature over the years and, in general, outperforms other proposals which do not originate from it. However, it shows some limitations related to the notion of generic sense of a concept. In this paper we illustrate the problem arising by using the traditional approach, and a novel information-theoretic definition of semantic similarity in a taxonomy is proposed which also takes into account the intended sense of a concept in a given context. This proposal has been applied to some among the most representative state-of-the-art similarity measures based on the information content approach, and the experiment shows that it achieves very high correlation values with human judgment.


I. INTRODUCTION
The information-theoretic definitions of semantic similarity given by Resnik in [38], [39] and by Lin in [33], more than two decades ago, have been extensively cited and investigated in the literature, and a significant number of similarity measures originating from them have been proposed, relying on the information content approach [11], [12]. This approach is based on a probabilistic model that can be applied not only to concepts organized according to an ISA taxonomy (taxonomy for short), but also to ordinal values, feature vectors, and words. In particular, with regard to concepts organized according to a taxonomy, which is the focus of this paper, the information content approach was proposed in order to overcome the limitations of alternative methods for evaluating concept similarity, such as the edge-counting approach [37]. The key idea on which it relies is the following: the more information two concepts share, the more similar they are, and concept similarity is directly proportional to the maximum information content shared by the concepts. The similarity measures based on the information content approach have been widely investigated in the literature over the years and, in general, have shown a higher correlation with human judgment than other proposals which do not originate from it [3], [8], as also shown experimentally by the authors in [14].
The associate editor coordinating the review of this manuscript and approving it for publication was Fabrizio Marozzo.
The starting assumption of this paper is that semantic similarity has to be computed by taking into account not only the information contents of the concepts but also the context, because language is ambiguous, and different contexts can lead to different similarity degrees among the same concepts. It could be argued that a taxonomy, in order to work properly for a given purpose in a given application domain, should reflect a specific point of view, also referred to as perspective in [33]. For instance, consider a taxonomy about animals. If the taxonomy distinguishes pets from wild animals, cats will turn out to be more similar to dogs than to lions, but if the taxonomy describes families of animals, such as felines and canids, cats will turn out to be more similar to lions than to dogs. However, building domain-specific taxonomies is not a simple task, because it requires specific background knowledge and a significant amount of effort from domain experts. Therefore, in many cases, it is preferable to adopt general-purpose and widely accepted taxonomies (e.g., WordNet [32]), which do not rely on specific perspectives.
As shown in this paper, context (or perspective) is fundamental in evaluating semantic similarity, and its role is more evident if we focus, for instance, on siblings, i.e., concepts of the taxonomy with the same parent, which share the same information content. Note that the approach in [33] is also based on the notion of perspective, but it does not allow similarity to be evaluated by addressing a single perspective at a time, and the proposed information-theoretic definition of similarity between concepts is interpreted as ''a weighted average of their similarities computed from different perspectives''. For this reason, in this work, we refer to the mentioned information content notion as the similarity between the concept generic senses, i.e., the senses of the concepts that are not related to any specific context.
In this paper, we show how the similarity based on concept generic senses is not adequate to capture the meanings of the concepts related to specific contexts, here referred to as their intended senses. The problem will be shown by using an example involving sibling concepts in a simple taxonomy. Therefore, we propose an enrichment of the information-theoretic definition of semantic similarity in a taxonomy, that takes also into account the concept intended senses, i.e., concept meanings according to the given application domain.
It is important to observe that, to our knowledge, the approach presented in this paper is novel and does not allow a comparison with existing proposals due to the inherently different assumptions on which it relies. However, in order to validate it, additional hypotheses have been made on the addressed benchmark dataset, and an experimentation involving some of the most significant information content similarity measures has been performed, based on the well-known Miller&Charles dataset [35]. Note that this proposal can be applied to existing semantic similarity measures, and the experiment shows that it achieves high correlation values with human judgment in line with the literature.
The paper is organized as follows. In Section II, the problem is informally introduced by using an example, and the rationale behind the proposed approach is illustrated. Subsequently, in Section III, the enriched similarity measure is formally presented. Section IV presents the experiment, which includes the disambiguation of the intended concept senses and the evaluation of their relevance, illustrated in Subsection IV-A. The related work follows in Section V, and Section VI concludes.

II. SEMANTIC SIMILARITY IN A TAXONOMY
In this section, the topic addressed in this paper is informally presented by using a running example.

A. THE INFORMATION CONTENT APPROACH
According to Resnik [38], [39], the notion of semantic similarity between concepts organized according to a taxonomy relies on concept frequencies in text corpora, e.g., huge collections of text samples of American English. As mentioned above, the basic assumption of the approach is the following: the more information two concepts share the more similar they are, and the similarity between concepts is given by the maximum information content shared by them, which is represented by the information content of their most informative subsumer (i.e., the most specific concept in the taxonomy that is more general than both of them). The root of the taxonomy is the concept whose information content is null by definition, since it represents the most abstract concept.
For the sake of simplicity, in this section we address an example involving siblings, i.e., concepts that in the taxonomy are direct descendants of the same node, their parent. Figure 1 shows a fragment of a taxonomy where the concept person is the parent of the three concepts student, employee, and planter (children). The similarity between siblings is given by the information content associated with their parent, which is the maximum shared between them. For this reason, all pairs of siblings have the same semantic similarity degree. Therefore, in the example, the maximum information content shared by the pairs (employee, student) and (employee, planter) is the one associated with their parent, person, and the following holds: sim(employee, student) = sim(employee, planter), where sim stands for the similarity degree of the pair. Of course, this value also coincides with the one of the pair (student, planter).
As a result, according to Resnik, siblings are indistinguishable from a similarity point of view, and the approach cannot capture further semantic aspects of the concepts that would give different pairs of siblings different similarity degrees.
In the approach proposed by Lin [33], the notion of semantic similarity proposed by Resnik has been refined by also addressing the information contents of the compared concepts and, therefore, the related concept frequencies (or probabilities). Let us consider again the pairs of concepts (employee, student) and (employee, planter). Assume that the frequency of the concept student in a text corpus is greater than the one of the concept planter (but the opposite hypothesis can be taken as well). According to this assumption, the similarity degree between the concepts employee and student is greater than the one between employee and planter (see Section III where the similarity measure of Lin is formally recalled), i.e.: sim(employee, student) > sim(employee, planter). Therefore, following this approach, given a set of sibling concepts in a taxonomy, one of them, in this case employee, is more similar to the ''most frequent'' sibling in a given corpus, i.e., student in the example. With respect to the previous approach, pairs of siblings do not have the same similarity degrees, however similarity is evaluated by considering only concept frequencies and, in particular, the more frequent two siblings are the more similar they are. Indeed, as mentioned in the Introduction, this approach relies on the concept generic senses, i.e., meanings that are not related to any specific context. (Note that the above argumentation also holds in the case of evolved information content models, which are not based on concept frequencies in large-scale corpora, as discussed in the Related Work Section.) In the next section, we propose a refinement of the information-theoretic definition of semantic similarity given in [33], by considering an additional element: the meanings of the concepts in a given application domain.

B. ENRICHING CONCEPTS WITH THE INTENDED SENSES
In our proposal, given a taxonomy of concepts and an application domain, we aim at ''enriching'' these concepts with other concepts of the same or another taxonomy, if there are any, that represent their meanings in that domain, as informally illustrated below.
Consider again the taxonomy of Figure 1, and suppose we have an application domain for which an important requirement for people is to spend several hours per day in a building. According to this perspective, we expect employee to be more similar to student than to planter, because the mentioned requirement characterizes an employee and a student better than it characterizes an employee and a planter. Therefore, we expect that the following holds: sim(employee, student) > sim(employee, planter). This is not the case if we consider another perspective, or application domain, where for instance it is more important to focus on people's income. Of course, in this second case, we expect that employee will be more similar to planter than to student, since the first two concepts share some form of payment. Therefore, in this second case, it is reasonable to expect the following: sim(employee, student) < sim(employee, planter).
For these reasons, we propose to compute semantic similarity by also addressing the meanings that concepts have in the given domain, i.e., their intended senses in that domain. For instance, consider in Figure 2 an extension of the fragment of the taxonomy shown in Figure 1, where the concept building has office and college as children, and payment is the parent of reward and salary. Now, in line with the first perspective illustrated above, suppose we have an application domain, say $D_1$, where it is important to characterize people on the basis of the time they spend in an edifice per day. Let $S_{D_1}$ be the function associating the concepts of the taxonomy with their intended senses in the domain $D_1$, defined as follows: $S_{D_1}(\mathit{employee}) = \mathit{office}$, $S_{D_1}(\mathit{student}) = \mathit{college}$, $S_{D_1}(\mathit{planter}) = \mathit{reward}$. In the proposed approach, concept similarity is evaluated by addressing not only the maximum information content shared by the compared concepts, but also the one shared by their intended senses. Therefore, consider again the two pairs of siblings of our example. The intended senses of the concepts employee and student are office and college, respectively, which have building, their parent, as maximum shared information content (see Figure 2). With regard to employee and planter, instead, the most specific concept in the taxonomy that is more general than their meanings office and reward is the root, whose information content is null by definition. For this reason, for the related similarity degrees, we expect the following: sim(employee, student) > sim(employee, planter).
In order to address the second scenario, where earnings are more relevant than workplaces, consider another application domain, say $D_2$, for which the intended sense of employee is defined by the function $S_{D_2}$ as follows: $S_{D_2}(\mathit{employee}) = \mathit{salary}$, while keeping the same definition for the concepts student and planter, i.e.: $S_{D_2}(\mathit{student}) = \mathit{college}$, $S_{D_2}(\mathit{planter}) = \mathit{reward}$. In this second perspective, since salary and reward share payment as concept with maximum information content, whereas salary and college share only the root, as shown in Figure 2, we expect that: sim(employee, student) < sim(employee, planter).
In the next section the similarity measure proposed in this paper is introduced in formal terms. It allows an enrichment of the traditional information-theoretic definition of semantic similarity with the intended senses of the concepts according to a given application domain.

III. THE ENRICHED SEMANTIC SIMILARITY MEASURE
The information content approach was proposed in [38] as an alternative to the edge-counting method [37], whose drawback is the assumption that links in a taxonomy represent uniform distances. Indeed, the former is based on a probabilistic model that is not sensitive to the problem of link distances. Below, it is briefly recalled in formal terms.
Consider a set of concepts $C$ of an ISA taxonomy (taxonomy for short), and a function $p: C \rightarrow [0, 1]$ such that, for any $c \in C$, $p(c)$ is the probability of the concept $c$ computed on the basis of the relative concept frequency, $freq(c)$, evaluated from large collections of multidisciplinary texts, such as the Brown Corpus of American English [17]. In particular, the probability of a concept $c$ is defined as:
$$p(c) = \frac{freq(c)}{N}$$
where $N$ is the total number of concepts in the corpus.
According to [40], the information content of a concept $c$, indicated as $IC(c)$, is computed as:
$$IC(c) = -\log p(c)$$
which means that, intuitively, as the probability increases the informativeness decreases and, therefore, the more abstract a concept, the lower its information content. Given two concepts $c_i, c_j \in C$, the notion of semantic similarity proposed by Resnik, $sim_R(c_i, c_j)$, relies on the assumption that the more information two concepts share, the more similar they are, and is defined as follows:
$$sim_R(c_i, c_j) = \max_{c \in S(c_i, c_j)} \left[ -\log p(c) \right]$$
where $S(c_i, c_j)$ is the set of concepts that subsume (are more general than) both $c_i$ and $c_j$. The concept corresponding to the maximum value above is referred to as the least common subsumer (lcs) (the most informative subsumer in [38]) of the concepts $c_i$, $c_j$. Therefore:
$$IC(lcs(c_i, c_j)) = \max_{c \in S(c_i, c_j)} IC(c)$$
and therefore:
$$sim_R(c_i, c_j) = IC(lcs(c_i, c_j))$$
Subsequently, in [33] this notion was refined and, in particular, given two concepts $c_i, c_j \in C$, the concept semantic similarity proposed by Lin, $sim_L(c_i, c_j)$, is defined as follows:
$$sim_L(c_i, c_j) = \frac{2 \, IC(lcs(c_i, c_j))}{IC(c_i) + IC(c_j)}$$
where, with respect to the approach proposed by Resnik, the information contents of the compared concepts are both considered as an essential contribution in the evaluation of their semantic similarity. However, both Resnik's and Lin's approaches, as well as the similarity methods originating from them that will be addressed in the experiment of Section IV, do not consider the semantic similarity of the meanings of concepts according to a given context. In this paper, an enrichment of the information content based methods is proposed that further characterizes the meanings of the compared concepts with respect to a given application domain. In the following it is illustrated by using Lin's formula, but it can be applied to any information content measure.
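The definitions of $p$, $IC$, lcs, $sim_R$, and $sim_L$ recalled above can be illustrated with a minimal Python sketch. It assumes the taxonomy fragment of Figure 2 with person, building, and payment placed directly under the root, and it uses invented occurrence counts; the names `PARENT`, `OWN_COUNT`, `ancestors`, `ic`, `lcs`, `sim_resnik`, and `sim_lin` are hypothetical and not part of the original work. The frequency of a concept is taken here to include the occurrences of its descendants, and $N$ is taken as the cumulative count at the root.

```python
import math

# Hypothetical fragment of the taxonomy of Figure 2 (child -> parent).
PARENT = {
    "person": "root", "building": "root", "payment": "root",
    "student": "person", "employee": "person", "planter": "person",
    "office": "building", "college": "building",
    "reward": "payment", "salary": "payment",
}

# Invented corpus counts for each concept taken on its own.
OWN_COUNT = {
    "root": 0, "person": 40, "building": 30, "payment": 20,
    "student": 30, "employee": 20, "planter": 5,
    "office": 15, "college": 15, "reward": 10, "salary": 15,
}


def ancestors(c):
    """Return c followed by all of its subsumers up to the root."""
    chain = [c]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def cumulative_counts():
    """freq(c): occurrences of c plus those of all its descendants."""
    freq = dict.fromkeys(OWN_COUNT, 0)
    for c, n in OWN_COUNT.items():
        for a in ancestors(c):
            freq[a] += n
    return freq


FREQ = cumulative_counts()
N = FREQ["root"]  # assumption: N is the cumulative count at the root


def ic(c):
    """Information content IC(c) = -log p(c), with p(c) = freq(c) / N."""
    return -math.log(FREQ[c] / N)


def lcs(ci, cj):
    """Most informative concept subsuming both ci and cj."""
    shared = set(ancestors(ci)) & set(ancestors(cj))
    return max(shared, key=ic)


def sim_resnik(ci, cj):
    return ic(lcs(ci, cj))


def sim_lin(ci, cj):
    denom = ic(ci) + ic(cj)
    # Both concepts are the root only when denom is zero.
    return 2 * ic(lcs(ci, cj)) / denom if denom else 1.0
```

With these invented counts, `sim_resnik` gives the same value to the sibling pairs (employee, student) and (employee, planter), whereas `sim_lin` ranks (employee, student) higher because student is the more frequent sibling, in line with the discussion of Section II.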
Suppose we have an application domain, say $D_k$. The semantic similarity of the concepts $c_i, c_j \in C$, indicated as $sim_{D_k}(c_i, c_j)$, is defined as follows:
$$sim_{D_k}(c_i, c_j) = \omega_k \, \frac{2 \, IC(lcs(c_i, c_j))}{IC(c_i) + IC(c_j)} + (1 - \omega_k) \, \frac{2 \, IC(lcs(S_{D_k}(c_i), S_{D_k}(c_j)))}{IC(S_{D_k}(c_i)) + IC(S_{D_k}(c_j))}$$
where $\omega_k$ is a weight, $0 \le \omega_k \le 1$, defined by the domain expert according to $D_k$, and $S_{D_k}$ is a function from $C$ to $C$, referred to as the intended sense function, associating a concept with its meaning according to $D_k$, i.e., $S_{D_k}: C \rightarrow C$. The above formula can be rewritten and generalized by using any information content based semantic similarity measure, $sim(c_i, c_j)$, as follows:
$$sim_{D_k}(c_i, c_j) = \omega_k \, sim(c_i, c_j) + (1 - \omega_k) \, sim(S_{D_k}(c_i), S_{D_k}(c_j)) \tag{1}$$
where the weight $\omega_k$, depending on $D_k$, allows a balance between the roles of the generic senses and the intended senses of the concepts, according to the relevance they have in the domain $D_k$.
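As a hedged illustration of the enriched measure, the following Python sketch assumes that the generalized formula is the convex combination $\omega_k \, sim(c_i, c_j) + (1 - \omega_k) \, sim(S_{D_k}(c_i), S_{D_k}(c_j))$, which is consistent with the later remark that a weight equal to 1 yields the similarity of the generic senses. The function `enriched_similarity`, the lookup table `BASE`, and all numeric values are hypothetical; only the intended sense functions for $D_1$ and $D_2$ follow the running example of Section II.

```python
def enriched_similarity(sim, sense, omega, ci, cj):
    """Weighted combination of generic-sense and intended-sense similarity.

    sim   -- any base similarity measure, called as sim(a, b)
    sense -- mapping playing the role of the intended sense function
    omega -- weight in [0, 1] given to the generic senses (assumed reading)
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    generic = sim(ci, cj)
    intended = sim(sense[ci], sense[cj])
    return omega * generic + (1.0 - omega) * intended


# Invented base similarity values; pairs not listed share only the root.
BASE = {
    frozenset(("employee", "student")): 0.35,
    frozenset(("employee", "planter")): 0.25,
    frozenset(("office", "college")): 0.60,
    frozenset(("salary", "reward")): 0.70,
}


def toy_sim(a, b):
    """Hypothetical stand-in for an information content based measure."""
    if a == b:
        return 1.0
    return BASE.get(frozenset((a, b)), 0.0)


# Intended sense functions of the running example (domains D1 and D2).
S_D1 = {"employee": "office", "student": "college", "planter": "reward"}
S_D2 = {"employee": "salary", "student": "college", "planter": "reward"}

OMEGA = 0.5  # hypothetical weight chosen by a domain expert

d1_student = enriched_similarity(toy_sim, S_D1, OMEGA, "employee", "student")
d1_planter = enriched_similarity(toy_sim, S_D1, OMEGA, "employee", "planter")
d2_student = enriched_similarity(toy_sim, S_D2, OMEGA, "employee", "student")
d2_planter = enriched_similarity(toy_sim, S_D2, OMEGA, "employee", "planter")
```

Under these assumptions, employee is closer to student in the domain where workplaces matter and closer to planter in the domain where earnings matter, although the base measure and the weight are the same in both cases. Passing the base measure as a parameter mirrors the statement that the enrichment can be applied to any information content based measure.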
Note that, given an application domain $D_k$, in this proposal both the weight $\omega_k$ and the function $S_{D_k}$ are defined according to domain expert judgments. However, as mentioned in Section VI, in future work we are planning to extend this approach to the framework of Linked Data [9], in order to support the domain expert not only in the evaluation of $\omega_k$, but also in the selection of the intended senses of concepts according to the addressed domain.

IV. EXPERIMENTAL RESULTS
As mentioned above, this paper focuses on semantic similarity of concepts organized according to an ISA taxonomy. For this reason, as also discussed in the Related Work Section, in this experiment all the methods for computing the more general notion of semantic relatedness, i.e., concerning non-taxonomic relations [29], have not been addressed. The same also holds for the similarity methods relying on, for instance, Wikipedia [23], since the automatic extraction of the ISA taxonomy from it requires additional ad-hoc algorithms and, therefore, a further level of correlation to be analyzed, that necessarily impacts on the overall evaluation of the methods and goes beyond the scope of this work [6].
It is important to remark that this is a novel approach for which, to our knowledge, there are no comparable proposals in the literature. Therefore, in order to validate it, in the experiment below additional assumptions are required on the addressed benchmark datasets. In fact, with respect to the traditional experimentations where a dataset composed of a set of pairs of concepts suffices, for this proposal we need further pairs of concepts, i.e. concept senses, representing contexts.
In order to arrange the experiment, we focused on the well-known Miller & Charles (M&C) dataset [35], which is still considered a reference for comparing semantic similarity methods [3] and, for each pair of concepts of this dataset, we considered all the pairs of concepts of the same dataset as possible contexts. With regard to the state of the art, we selected six information content based approaches, which are among the most significant methods for evaluating semantic similarity in a taxonomy, and which are recalled in the Related Work Section. In particular, besides the Resnik ($sim_R$) and Lin ($sim_L$) milestones, we applied our proposal to the measure of Jiang and Conrath ($sim_{J\&C}$) [27], to the measure proposed in [36] ($sim_{P\&S}$), to the measure of Adhikari et al. ($sim_A$) [3], and to its variant based on the Meng model ($sim_{A\&M}$) [2]. In addition, the Wu and Palmer method ($sim_{W\&P}$) has also been addressed [47], as representative of the edge-counting approach [37], which can be seen as a special case of [33].
Analogously to [33], let us consider 28 pairs of concepts of the M&C dataset, and the correlations with human judgment (HJ) of the seven methods above, which are shown in Table 1.
As mentioned above, the same dataset was used in order to associate each pair of the dataset with 28 possible application domains $D_k$, $k = 1, \ldots, 28$, in the following referred to as contexts (therefore we evaluated 28 × 28 = 784 similarity scores). For instance, for the pair of concepts (journey, voyage), the 28 contexts are shown in Table 2. We have seen in Section III that, in general, the intended senses of concepts are supposed to be estimated by domain experts, together with the related weight $\omega_k$ in the given context $D_k$, according to Formula (1). In this experiment, in order to quantify such a weight, which represents the relevance of a pair of senses with respect to the pair of compared concepts, we leveraged the existing methods for evaluating the semantic relatedness of concepts. In fact, for this purpose, we do not have to restrict our attention to concept similarity, but we also need to consider non-taxonomic relations, e.g., thematic relations [29]. Therefore, among the methods available in the literature, the one proposed in [42] has been selected because it exploits the large number of semantic relations encoded within the DBpedia semantic network. Furthermore, it achieves competitive performance in computing the semantic distances of concepts by relying on the information content approach. In particular, given a pair of concepts $c_i$, $c_j$ and a context $D_k$, we defined $\omega_k$ as the average of the semantic relatedness of each concept of that pair with the corresponding concept of the associated pair of senses $(S_{D_k}(c_i), S_{D_k}(c_j))$, i.e.: $\omega_k = (r_1 + r_2)/2$, where $r_1 = rel(c_i, S_{D_k}(c_i))$, $r_2 = rel(c_j, S_{D_k}(c_j))$, and $rel$ is the relatedness degree computed according to [42]. For instance, for the pair of concepts (journey, voyage) consider the pair of senses (food, fruit), corresponding to the context $D_{10}$ in Table 2, i.e.: $S_{D_{10}}(\mathit{journey}) = \mathit{food}$, $S_{D_{10}}(\mathit{voyage}) = \mathit{fruit}$.
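The weight computation described above, i.e., the average of the two relatedness degrees $r_1$ and $r_2$, can be sketched in Python as follows. The function `context_weight`, the stand-in relatedness function `toy_rel`, and the values in `TOY_REL` are hypothetical; in the experiment the relatedness degree is computed with the method of [42], which is not reproduced here.

```python
def context_weight(rel, concepts, senses):
    """Average relatedness of each concept with its own intended sense."""
    (ci, cj), (si, sj) = concepts, senses
    r1 = rel(ci, si)  # relatedness of the first concept with its sense
    r2 = rel(cj, sj)  # relatedness of the second concept with its sense
    return (r1 + r2) / 2.0


# Invented relatedness degrees, used only to make the sketch runnable.
TOY_REL = {
    ("journey", "food"): 0.2,
    ("voyage", "fruit"): 0.1,
}


def toy_rel(a, b):
    """Hypothetical symmetric relatedness function with values in [0, 1]."""
    if a == b:
        return 1.0
    return TOY_REL.get((a, b), TOY_REL.get((b, a), 0.0))


# Context D10 of the example: senses (food, fruit) for (journey, voyage).
omega_10 = context_weight(toy_rel, ("journey", "voyage"), ("food", "fruit"))
```

In this sketch a context whose senses coincide with the compared concepts receives weight 1, because `toy_rel` is assumed to return 1 for identical concepts.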
The similarity values of Table 2 have been obtained by applying Formula (1) in Section III to the selected state-of-the-art measures, as well as to the human judgment. For instance, in the case of the context $D_{10}$, the measure proposed by Adhikari et al., $sim_{A,D_{10}}$, has been computed according to the values given in Table 1. In order to compute 28 tables, one for each pair of the M&C dataset, each table containing 28 possible contexts for that pair, a disambiguation step has to be performed. In fact, it is well known that in Wikipedia, and consequently in DBpedia, terms are addressed with the possible meanings they have, i.e., a term is associated with multiple senses. For this reason, in this experiment the disambiguation is necessary in order to address senses in line with the HJ evaluation in the M&C experiment. For instance, crane in Wikipedia has two main senses, namely bird and machine. Table 3 shows the results concerning the average weights in the 28 contexts, $\omega_{avg}$, of crane before and after the disambiguation steps. In particular, when paired with implement and bird, it is disambiguated by using the senses machine and bird, respectively. Note that in the case of the pair (crane, implement), the average weight significantly increases (from 0.08 to 0.32) if crane stands for a machine, and implement stands for a tool. Analogously, for the pair (bird, crane), the average weight increases after disambiguating it with the bird sense.
In the next subsection, a data analysis concerning the senses of the concepts with respect to the concepts to be compared is performed.

A. RELEVANCE OF THE INTENDED CONCEPT SENSES
In the experiment, in associating a given pair of concepts with a pair of possible concept senses, in some cases the weight $\omega_k$, for a given context $D_k$, is null. Among these cases, there are some particular situations in which neither of the concept senses has any relevance to the concepts to be compared, i.e., both the values $r_1$, $r_2$ above are null. In other words, for some pairs of concepts, there are contexts (or perspectives) that do not apply to either of the compared concepts, i.e., they do not correspond to any specific point of view and, for this reason, in the experimentation these contexts have been ignored. This is for instance the case of the pair of concepts (coast, shore), when associated with the pairs of senses (brother, monk), or (boy, lad).
The same also holds in the case of concept senses with low similarity values, such as for instance the pair (noon, string), or (chord, smile). Therefore, in order to analyze significant contexts, a threshold for HJ in Table 1 has been introduced, in this case equal to 0.5 (on the scale from 0 to 4). It is important to observe that this threshold has been applied only in the case of concept senses, whereas the experiment concerns all the 28 pairs of concepts, including the ones with HJ less than 0.5. The correlations with HJ for the pair (journey, voyage) in the addressed contexts are shown in Table 2, whereas the average correlations for all the 28 pairs are shown in Table 4. Furthermore, in Table 5, the correlations of some specific pairs of concepts are illustrated, with the related average weights, starting from a pair of very similar concepts, such as (car, automobile), to the pair of concepts (journey, car), which are related but not similar [29]. This issue is also discussed in the next subsection.
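The selection of significant contexts and the subsequent correlation with HJ can be sketched in Python as follows. This is only an illustration under stated assumptions: the correlation is assumed to be the Pearson coefficient, the threshold is assumed to keep sense pairs whose HJ is at least 0.5, and every number in `contexts` is invented. The names `pearson`, `is_significant`, and the dictionary keys are hypothetical.

```python
import math

HJ_THRESHOLD = 0.5  # threshold on the HJ of the sense pair (scale 0 to 4)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def is_significant(ctx):
    """Drop contexts with no relevant sense or with a low-HJ sense pair."""
    both_irrelevant = ctx["r1"] == 0.0 and ctx["r2"] == 0.0
    return not both_irrelevant and ctx["hj_senses"] >= HJ_THRESHOLD


# Invented data for one pair of concepts: per context, the HJ of the sense
# pair, the two relatedness degrees, and the enriched scores of a method
# and of the human judgment.
contexts = [
    {"hj_senses": 3.9, "r1": 0.4, "r2": 0.5, "sim_method": 0.80, "sim_hj": 3.6},
    {"hj_senses": 3.0, "r1": 0.3, "r2": 0.2, "sim_method": 0.55, "sim_hj": 2.9},
    {"hj_senses": 1.2, "r1": 0.1, "r2": 0.3, "sim_method": 0.30, "sim_hj": 1.5},
    {"hj_senses": 2.8, "r1": 0.0, "r2": 0.0, "sim_method": 0.60, "sim_hj": 2.7},
    {"hj_senses": 0.1, "r1": 0.2, "r2": 0.1, "sim_method": 0.05, "sim_hj": 0.3},
]

kept = [ctx for ctx in contexts if is_significant(ctx)]
correlation = pearson(
    [ctx["sim_method"] for ctx in kept],
    [ctx["sim_hj"] for ctx in kept],
)
```

Here the fourth context is dropped because both relatedness degrees are null, and the fifth because the HJ of its sense pair is below the threshold; the correlation is then computed on the remaining contexts only.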

1) RELIABILITY OF CONCEPT SENSES
It is interesting to observe how the correlation behaves if we further restrict our attention to pairs of concept senses which correspond to ''reliable'' contexts, as illustrated below. Consider again the pair of concepts (journey, voyage). Among the 28 contexts, including for instance (boy, lad), or (midday, noon), which have low relevance weights $\omega_k$, let us focus on the five contexts shown in Table 6, where the similarity values for the methods L, J&C, P&S, A, and A&M are given. In the table, besides the correlation obtained according to the data analysis illustrated above (Correl.), the one obtained by applying only the disambiguation step, indicated as $Correl_d$, is also shown. Note that in the case of $D_3$, the context does not provide any additional information about the intended senses of the concepts journey and voyage and, therefore, their similarity coincides with the one of their generic senses. In fact $\omega_3 = 1.00$ and, for all the methods, the similarity values in Table 6 are the same as in Table 1. Consider now the context $D_{20}$, where the concept journey stands for a trip up the coast, whereas voyage is a travel through the hill. In this case similarity always decreases, since for all the methods the intended senses of journey and voyage have similarity values less than the ones computed by addressing their generic senses (see Table 1). This is more evident if we consider the context $D_{23}$, which associates with journey the same meaning it has in $D_{20}$, i.e., coast, whereas voyage stands for a trip in the forest. This is not the case of the context $D_5$, where the intended meaning of voyage is a trip along the shore. In fact, the similarity of journey and voyage increases except for the methods A and A&M, as expected according to the corresponding values of the related senses.
It is interesting to observe the case of the context $D_{17}$, where the intended senses for (journey, voyage) are represented by the pair (journey, car). For all the methods the similarity values of the contrasted concepts considerably decrease, although the corresponding weight is high (0.75). Indeed, this result is expected due to the semantically different kind of relation the concepts journey and car have, since they are related concepts, linked by a thematic relation, that are not considered similar [29]. In fact, it is important to observe that, in Table 1, according to L, A, and A&M, the similarity values of journey and car are null, whereas this does not hold for the methods J&C and P&S. Overall, in Table 6, with respect to $Correl_d$, the methods L, A, and A&M show correlation values slightly better than J&C and P&S (0.98 vs 0.97), with an increase of the average weights (0.55 vs 0.52). On the basis of these results, a further investigation about the impact of non-taxonomic relations on this proposal may be worthwhile.
The methods addressed in this experimentation are better illustrated in the Related Work Section below.

V. RELATED WORK
In the literature, there is a significant number of works addressing semantic similarity [11], [12]. With the advent of Wikipedia, the most widely used and up-to-date knowledge repository, several approaches have been proposed that exploit its features, such as articles, hyperlinks, categories, etc. (see for instance [23], [26], [28], [32]). As mentioned above, semantic similarity is a special case of semantic relatedness [29]. The latter also concerns thematic relations (and, more generally, non-taxonomic relations) and, in the literature, there are several proposals investigating it by relying on general-purpose knowledge resources, such as Wikipedia or WordNet [5], [21], [22], [50]. However, none of the approaches in the mentioned literature addresses the intended senses of a pair of concepts in a given application domain, or context: they consider all the combinations of all possible senses for that pair, and then select the highest values. For this reason, as also mentioned in the previous section, in order to compare this proposal with the mentioned state of the art, ad-hoc algorithms are needed not only for the automatic extraction of the ISA taxonomy, but also for the identification and the extraction of the concept senses in the given application domains. Therefore, in order to have reliable benchmark datasets, further parameters necessarily have to be investigated [6], whose analysis goes beyond the scope of this paper.
Note that an interesting approach for determining semantic contexts has been proposed in [18], and is based on the Heuristic Semantic Walk (HSW). In the mentioned paper, in order to define contexts, the information collected on the most traversed paths in Collaborative Semantic Networks is used, and the related method has been experimented in Wikipedia. However, with respect to this work, here we propose an enrichment of the notion of semantic similarity in a taxonomy on the basis of a given context, whereas the mentioned approach aims at identifying the context of two or more concepts by relying on a proximity measure.
With regard to semantic relatedness, we remark that in this proposal it has been addressed in the experiment for the different purpose of evaluating $\omega_k$, i.e., the relevance of the intended sense of a concept in a domain $D_k$. As mentioned above, among the existing proposals, the method defined in [42] has been selected since it shows competitive performance by relying on the information content approach.
Among the semantic similarity approaches, which include for instance the one recently presented in [48] for Neural Networks and the hybrid similarity measures combining the shortest path lengths and the depths of subsumers [31], below we restrict our attention to the methods based on the information content (IC) approach, which has been employed in different research areas, such as Natural Language Processing [4], Semantic Web [14], [34], Formal Concept Analysis [13], [44], IFCA (Formal Concept Analysis with Interval Type-2 Fuzzy sets) [15], Geographical Information Systems [16], [41], and in different application domains, such as health [1], [25], and network security [45], just to mention a few examples. However, the IC approach, although recognized as ''the state of the art on semantic similarity'' [3], [8], has shown some limitations, as discussed below. In the following, we start by recalling the IC based approaches addressed in the Experimental Results Section.
With regard to the works of Resnik [38] and Lin [33] (R, and L, respectively, in the tables above), which have been analyzed in details in Section III, we briefly summarize the following. According to the former, concept similarity in a taxonomy is computed by considering only concept commonalities (i.e., concept least common subsumers). Therefore, it shows some limitations since pairs of concepts having the same least common subsumers have the same similarity degrees. The latter, according to [10], can be reconducted to the well-known Tversky linear contrast model of similarity [43], which addresses both concept commonalities and differences. In particular, also in [33] the importance of observing an object from different perspectives is emphasized, but the proposed resulting similarity degrees are considered as weighted averages of the similarity values obtained from such perspectives. As a result, this approach does not allow to estimate concept similarity by considering a single specific perspective at a time. Successively, in the late 1990s, in [27] a proposal combining the IC with the edge-counting approach has been presented (J &C in our experiment), showing better performances with respect to the previous methods. However, one objection to the early IC based measures relies on the use of large-scale corpora [3], [7], [8], [24], [49]. In fact, evaluating the IC on the basis of statistical information taken from textual corpora requires a huge amount of manual effort at level of both design and maintenance of the corpus. For this reason, in the literature, an evolution of the IC notion has been extensively investigated, referred to as intrinsic information content (IIC), although there is a lack of a statistically significant difference between the performances of the IIC models and the corpus-based ones [30]. 
In particular, the IIC is evaluated independently of textual corpora, and in accordance with the intrinsic structure of the taxonomy, i.e., on the basis of the number of hyponyms and/or hypernyms of the concepts. Along this direction, Adhikari et al. propose a method in [3] (A in our experiment), arguing that relying only on the maximum among the ICs of the least common subsumers ignores some common subsumers that can be relevant for evaluating semantic similarity. For this reason, in the mentioned paper, the IC is estimated according to an IIC approach by introducing a new notion, referred to as Disjoint Common Subsumers. A variant of this approach based on the Meng model has also been proposed in [2], which shows slightly better performance than the other measure (A&M in our experiment). Both the models they present achieve high correlation values when applied to the state-of-the-art measures addressed in our experiment. Finally, in [36] (P&S in our experiment), an IIC approach for semantic similarity has been proposed by relying on the Tversky contrast model, which shows high correlation with human judgment compared with the state of the art. As illustrated above, this measure also shows high correlation values in our experiment, although the impact of non-taxonomic relations on our proposal should be better investigated. Besides the methods addressed in our experimentation, it is worth mentioning that in [8] the IIC notion is revised by using not only concept hyponyms and hypernyms, but also by leveraging synonymy and polysemy in WordNet. In [7], [49], the authors claim that most IC computing models have been developed for single-parent taxonomies, and therefore they propose a new IIC computing model for multiple inheritance hierarchies, such as WordNet.
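As an illustration of how an IC value can be derived from the taxonomy structure alone, the following minimal sketch uses one well-known hyponym-based formulation, IIC(c) = 1 - log(hypo(c) + 1) / log(N), usually attributed to Seco et al., where hypo(c) is the number of hyponyms of c and N is the total number of concepts. This is an assumption made for illustration and is not necessarily the model adopted in the works recalled above; the toy taxonomy CHILDREN is hypothetical.

```python
import math

# Hypothetical toy ISA taxonomy (parent -> children); an assumption for illustration.
CHILDREN = {
    "entity": ["animal", "vehicle"],
    "animal": ["dog", "cat"],
    "dog": ["poodle"],
    "vehicle": ["car"],
}


def concepts():
    """All concepts appearing in the taxonomy."""
    nodes = set(CHILDREN)
    for kids in CHILDREN.values():
        nodes.update(kids)
    return nodes


def hyponyms(c):
    """All proper descendants of c, collected by a depth-first visit."""
    found = set()
    stack = list(CHILDREN.get(c, []))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(CHILDREN.get(node, []))
    return found


def intrinsic_ic(c):
    """Hyponym-based IIC: 1 - log(hypo(c) + 1) / log(N)."""
    n = len(concepts())
    return 1.0 - math.log(len(hyponyms(c)) + 1) / math.log(n)
```

Under this formulation the root has IIC equal to 0 and every leaf has IIC equal to 1, so more specific concepts are more informative and no corpus statistics are required.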
With respect to all the aforementioned works, in this paper we do not present a new IC computing model, and our proposal is independent of the IIC models recalled above. In fact, although these approaches show a high accuracy in the similarity evaluation, they do not involve concept meaning and, in particular, the related similarity measures do not address the intended senses of concepts according to a given application domain.
Note that the semantic similarity measure proposed in [24] originates from the need to overcome one of the limitations we highlighted in this paper, i.e., that pairs of concepts sharing the same least common subsumers have the same similarity degrees. However, the authors base their solution on the whole WordNet ontology, by associating the different kinds of relationships (e.g., ISA and PartOf) with different weights, which is again a proposal independent of the intended senses of concepts.
The notion of sense has been addressed by Resnik in [39], where semantic similarity is used to identify and select the appropriate sense of a concept when it appears in a group of related terms. Analogously, in [19] the semantic similarity of Lin and the MeSH thesaurus have been employed in order to determine the adequate sense of an ambiguous biomedical term. However, both these papers address word sense disambiguation in the field of computational linguistics, where semantic similarity is not the objective of the work but is used in order to associate a noun with the right sense in a given context. In contrast, we use the intended senses of concepts to improve the computation of semantic similarity. Finally, senses are also addressed in [20], where concept similarity is computed between the most related pairs among the meanings corresponding to the concepts, but the intended senses of the compared concepts are not considered.

VI. CONCLUSION AND FUTURE WORK
In this work, an enrichment of the information-theoretic definition of semantic similarity has been presented for concepts organized according to an ISA taxonomy. The proposed measure is based on a novel approach that essentially addresses the context of the compared concepts, by associating them with their intended senses. In this way, concept similarity scores can be refined and made closer to the specificity of the given application domain and the related purpose. This proposal has been applied to some of the most relevant state-of-the-art similarity measures, and the results show that it achieves high correlation values with human judgment, in line with the results presented in the literature for the specific methods.
Regarding future work, this proposal can be placed within the more general framework of Linked Data [9]. The idea is to consider, for instance, the DBpedia knowledge graph and use the RDF triples as all possible senses of concepts. In particular, a concept (node) of the RDF graph can be associated with as many senses as the number of RDF triples in which it appears as subject. Therefore, given an application domain, this graph could support the domain expert in selecting the intended sense of a concept according to the addressed domain.
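The idea of deriving the candidate senses of a concept from the triples in which it appears as subject can be sketched as follows. This is only an illustrative outline under stated assumptions: the triples, the ex: prefix, and the function name candidate_senses are hypothetical, and no actual DBpedia data or query is involved.

```python
from collections import defaultdict

# Hypothetical RDF-like triples (subject, predicate, object); invented for illustration.
TRIPLES = [
    ("ex:Jaguar", "ex:isA", "ex:Feline"),
    ("ex:Jaguar", "ex:isA", "ex:CarBrand"),
    ("ex:Jaguar", "ex:livesIn", "ex:Rainforest"),
    ("ex:Rainforest", "ex:locatedIn", "ex:SouthAmerica"),
]


def candidate_senses(triples):
    """Group (predicate, object) pairs by subject: one candidate sense per triple."""
    senses = defaultdict(list)
    for subject, predicate, obj in triples:
        senses[subject].append((predicate, obj))
    return dict(senses)


SENSES = candidate_senses(TRIPLES)
# A domain expert would then select, for each concept, the intended sense
# among SENSES[concept] according to the addressed application domain.
```

Here the concept ex:Jaguar is associated with three candidate senses, one for each triple in which it appears as subject.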
Furthermore, besides an analysis of the impact of non-taxonomic relations on the proposed approach, we plan to refine this measure by defining the intended sense of a concept as a set of concepts, rather than a single one, in order to better characterize the concept meaning with respect to a given context. In addition, each concept belonging to such a set will be associated with a degree of accuracy representing how well it describes the related concept in the given domain. Therefore, a method for computing semantic similarity between sets of concepts will be investigated.