XML lets document owners define their own formats (element names and structure), so the same information may be described in several ways. One method for finding the similarity between XML documents that use different formats is path similarity detection. A recent approach, PathSim, can detect the similarity rate between two XML paths when semantically related elements appear in the same hierarchical order in both paths. When the XML documents order their element hierarchies differently, that approach yields a low similarity rate. To improve on it, PathMatch is introduced. PathMatch uses the edit distance algorithm to find the semantic similarity rate between element names and a cost matrix model to find the similarity rate between two XML paths. The results show that PathMatch achieves a higher similarity rate than the previous approach when the two XML paths contain semantically related elements in a different hierarchical order. Moreover, when the two XML paths share the same hierarchical order, PathMatch gives the same similarity rate as the previous approach.
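The idea of combining edit distance with a cost matrix can be illustrated with a minimal sketch. It is not the PathMatch algorithm itself, whose details are not given here. The sketch assumes that name similarity is the Levenshtein distance normalised by the longer name, that the matrix holds the pairwise similarities between the elements of the two paths, and that the path score is the best one-to-one element assignment divided by the longer path length. The function names `edit_distance`, `name_similarity`, and `path_similarity` are hypothetical.

```python
from itertools import permutations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / longer length, case-insensitive."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def path_similarity(path_a: str, path_b: str) -> float:
    """Score two paths by their best one-to-one element matching.

    The matching ignores hierarchy order, so reordered paths can still
    score highly. This is an assumed reading of the cost matrix model.
    """
    xs = [step for step in path_a.split("/") if step]
    ys = [step for step in path_b.split("/") if step]
    if not xs or not ys:
        return 0.0
    short, long_ = (xs, ys) if len(xs) <= len(ys) else (ys, xs)
    # Similarity matrix: rows are the shorter path, columns the longer one.
    matrix = [[name_similarity(s, t) for t in long_] for s in short]
    # Brute-force search over assignments; only practical for short paths.
    best = max(
        sum(matrix[row][col] for row, col in enumerate(cols))
        for cols in permutations(range(len(long_)), len(short))
    )
    return best / len(long_)
```

Under these assumptions, `path_similarity("/library/book/title", "/book/library/title")` scores the reordered path the same as an identical one, because every element finds an exact match regardless of position. For longer paths, the exhaustive search could be replaced by a polynomial-time assignment solver such as the Hungarian algorithm.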