In many domains, few explicit ontologies have been established for building information systems. These systems typically have only the schemas of their information repositories, which imply the semantics of the information to some extent. As a result, traditional ontology-driven approaches to semantic integration cannot be applied to them directly. In our work, we use the schemas and data instances of the information repositories to discover semantic correspondences between schema elements and to build a domain ontological view. We apply hierarchical clustering to the data instances and use the resulting clusters in subsequent analysis, which reduces the cost of processing large amounts of data. Schema elements are matched based on the probability distributions of their data instances. Preliminary results demonstrate the effectiveness of this approach.
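The pipeline described above (cluster the data instances, then compare schema elements by how their instances are distributed) can be illustrated with a minimal sketch. It is not the implementation used in this work: the abstract does not specify the linkage criterion or the distribution measure, so the centroid-linkage merging of one-dimensional numeric values and the Jensen-Shannon divergence used here are assumptions chosen for brevity. The column names (`emp.salary`, `staff.wage`, `staff.age`), the sample values, and the helper functions are all hypothetical.

```python
import math
from collections import Counter


def cluster_values(values, k):
    """Agglomeratively merge distinct 1-D values into k clusters.

    Assumption: centroid linkage on numeric values. Returns a dict that
    maps each distinct value to the index of its cluster.
    """
    clusters = [[v] for v in sorted(set(values))]
    while len(clusters) > max(k, 1):
        means = [sum(c) / len(c) for c in clusters]
        # Clusters stay contiguous in sorted order, so the closest pair
        # of centroids is always an adjacent pair.
        i = min(range(len(clusters) - 1), key=lambda j: means[j + 1] - means[j])
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return {v: idx for idx, c in enumerate(clusters) for v in c}


def cluster_distribution(values, assignment, k):
    """Fraction of a column's instances that fall into each cluster."""
    counts = Counter(assignment[v] for v in values)
    return [counts.get(i, 0) / len(values) for i in range(k)]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2): 0 for identical, 1 for disjoint."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


# Hypothetical schema elements from two repositories, with sample instances.
columns = {
    "emp.salary": [52000, 61000, 58000, 75000, 49000],
    "staff.wage": [50000, 60000, 77000, 55000, 63000],
    "staff.age": [25, 41, 38, 52, 29],
}

K = 4  # number of clusters kept for the analysis (an arbitrary choice)

# Cluster the pooled instances once, then describe every schema element
# by its distribution over those clusters instead of by its raw values.
pooled = [v for vals in columns.values() for v in vals]
assignment = cluster_values(pooled, K)
dists = {name: cluster_distribution(vals, assignment, K)
         for name, vals in columns.items()}

# Match one element against the others: lower divergence = closer match.
source = "emp.salary"
scores = {name: js_divergence(dists[source], d)
          for name, d in dists.items() if name != source}
best_match = min(scores, key=scores.get)
print(best_match, scores)
```

Because each schema element is summarized by a vector of length `K`, the cost of comparing two elements depends on the number of clusters rather than on the number of instances, which is the saving the clustering step is meant to provide. A real system would also need to handle non-numeric instances, which this sketch does not attempt.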