Abstract:
Scarce training data prevents neural networks from capturing the diversity and intricacies of schemas, which limits the generalization of schema-matching models. In this paper, we propose ISResMat, a framework designed to match the schemas of relational tables by fine-tuning a pre-trained language model. We first introduce a training data construction method, Pairwise Sampling, which generates the training dataset from table data. Next, we design two loss functions (i.e., Meta-Matching Loss and Agent-Delegating Loss) to learn representations of table columns. With those representations, we calculate matching scores between columns of different tables to deduce the matching candidates, which provides a novel approach to schema matching. Finally, we present two optimizations (i.e., Matching Rectification Loss and Distribution-Aware Fingerprint) to handle matching cardinality constraints and numerical columns, respectively. ISResMat is a flexible framework that supports instance-based, schema-based, and hybrid matching without significant modification. Experiments on 500+ fabricated and human-curated relation pairs spanning diverse domains and matching scenarios show that our approach outperforms existing state-of-the-art methods.
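As a rough illustration of how matching scores between column representations can yield matching candidates, the sketch below scores every source/target column pair and keeps the best pairs under a one-to-one constraint. It is not the ISResMat implementation: the cosine similarity, the greedy selection, the `threshold` value, the function names, and the toy embedding vectors are all assumptions made for illustration, and the abstract does not specify how scores or cardinality constraints are actually computed.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def match_columns(source, target, threshold=0.5):
    """Greedy one-to-one matching between two tables' columns.

    `source` and `target` map column names to embedding vectors.
    Both the embeddings and the threshold are hypothetical inputs.
    """
    # Score every cross-table column pair, best score first.
    scored = sorted(
        ((cosine(u, v), s, t) for s, u in source.items() for t, v in target.items()),
        reverse=True,
    )
    used_source, used_target, matches = set(), set(), []
    for score, s, t in scored:
        if score < threshold:
            break  # remaining pairs score even lower
        # Enforce the one-to-one cardinality assumption.
        if s not in used_source and t not in used_target:
            matches.append((s, t, score))
            used_source.add(s)
            used_target.add(t)
    return matches


# Toy embeddings, invented for this example only.
source = {"cust_name": [0.9, 0.1, 0.0], "zip": [0.0, 0.2, 0.9]}
target = {
    "customer": [0.8, 0.2, 0.1],
    "postal_code": [0.1, 0.1, 0.95],
    "notes": [0.1, 0.9, 0.1],
}
matches = match_columns(source, target)
for s, t, score in matches:
    print(f"{s} -> {t} ({score:.3f})")
```

With these toy vectors, `notes` stays unmatched because no source column scores above the threshold against it. A greedy pass is only one simple way to respect a one-to-one constraint; an optimal assignment solver is another option.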
Date of Conference: 13-16 May 2024
Date Added to IEEE Xplore: 23 July 2024