The paper tackles the problem of mining linked open data. The inherent lack of knowledge caused by the open-world assumption made on the semantic of the data model determines an abundance of data of uncertain classification. We present a semi-supervised machine learning approach. Specifically a self-training strategy is adopted which iteratively uses labeled instances to predict a label also for unlabeled instances. The approach is empirically evaluated with an extensive experimentation involving several different algorithms demonstrating the added value yielded by a semi-supervised approach over standard supervised methods.
Published in:
Semantic Computing (ICSC), 2012 IEEE Sixth International Conference on
Date of Conference: 19-21 Sept. 2012