Evaluating information retrieval algorithms requires a dataset to serve as a test database. However, such datasets are often difficult and expensive to obtain, since building them is time-consuming and costly. This paper presents a collaborative approach to dataset creation that uses a data quality evaluation technique based on fuzzy theory to assist users in selecting suitable Web documents for their datasets. These documents are automatically captured by a crawler and assessed using information derived from their metadata.
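As a rough illustration of what a fuzzy, metadata-based quality assessment could look like, the following minimal sketch scores a crawled document from two metadata-derived indicators. It is not the method described in the paper: the indicators (`days_since_modified`, metadata completeness), the membership thresholds, and the use of the minimum as the fuzzy AND are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


def ramp_down(x: float, full: float, zero: float) -> float:
    """Membership that is 1.0 up to `full`, falls linearly, and is 0.0 from `zero` on."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def ramp_up(x: float, zero: float, full: float) -> float:
    """Membership that is 0.0 up to `zero`, rises linearly, and is 1.0 from `full` on."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


@dataclass
class PageMetadata:
    # Hypothetical indicators; the paper does not list the ones it uses.
    days_since_modified: float
    filled_fields: int   # metadata fields that are present
    total_fields: int    # metadata fields that were looked for


def quality_score(meta: PageMetadata) -> float:
    """Degree in [0, 1] to which a document is both 'recent' and 'well described'."""
    if meta.total_fields <= 0:
        raise ValueError("total_fields must be positive")
    # Assumed thresholds: fully recent within 30 days, not recent after a year.
    recent = ramp_down(meta.days_since_modified, full=30.0, zero=365.0)
    # Assumed thresholds: 20% completeness or less scores 0, 80% or more scores 1.
    completeness = meta.filled_fields / meta.total_fields
    complete = ramp_up(completeness, zero=0.2, full=0.8)
    # Fuzzy AND taken as the minimum of the two membership degrees.
    return min(recent, complete)
```

A user assembling a dataset could then rank candidate documents by `quality_score` and keep those above a threshold of their choosing. Other combination operators, such as a product or a weighted average, would be equally plausible under these assumptions.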
Published in:
Computer Supported Cooperative Work in Design, 2009. CSCWD 2009. 13th International Conference on
Date of Conference: 22-24 April 2009