Many online databases are hidden behind form-like interfaces that let users run search queries by specifying selection conditions. Most of these interfaces return restricted answers (e.g., only the top-k of the selected tuples), and many also accompany each answer with the COUNT of the selected tuples. In this paper, we propose techniques that leverage the COUNT information to efficiently acquire unbiased samples of the hidden database. We also discuss variants for interfaces that do not provide COUNT information. We conduct extensive experiments to illustrate the efficiency and accuracy of our techniques.
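To make the role of COUNT concrete, the following is a minimal illustrative sketch and not the algorithm proposed in the paper. It simulates a top-k interface that reports COUNT and performs a count-weighted drill-down: at each step it picks an attribute value with probability proportional to the COUNT of the narrowed query, and once the query no longer overflows it picks one returned tuple uniformly. Under the stated assumptions, every tuple is then reached with probability 1/N. The names `make_interface` and `count_weighted_sample`, the dictionary-based query format, and the assumption that tuples are distinct on the searchable attributes are all hypothetical and introduced only for this example.

```python
import random


def make_interface(rows, k):
    """Simulate a hidden-database form: returns (top-k matches, COUNT).

    Hypothetical stand-in for a real web form; `rows` is a list of dicts.
    """
    def query(conditions):
        matches = [r for r in rows
                   if all(r[a] == v for a, v in conditions.items())]
        return matches[:k], len(matches)
    return query


def count_weighted_sample(query, domains, k, rng=random):
    """Draw one tuple via count-weighted drill-down.

    `domains` maps each searchable attribute to a list of its values.
    Assumes tuples are distinct on these attributes, so a fully
    specified query matches at most one tuple and never overflows.
    """
    conditions = {}
    returned, count = query(conditions)
    if count == 0:
        return None  # empty database
    for attr, values in domains.items():
        if count <= k:
            break  # query no longer overflows; all matches are visible
        # Ask for the COUNT of each one-step refinement of the query.
        counts = [query({**conditions, attr: v})[1] for v in values]
        # Choose a branch with probability COUNT(branch) / COUNT(parent).
        conditions[attr] = rng.choices(values, weights=counts, k=1)[0]
        returned, count = query(conditions)
    # All `count` matches are returned here, so pick one uniformly.
    return rng.choice(returned)
```

The probability of reaching a given tuple is the product of the branch ratios along its path times 1/COUNT at the final query, which telescopes to 1/N. The sketch spends one query per attribute value at every level, so it ignores the query-cost concerns that the abstract highlights.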
Date of Conference: March 29 2009-April 2 2009