A wealth of information is hidden behind form-like interfaces, which makes it difficult to capture characteristics of the underlying databases, such as their topic and update frequency. This poses a great challenge for hidden web data integration. HIDDEN-DB-SAMPLER is the first algorithm to address this problem, but it does not consider the keyword attributes on the query interface. This paper presents increment-based random walk, a new technique applicable to any kind of attribute. The main idea is that, for keyword attributes, it incrementally obtains new values from the database: it selects a value from the current sample and submits it to the interface, using a selection scheme designed to ensure the quality of the sampling. For other attributes, it works as RANDOM WALK does. An extensive set of experimental results demonstrates the accuracy and efficiency of our technique.
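The incremental step for keyword attributes can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not specify the selection scheme, so the sketch assumes a uniform random choice among the keyword values seen so far, and the names `incremental_keyword_sample`, `toy_query`, and `TOY_DB` are hypothetical stand-ins for a real query interface.

```python
import random

# Hypothetical toy database standing in for a hidden web source.
TOY_DB = {
    1: "hidden web sampling",
    2: "web data integration",
    3: "random walk sampling",
    4: "data quality",
}


def toy_query(keyword, limit=2):
    """Stand-in for a form interface: return at most `limit` matching records."""
    hits = [(i, t) for i, t in sorted(TOY_DB.items()) if keyword in t.split()]
    return hits[:limit]


def incremental_keyword_sample(query, seed, rounds, rng=None):
    """Grow a sample by repeatedly querying with a keyword drawn from it."""
    rng = rng or random.Random()
    sample = []        # records collected so far, in arrival order
    seen_ids = set()   # ids of records already in the sample
    keywords = [seed]  # keyword values available for the next query
    known = {seed}
    for _ in range(rounds):
        # Assumption: uniform choice; the paper's selection scheme may differ.
        kw = rng.choice(keywords)
        for rec_id, text in query(kw):
            if rec_id not in seen_ids:
                seen_ids.add(rec_id)
                sample.append((rec_id, text))
            # New values found in returned records become future query terms.
            for word in text.split():
                if word not in known:
                    known.add(word)
                    keywords.append(word)
    return sample, keywords


if __name__ == "__main__":
    records, terms = incremental_keyword_sample(
        toy_query, "web", rounds=20, rng=random.Random(0)
    )
    print(len(records), "records,", len(terms), "keyword values")
```

A uniform choice is the simplest placeholder. A real scheme would likely need to correct for the bias toward records that match many keywords, which is the kind of quality concern the selection scheme is said to address.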