This paper proposes a new method of web data collection for pervasive computing. With the rapid expansion of the World Wide Web, dynamic web pages have become increasingly important. They are usually generated from a database through a common template. The structured data extracted from these pages, together with semantic annotation, are valuable for information systems. In this paper, we study how to label data values with attributes, automatically detect the template behind these pages, and extract the embedded data. To label data values with attributes, we rely on the fact that the label text is visually close to the data element, and we propose a bootstrapping method for learning labels. A novel algorithm is presented to detect the template and construct a wrapper. Experimental results obtained using a large number of pages show that the proposed technique is highly effective.
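The visual-proximity observation can be illustrated with a minimal sketch. This is not the paper's bootstrapping method or its template-detection algorithm. It only shows, under stated assumptions, how a data value might be paired with the candidate label whose rendered bounding box lies closest to it. The names `Box`, `gap`, and `nearest_label`, and the coordinate layout, are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional


@dataclass(frozen=True)
class Box:
    """A rendered page element (hypothetical representation)."""
    text: str
    x: float  # left edge in page coordinates
    y: float  # top edge in page coordinates
    w: float  # width
    h: float  # height


def gap(a: Box, b: Box) -> float:
    """Distance between the closest edges of two boxes (0 if they overlap)."""
    dx = max(b.x - (a.x + a.w), a.x - (b.x + b.w), 0.0)
    dy = max(b.y - (a.y + a.h), a.y - (b.y + b.h), 0.0)
    return hypot(dx, dy)


def nearest_label(value: Box, labels: list[Box]) -> Optional[Box]:
    """Return the candidate label visually closest to the data value."""
    return min(labels, key=lambda lab: gap(lab, value), default=None)


# Assumed example layout: two labels in a left column, one value to the right.
labels = [Box("Price:", 10, 10, 40, 12), Box("Author:", 10, 40, 50, 12)]
value = Box("$12.99", 55, 10, 40, 12)
best = nearest_label(value, labels)
```

In this assumed layout, the value sits on the same row as "Price:", so that label is selected. A real system would also need a way to decide which text nodes are label candidates in the first place, which is the role the abstract assigns to the bootstrapping step.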