Many Web sites contain large sets of pages generated dynamically from a common template. Structured data extracted from these pages and enriched with semantic annotations are valuable for information systems. We propose a system, ADeaD, that automatically extracts data values from such Web pages and annotates the data schema. Experimental evaluation on many real-world Web page collections indicates that our algorithm correctly extracts the data and annotates the data schema.
Published in:
2004 IEEE International Conference on e-Technology, e-Commerce and e-Service (EEE '04)
Date of Conference: 28-31 March 2004