Endless and Scalable Knowledge Table Extraction from Semi-structured Websites

Authors: Yingqin Gu (Key Labs. of Data Eng. & Knowledge Eng., Renmin Univ. of China, Beijing, China); Lei Ji; Ziheng Jiang; Jun He

The problem of scalable knowledge extraction from the Web has attracted much attention in the past decade. However, how to extract structured knowledge from semi-structured Websites in a fully automatic and scalable way remains underexplored. In this work, we define table-formatted structured data with a clear schema as Knowledge Tables and propose a scalable learning system, named Kable, that extracts knowledge from semi-structured Websites automatically in a never-ending and scalable way. Kable consists of two major components: automatic wrapper induction and schema matching. Compared with state-of-the-art automatic wrappers for semi-structured Websites, our adopted approach runs around 1,000 times faster, which makes Web-scale knowledge extraction possible. In addition, we propose a novel schema matching solution that works effectively on the automatically extracted structured data. In a continuous three-month run on ten Web servers, we extracted 427,105,009 knowledge facts. Manual labeling of sampled extracted knowledge shows precision of up to 87%, which can support various Web applications.

Published in:

Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on

Date of Conference:

10 Dec. 2012
