A novel ensemble vision based deep web data extraction technique for web mining applications

2 Author(s)
Banu, B.A. ; Dept. of Comput. Sci. & Eng., Mohamed Sathak Eng. Coll., Kilakarai, India ; Chitra, M.

Web content extraction is the task of extracting structured information from unstructured and semi-structured machine-readable documents. In most cases this activity involves processing human-language text by means of natural language processing (NLP). Recent work in multimedia document processing, such as automatic annotation and content extraction from images, audio, and video, can also be seen as information extraction. Similarly, information retrieval is a process driven by the user's query; the retrieved information is then extracted using web content extraction. The challenges of this type of web page content extraction are increasing. In this work, we study the problem of automatically extracting content from web pages. Much research has been done to address this problem, but the existing approaches have limitations: they lack the capacity to handle large numbers of web pages, and they depend on the web page programming language (HTML). Our proposed work aims to overcome these limitations. It deals with an information retrieval process in which a vision-based approach is applied, which helps to extract both images and text from web pages. Most research shows that when a page is presented to the user, spatial and visual features play a very important role because they help the user unconsciously divide the web page into several semantic parts. Hence, the proposed work focuses on the primary visual features of a web page, and the extraction is carried out on the basis of these features. This approach can achieve better performance than traditional methods.
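The abstract does not specify the extraction algorithm, so the following is only a minimal sketch of the general idea of dividing a page into semantic parts from spatial features. It is not the authors' method. It assumes that each rendered element is already available as a bounding box (for example, from some page renderer), and the `Block` type, the `segment_by_vertical_gap` function, and the `gap` threshold are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    """A rendered page element. All fields are hypothetical inputs."""
    x: int
    y: int
    width: int
    height: int
    kind: str      # "text" or "image"
    content: str   # the text itself, or the image source for images


def segment_by_vertical_gap(blocks: List[Block], gap: int = 30) -> List[List[Block]]:
    """Group blocks into segments, opening a new one at each large vertical gap.

    Illustrative heuristic only: blocks are visited top to bottom, and a block
    whose top edge lies more than `gap` pixels below the lowest edge seen so
    far in the current segment starts a new segment.
    """
    segments: List[List[Block]] = []
    bottom: Optional[int] = None  # lowest edge seen in the current segment
    for block in sorted(blocks, key=lambda b: (b.y, b.x)):
        if bottom is None or block.y - bottom > gap:
            segments.append([])
            bottom = block.y + block.height
        else:
            bottom = max(bottom, block.y + block.height)
        segments[-1].append(block)
    return segments
```

Because the grouping uses only positions and sizes, text and image blocks are handled the same way and no HTML tag structure is consulted. A realistic system would likely combine several visual cues (fonts, colors, alignment) rather than a single gap threshold.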

Published in:

Advanced Communication Control and Computing Technologies (ICACCCT), 2012 IEEE International Conference on

Date of Conference:

23-25 Aug. 2012