Web structure mining is the process of using graph theory to analyse the node and connection structure of a web site. A large number of web pages contain structured data in the form of “lists”. Many such lists can be further split into multicolumn tables, which can then be used in more semantically meaningful tasks. Extracting relational tables from such lists is a challenging task. The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. Because each relational table has its own “schema” of labelled and typed columns, each such table can be considered a small structured database. This paper first focuses on new techniques for keyword search over a large corpus of tables and shows that, using a Referenced attribute Functional Dependency Database (RFDDb), they can achieve substantially higher relevance than solutions based on a traditional search engine. The second objective is to introduce a data structure called BOOKSHELF that records corpus-wide statistics on co-occurrences of schema elements. In addition to improving search relevance, the BOOKSHELF data structure supports several novel applications, such as schema auto-complete, which helps a database designer choose schema elements, and attribute synonym finding, which automatically computes attribute synonym pairs for schema matching. Referential integrity requires that the columns of a foreign key match, in number and type, the columns of the primary key in the referenced table. The values of the foreign key columns in each row of the referencing table must match the values of the corresponding primary key columns for a row in the referenced table. A functional dependency occurs when one attribute in a relation uniquely determines another attribute.
This can be written A → B, which is the same as stating “B is functionally dependent upon A.” Finally, the search results are presented in a visual mode, which allows a user to navigate between extracted schemas.
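The two constraints defined above can be illustrated with a minimal Python sketch. It is not the paper's RFDDb implementation; the function names `holds_fd` and `fk_violations` and the sample `departments` and `employees` tables are hypothetical, chosen only for illustration. The sketch checks column count and value matching for a foreign key but does not check column types, and it ignores SQL NULL semantics.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def holds_fd(rows: Sequence[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen: Dict[tuple, tuple] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[b] for b in rhs)
        # The dependency fails if the same lhs value maps to two rhs values.
        if seen.setdefault(key, val) != val:
            return False
    return True


def fk_violations(referencing: Sequence[Row], fk_cols: Sequence[str],
                  referenced: Sequence[Row], pk_cols: Sequence[str]) -> List[Row]:
    """Return referencing rows whose foreign key matches no primary key."""
    if len(fk_cols) != len(pk_cols):
        raise ValueError("foreign key and primary key differ in column count")
    pk_values = {tuple(r[c] for c in pk_cols) for r in referenced}
    return [r for r in referencing
            if tuple(r[c] for c in fk_cols) not in pk_values]


# Hypothetical sample tables (not taken from the paper).
departments = [
    {"dept_id": 1, "dept_name": "Sales"},
    {"dept_id": 2, "dept_name": "Research"},
]
employees = [
    {"emp_id": 10, "dept_id": 1, "city": "Pune"},
    {"emp_id": 11, "dept_id": 2, "city": "Delhi"},
    {"emp_id": 12, "dept_id": 3, "city": "Pune"},
]

fd_by_id = holds_fd(employees, ["emp_id"], ["dept_id"])    # emp_id -> dept_id
fd_by_city = holds_fd(employees, ["city"], ["dept_id"])    # city -> dept_id
dangling = fk_violations(employees, ["dept_id"], departments, ["dept_id"])
```

In this sample, `emp_id → dept_id` holds because every `emp_id` is unique, `city → dept_id` fails because "Pune" appears with two different departments, and the employee with `dept_id` 3 violates referential integrity because no such department exists.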
Electronics Computer Technology (ICECT), 2011 3rd International Conference on (Volume: 6)
Date of Conference: 8-10 April 2011