Skip to Main Content
Looks at the custom tool developed by the author that leverages the Google Web search API (or a similar search service) to discover a list of Web pages matching a given topic; identify and extract trends and patterns from these Web pages' text; and transform those trends and patterns into an understandable, useful, and well-organized information resource. The tool accomplishes these tasks using four main components. First, a search engine client discovers a list of relevant Web pages using the Google Web search API. An information extraction engine then mines concepts and associated text passages from these Web pages. Next, a clustering engine organizes the most significant concepts into a hierarchical taxonomy. Finally, a knowledge base generator uses this taxonomy to generate a hypertext knowledge base from the extracted concepts and text passages.