Despite the Web's current disorganized and anarchic state, many AI researchers believe that it will become the world's largest knowledge base. We examine a line of research, known as information integration, whose ultimate goal is to make disparate data sources work together to better serve users' information needs. The authors discuss its application to datasets made available over the Web. A. Levy discusses the relationship between information integration and traditional database systems. He then enumerates important issues in the field and shows how the Information Manifold project has addressed some of them. C. Knoblock and S. Minton describe the Ariadne system. Two of its distinguishing features are its use of wrapper algorithms to extract structured information from semistructured data sources and its use of planning algorithms to determine how to integrate information efficiently and effectively across sources. W. Cohen describes an interesting variation on the theme, focusing on "informal" information integration. The idea is that, as in related fields that deal with uncertain and incomplete information, an information-integration system should be allowed to take chances and make mistakes. His Whirl system uses information-retrieval algorithms to find approximate matches between different databases and, as a consequence, knits together data from quite diverse sources.