The exponential growth of data on the Internet makes it increasingly difficult for people to find desired information in a timely fashion. Information filtering and dissemination systems allow users to register persistent queries, called user profiles, and notify users when relevant files become available. Existing systems of this kind, however, either are not scalable or do not support matching of unstructured documents (e.g., text, HTML, image, audio, or video files), which account for a significant percentage of Internet content. We propose pFilter, a global-scale, decentralized information filtering and dissemination system for unstructured documents. To handle potentially billions of documents for millions of subscribers, pFilter connects a large number of computers into a structured peer-to-peer overlay network. Computers in the overlay collectively publish or collect documents, build indices, register profiles, and filter and disseminate documents. Profiles and documents are distributed through the network according to their semantics so that they can be matched efficiently and accurately without excessive flooding. pFilter employs scalable application-level multicast to deliver matching documents to a large number of interested parties efficiently.
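To illustrate the idea of placing profiles and documents by their semantics, the following is a minimal single-process sketch, not pFilter's actual algorithm. It assumes a bag-of-words term vector as a stand-in for a semantic representation, cosine similarity as the matching measure, and a fixed similarity threshold. The names `Overlay`, `home_node`, `register`, and `publish`, and the notion of each node owning a "region" described by a few words, are hypothetical and introduced only for this example.

```python
import math
from collections import Counter, defaultdict


def term_vector(text):
    """Bag-of-words term-frequency vector (assumed stand-in for a semantic vector)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class Overlay:
    """Toy overlay: each node owns a semantic region described by a few words."""

    def __init__(self, node_topics):
        # node_topics maps a hypothetical node name to text describing its region.
        self.regions = {n: term_vector(t) for n, t in node_topics.items()}
        self.profiles = defaultdict(list)  # node name -> [(user, profile vector)]

    def home_node(self, vec):
        # Route to the node whose region is most similar to the vector.
        # Sorting the names makes tie-breaking deterministic.
        return max(sorted(self.regions), key=lambda n: cosine(vec, self.regions[n]))

    def register(self, user, profile_text):
        """Store a user profile on the node responsible for its semantics."""
        vec = term_vector(profile_text)
        node = self.home_node(vec)
        self.profiles[node].append((user, vec))
        return node

    def publish(self, doc_text, threshold=0.3):
        """Route a document to one node and match it only against profiles there."""
        vec = term_vector(doc_text)
        node = self.home_node(vec)
        matched = [u for u, p in self.profiles[node] if cosine(vec, p) >= threshold]
        return node, matched


overlay = Overlay({
    "n1": "music audio song concert",
    "n2": "network peer overlay routing",
})
overlay.register("alice", "peer overlay network")
overlay.register("bob", "jazz concert music")
node, users = overlay.publish("structured peer overlay network routing")
```

Because the profile and the document are routed by the same function, the published document is compared only with the profiles stored on its home node rather than being flooded to every node. A real deployment would replace the in-memory dictionary with overlay routing and deliver matches through multicast, neither of which is modeled here.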