We develop a new approach to text document filtering based on the automatic construction of filtering profiles using Bayesian inference network learning. Bayesian inference networks, grounded in probability theory, offer a suitable framework for handling the uncertainty inherent in the filtering problem. To learn the networks effectively, we explore three different techniques for discretization. Features with high predictive power are obtained automatically from the content of the training documents. Our approach does not require advance knowledge of the subject or content of the documents, or of the information needs expressed as topics. A series of experiments on a set of topics was conducted on two large-scale real-world document corpora. The empirical results demonstrate that our Bayesian inference network learning with advanced discretization achieves better performance than the simple naive Bayesian approach.
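To make the role of discretization concrete, the following is a minimal sketch of how continuous term weights might be binned before a probabilistic filter is learned. The abstract does not name the three discretization techniques, so equal-width binning is used here purely as an assumed stand-in, and the classifier shown is the simple naive Bayesian baseline rather than the Bayesian inference network itself. All function names and the toy data are hypothetical.

```python
import math
from bisect import bisect_right
from collections import defaultdict


def equal_width_cuts(values, k):
    """Return k-1 cut points that split [min, max] into k equal-width bins.

    Assumption: equal-width binning is only one possible discretization;
    it is not claimed to be one of the techniques studied in the paper.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def discretize(value, cuts):
    """Map a continuous value to a bin index in 0..len(cuts)."""
    return bisect_right(cuts, value)


def train_naive_bayes(rows, labels, n_bins):
    """Count class and (class, feature, bin) frequencies.

    rows   : list of lists of bin indices, one inner list per document
    labels : list of relevance labels (e.g. 0 = not relevant, 1 = relevant)
    """
    class_counts = defaultdict(int)
    feat_counts = defaultdict(int)  # key: (label, feature index, bin)
    for row, y in zip(rows, labels):
        class_counts[y] += 1
        for j, b in enumerate(row):
            feat_counts[(y, j, b)] += 1
    return class_counts, feat_counts, n_bins


def log_posterior(model, row, y):
    """Unnormalized log P(y | row) under the naive independence assumption."""
    class_counts, feat_counts, n_bins = model
    total = sum(class_counts.values())
    score = math.log(class_counts[y] / total)
    for j, b in enumerate(row):
        # Laplace (add-one) smoothing so unseen bins keep nonzero probability.
        score += math.log((feat_counts[(y, j, b)] + 1) / (class_counts[y] + n_bins))
    return score


def predict(model, row):
    """Return the label with the highest unnormalized log posterior."""
    class_counts = model[0]
    return max(class_counts, key=lambda y: log_posterior(model, row, y))
```

In this sketch a document would be filtered in when `predict` returns the relevant label. A Bayesian inference network would replace the independence assumption in `log_posterior` with dependencies learned from the training data, which is the part the abstract credits for the improvement over the naive baseline.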