In this paper we compare a variety of unsupervised probabilistic models used to represent a data set consisting of textual and image information. We show that models based on latent Dirichlet allocation (LDA) outperform traditional mixture models in a likelihood comparison. The data set is taken from radiology and combines medical images with consultants' reports. Learning to classify individual tissue or disease types requires expert hand-labeled data, which is both expensive to produce and prone to labeling inconsistencies. Here we present methods that require no hand labeling and that automatically discover sub-types of disease. The learnt models can be used for both prediction and classification of new, unseen data.