Skip to Main Content
We address the task of learning a semantic segmentation from weakly supervised data. Our aim is to devise a system that predicts an object label for each pixel by making use of only image level labels during training - the information whether a certain object is present or not in the image. Such coarse tagging of images is faster and easier to obtain as opposed to the tedious task of pixelwise labeling required in state of the art systems. We cast this task naturally as a multiple instance learning (MIL) problem. We use Semantic Texton Forest (STF) as the basic framework and extend it for the MIL setting. We make use of multitask learning (MTL) to regularize our solution. Here, an external task of geometric context estimation is used to improve on the task of semantic segmentation. We report experimental results on the MSRC21 and the very challenging VOC2007 datasets. On MSRC21 dataset we are able, by using 276 weakly labeled images, to achieve the performance of a supervised STF trained on pixelwise labeled training set of 56 images, which is a significant reduction in supervision needed.