We investigate several problems in annotating video shots with semantic labels that are implicitly embedded in a semantic hierarchy, leading to analyses and novel methods for refining video ontologies and their ground truth. First, in the large LSCOM data set of 449 semantic concepts, we show that within the implicit “use ontology” many concept tags are ambiguous as to purposeful activity, visual scope, or social agency, or are absent altogether, but that better “use senses” can be refined algorithmically. Second, we find that both traditional hard and fuzzy k-medoid clustering techniques are inadequate for hierarchical concepts, but that a novel “firm k-medoid” clustering method both separates clusters and distributes superconcepts equitably. Third, we show how the scores of SVM semantic filters can be converted to probabilities more reliably and quickly by using a closed-form approximation to SVM behavior between its margins. Fourth, we show that the quality of SVM semantic filters for hierarchical concepts can be analyzed by their ability to separate their positive ground truth examples from those of any other concept in the hierarchy; the most discriminating filters are those whose ground truth shows distinctive physical backgrounds.
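For orientation, the traditional hard k-medoid baseline mentioned above can be sketched as follows. This is a minimal illustration of the standard alternating (assign, then re-pick medoids) procedure only; it is not the firm k-medoid method, whose details are not given in this abstract. The function names `assign` and `k_medoids`, the toy one-dimensional points, and the absolute-difference distance are all assumptions made for the example.

```python
import random


def assign(dist, medoids):
    """Label each item with the index (into `medoids`) of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(dist, k, max_iter=100, seed=0):
    """Hard k-medoids on a square distance matrix (hypothetical helper)."""
    n = len(dist)
    rng = random.Random(seed)
    medoids = rng.sample(range(n), k)  # random initial medoids
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster happens to be empty.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            best = min(members, key=lambda m: sum(dist[m][j] for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break  # converged: medoids no longer change
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Toy data (assumed): two well-separated groups on a line.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = k_medoids(dist, k=2)
```

In this hard variant every item belongs to exactly one cluster, which is the property the abstract reports as a poor fit for concepts that sit under several superconcepts in a hierarchy.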