Measuring image similarity is an important task for many multimedia applications. Similarity can be defined at two levels: the syntactic (lower, context-free) level and the semantic (higher, contextual) level. At the syntactic level, defining and measuring similarity is relatively straightforward, but at the semantic level the task becomes very difficult. We examine the use of very simple syntactic image features, combined with other multimodal features, to derive a similarity measure that captures the weak semantics of an image. We test this similarity measure and then apply it to video retrieval.