We propose to use the similarity between a sample instance and a number of exemplars as features in visual object detection. Concepts from multiple-kernel learning and multiple-instance learning are incorporated into our scheme at the feature level through the way the similarity is calculated. The similarity between two instances can be measured with various metrics and with information from various sources, which mimics the use of multiple kernels in kernel machines. Similarity values from multiple instances of an object part are pooled to cope with alignment inaccuracy between object instances. To deal with the high dimensionality of the resulting multiple-kernel multiple-instance similarity feature, we propose a forward feature-selection technique and a coarse-to-fine learning scheme to find a set of good exemplars, so that the resulting classifier is efficient while maintaining good performance. Both the feature and the learning technique have interesting properties. We demonstrate the performance of our method on both synthetic data and real-world visual object detection data sets.
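As an illustration of the feature construction described above, the following is a minimal sketch and not the implementation used in this work. It assumes a Gaussian (RBF) similarity, max pooling over the instances of a part, and descriptors stored per information source. The names `rbf_similarity`, `mkmi_feature`, and `gammas`, as well as the source labels `"hog"` and `"color"`, are hypothetical and chosen only for the example.

```python
import numpy as np

def rbf_similarity(a, b, gamma):
    """Gaussian similarity between each row of `a` and the vector `b`.
    The RBF form is an assumption; any similarity metric could be used."""
    diff = a - b
    return np.exp(-gamma * np.sum(diff * diff, axis=-1))

def mkmi_feature(instances, exemplars, gammas):
    """Build a similarity feature vector for one sample (hypothetical helper).

    instances: dict mapping source name -> array of shape (n_instances, d_source),
               e.g. descriptors of slightly shifted windows around an object part.
    exemplars: list of dicts mapping source name -> array of shape (d_source,).
    gammas:    list of RBF bandwidths; each (source, gamma) pair acts as one kernel.
    """
    feats = []
    for ex in exemplars:
        for source in sorted(instances):
            for gamma in gammas:
                sims = rbf_similarity(instances[source], ex[source], gamma)
                # Max pooling over instances (assumed pooling rule) absorbs
                # small misalignments between the sample and the exemplar.
                feats.append(sims.max())
    return np.array(feats)

# Usage: 4 candidate instances, 2 sources, 2 exemplars, 2 bandwidths.
rng = np.random.default_rng(0)
sample = {"hog": rng.normal(size=(4, 8)), "color": rng.normal(size=(4, 3))}
exemplar_list = [
    {"hog": rng.normal(size=8), "color": rng.normal(size=3)},
    {"hog": rng.normal(size=8), "color": rng.normal(size=3)},
]
feature = mkmi_feature(sample, exemplar_list, gammas=[0.1, 1.0])
print(feature.shape)
```

The feature length grows as the number of exemplars times the number of (source, metric) pairs, which is why a selection step over exemplars is needed before training a classifier on it.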