The video signature method has previously been proposed as a technique to summarize video efficiently for visual similarity measurements (see Cheung, S.-C. and Zakhor, A., Proc. SPIE, vol.3964, p.34-6, 2000; ICIP2000, vol.1, p.85-9, 2000; ICIP2001, vol.1, p.649-52, 2001). We now develop the theoretical framework needed to analyze this method. We define our target video similarity measure based on the fraction of similar clusters shared between two video sequences. This measure is too computationally complex to be deployed in database applications. By considering the measure geometrically in the image feature space, we find that it can be approximated by the volume of the intersection between Voronoi cells of similar clusters. The video signature method uses sampling to estimate this volume. By choosing an appropriate distribution to generate samples, and by ranking the samples according to their distances to the boundary between Voronoi cells, we demonstrate that the video signature method approximates our target measure well. Experimental results on a large dataset of Web video and on a set of MPEG-7 test sequences with artificially generated similar versions demonstrate the retrieval performance of the proposed techniques.
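The sampling idea can be illustrated with a minimal sketch. It is not the implementation from the cited papers. It assumes that each video is a list of frame feature vectors, that the samples ("seeds") are drawn uniformly from a two-dimensional feature space, and that two frames count as similar when their Euclidean distance is at most a threshold `eps`. The function names, the toy feature vectors, and the threshold value are all hypothetical. The choice of sampling distribution and the ranking of samples by distance to the Voronoi boundary are left out.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def signature(video, seeds):
    """For each seed, keep the frame of `video` that lies closest to it.

    Equivalently, each seed selects the frame whose Voronoi cell contains it.
    """
    return [min(video, key=lambda frame: euclidean(frame, s)) for s in seeds]


def signature_similarity(video_x, video_y, seeds, eps):
    """Fraction of seeds whose two selected frames are within `eps`.

    This is a sampling estimate of the volume of the region where the
    Voronoi cells of similar frames from the two videos overlap.
    """
    sig_x = signature(video_x, seeds)
    sig_y = signature(video_y, seeds)
    hits = sum(1 for fx, fy in zip(sig_x, sig_y) if euclidean(fx, fy) <= eps)
    return hits / len(seeds)


# Assumption: seeds are uniform on the unit square (toy feature space).
rng = random.Random(0)
seeds = [(rng.random(), rng.random()) for _ in range(200)]

# Toy videos, each reduced to two frame feature vectors.
video_a = [(0.2, 0.2), (0.8, 0.8)]
video_b = [(0.21, 0.2), (0.8, 0.79)]  # slightly perturbed copy of video_a
video_c = [(0.2, 0.8), (0.8, 0.2)]    # frames far from those of video_a

sim_ab = signature_similarity(video_a, video_b, seeds, eps=0.05)
sim_ac = signature_similarity(video_a, video_c, seeds, eps=0.05)
print(sim_ab, sim_ac)
```

In this toy setup the perturbed copy scores close to 1 and the unrelated video scores 0, because no frame of `video_c` lies within `eps` of any frame of `video_a`. Each video's signature depends only on the shared seeds, so it can be computed once and stored, and a comparison then costs one distance computation per seed.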