Efficient high-dimensional data indexing algorithms are crucial for image retrieval in large datasets. One state-of-the-art indexing method is the vector approximation file (VA-file), which indexes high-dimensional data by filtering feature vectors so that only a small fraction of them are visited during the search. The VA-file uses a partition strategy that divides the data space along every dimension so that each partition is equally full, and it assigns the same number of bits to each dimension. However, this strategy is inefficient for image datasets in which the number of distinct vector component values in each dimension (the granularity) varies widely. This work considers four partition strategies. The first two are practical implementations that follow the description of the original VA-file method. The other two are nonuniform partition strategies proposed to resolve the problems of duplicate coordinates and uniform bit assignment across dimensions; they assign more bits to dimensions with more distinct vector components. Experimental results show that these strategies substantially improve the performance of the VA-file on nonuniform datasets in terms of query time and filtering efficiency.
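The partitioning issue described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`equal_population_boundaries`, `distinct_boundaries`, `allocate_bits`, `approximate`) are hypothetical, and the greedy bit-allocation rule is an assumed heuristic chosen only to show the idea of giving more bits to dimensions with more distinct values.

```python
from bisect import bisect_right


def equal_population_boundaries(values, bits):
    """Cut points that split `values` into 2**bits roughly equally full cells.

    With few distinct values, several cut points can coincide, which
    leaves some cells empty and wastes bits.
    """
    ordered = sorted(values)
    cells = 2 ** bits
    n = len(ordered)
    return [ordered[(k * n) // cells] for k in range(1, cells)]


def distinct_boundaries(values, bits):
    """Cut points drawn from the distinct values only, so none coincide."""
    distinct = sorted(set(values))
    cells = min(2 ** bits, len(distinct))
    n = len(distinct)
    return [distinct[(k * n) // cells] for k in range(1, cells)]


def allocate_bits(columns, total_bits):
    """Assumed greedy heuristic, not taken from the paper: hand each bit to
    the dimension that currently has the most distinct values per cell."""
    distinct = [len(set(col)) for col in columns]
    bits = [0] * len(columns)
    for _ in range(total_bits):
        load = [d / (2 ** b) for d, b in zip(distinct, bits)]
        best = max(range(len(columns)), key=lambda i: load[i])
        if load[best] <= 1:
            break  # every distinct value already has its own cell
        bits[best] += 1
    return bits


def approximate(value, boundaries):
    """Cell index of `value`, given sorted internal cut points."""
    return bisect_right(boundaries, value)


# Toy data: one coarse dimension (2 distinct values), one fine (8 distinct).
coarse = [0, 0, 0, 0, 0, 0, 1, 1]
fine = [0, 1, 2, 3, 4, 5, 6, 7]

print(equal_population_boundaries(coarse, 2))  # repeated cut points
print(distinct_boundaries(coarse, 2))          # a single useful cut point
print(allocate_bits([coarse, fine], 4))        # more bits go to `fine`
```

In this toy case a uniform assignment would spend two bits on each dimension, while the assumed heuristic spends one bit on the coarse dimension and three on the fine one for the same total budget.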