Bucketing Coding and Information Theory for the Statistical High-Dimensional Nearest-Neighbor Problem

Author: Dubiner, M.; Google, Inc., Mountain View, CA, USA

The problem of finding high-dimensional approximate nearest neighbors is considered when the data is generated by some known probabilistic model. A large natural class of algorithms (bucketing codes) is investigated. Bucketing information is defined and is proven to bound the performance of all bucketing codes. The bucketing information bound is asymptotically attained by some randomly constructed bucketing codes. The example of $n$ Bernoulli(1/2) very long (length $d \to \infty$) sequences of bits is singled out. It is assumed that $n - 2m$ sequences are completely independent, while the remaining $2m$ sequences are composed of $m$ dependent pairs. The interdependence within each pair is that their bits agree with probability $p$, where $1/2 < p \le 1$. It is well known how to find most pairs with high probability by performing on the order of $n^{\log_2 (2/p)}$ comparisons. It is shown that on the order of $n^{1/p+\epsilon}$ comparisons suffice, for any $\epsilon > 0$. A specific 2-D inequality (proven in another paper) implies that the exponent $1/p$ cannot be lowered. Moreover, if one sequence out of each pair belongs to a known set of $n^{(2p-1)^2}$ sequences, pairing can be done using on the order of $n^{1+\epsilon}$ comparisons!
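
One standard scheme that reaches the baseline order of $n^{\log_2 (2/p)}$ comparisons is bit-sampling bucketing: hash every sequence on $k \approx \log_2 n$ randomly chosen coordinates, compare only sequences that share a bucket, and repeat about $n^{-\log_2 p}$ times, since a dependent pair lands in the same bucket with probability $p^k$. Each round costs on the order of $n$ comparisons, for a total of $n^{1 - \log_2 p} = n^{\log_2 (2/p)}$. The Python below is a minimal sketch under those assumptions, not the bucketing-code construction of the paper. The function names, the agreement threshold, and the parameter values are hypothetical choices made for illustration.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def exponents(p):
    """Return (baseline exponent log2(2/p), improved exponent 1/p) from the abstract."""
    return math.log2(2.0 / p), 1.0 / p


def make_data(n, m, d, p, rng):
    """Build n uniform random bit sequences of length d.

    Rows 2i and 2i+1 (for i < m) form a planted pair whose bits agree
    independently with probability p. All other rows are independent.
    """
    data = [[rng.getrandbits(1) for _ in range(d)] for _ in range(n)]
    for i in range(m):
        first = data[2 * i]
        data[2 * i + 1] = [b if rng.random() < p else 1 - b for b in first]
    return data


def find_pairs(data, k, rounds, threshold, rng):
    """Bit-sampling bucketing (illustrative baseline, not the paper's codes).

    Each round hashes every row on k random coordinates and fully compares
    only rows that fall into the same bucket. A compared pair is reported
    when its fraction of agreeing bits is at least `threshold`.
    """
    d = len(data[0])
    found = set()
    comparisons = 0
    for _ in range(rounds):
        coords = rng.sample(range(d), k)
        buckets = defaultdict(list)
        for idx, row in enumerate(data):
            buckets[tuple(row[c] for c in coords)].append(idx)
        for members in buckets.values():
            for i, j in combinations(members, 2):
                comparisons += 1
                agree = sum(x == y for x, y in zip(data[i], data[j]))
                if agree >= threshold * d:
                    found.add((i, j))
    return found, comparisons


# Assumed toy parameters: n = 64 sequences, m = 4 planted pairs, p = 0.9.
rng = random.Random(0)
data = make_data(n=64, m=4, d=256, p=0.9, rng=rng)
pairs, cost = find_pairs(data, k=6, rounds=20, threshold=0.75, rng=rng)
print(sorted(pairs), cost)
print(exponents(0.75))
```

For any $p$ strictly between 1/2 and 1, `exponents(p)` returns a second value smaller than the first, which is the gap between the baseline and the $n^{1/p+\epsilon}$ result stated in the abstract. The two exponents coincide at $p = 1$ and in the limit $p \to 1/2$.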

Published in:

Information Theory, IEEE Transactions on (Volume: 56, Issue: 8)