By Topic

Rate-distortion approach to databases: storage and content-based retrieval

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
Tuncel, E. ; Dept. of Electr. Eng., Univ. of California, Riverside, CA, USA ; Koulgi, P. ; Rose, K.

This paper investigates the relationship between rate-distortion theory and efficient content-based data retrieval from high-dimensional databases. We consider database design as the encoding of a data object sequence, and retrieval from the database as the decoding of the sequence using side information (i.e., the query) available only at the decoder. We show that, in this setting, the optimal asymptotic tradeoff between the search time Rs (bits per data object read from the storage device) and the expected search accuracy Ds (relevance of the retrieved data set) is given by the Wyner-Ziv solution with a side-information-dependent distortion measure. Moreover, the data indexing and retrieval problem is, in general, inseparable from the data compression problem. Data items selected by the search procedure, which can be stored in the disk with a limited total rate of Rr ≥ Rs, need to be presented at a prescribed expected reconstruction quality Dr. This is, hence, a problem of scalable source coding or successive refinement, albeit with differing layer distortion measures to quantify search and reconstruction quality, respectively. We derive a single-letter characterization of all achievable quadruples {Rs,Rr,Ds,Dr}, and prove conditions for "successive refinability" without rate loss. Finally, we show that the special case Ds=Dr=0 is nontrivial and of practical interest in this context, as it can impose "acceptable" search and reconstruction qualities for each individual data item and for the entire query space with high probability, in contradistinction with standard average distortion requirements. The region of achievable {Rs,Rr} is obtained by adapting Rimoldi's characterization to a new regular scalable coding problem.

Published in:

Information Theory, IEEE Transactions on  (Volume:50 ,  Issue: 6 )