
Semantics of Ranking Queries for Probabilistic Data


4 authors
Jestes, J. (Florida State University, Tallahassee); Cormode, G.; Feifei Li; Ke Yi

When dealing with massive quantities of data, top-k queries are a powerful technique for returning only the k most relevant tuples for inspection. Several recent works have proposed definitions and algorithms for ranking queries over probabilistic data. However, these all lack many of the intuitive properties of top-k queries over deterministic data. Our observation is that the ranks of a tuple across all possible worlds form a well-founded distribution, and this rank distribution is the basis of our ranking definitions. We study definitions based on the expectation, the median, and other order statistics of a tuple's rank distribution, which yield the expected rank, the median rank, and the quantile rank, respectively. We provide efficient algorithms to compute these rankings under the major models of uncertain data, such as attribute-level and tuple-level uncertainty. For an uncertain relation of N constant-size tuples, computing the expected ranks costs O(N log N), no worse than simply sorting the relation. Median and quantile ranks cost more to compute because they require dynamic programming; nevertheless, both can still be computed in low polynomial time. Furthermore, in most cases we provide pruning techniques that can terminate the search early while guaranteeing that the top-k has been found.
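The following is a minimal sketch, not the authors' implementation, of how a rank distribution yields expected, median, and quantile ranks on a tiny relation. It assumes the simplest tuple-level model: independent tuples, each with a membership probability, distinct scores, and no exclusion rules. It also assumes the convention that a tuple absent from a possible world W is assigned rank |W|. The data and all function names are hypothetical. `rank_distributions` enumerates all 2^N worlds and is only usable for checking tiny inputs. `expected_ranks_fast` shows, for this restricted model only, how one sort plus a prefix-sum sweep gives expected ranks in O(N log N).

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy relation: (tuple_id, score, membership probability).
# Assumes independent tuples with distinct scores (no exclusion rules).
TUPLES = [
    ("t1", 100, Fraction(2, 5)),
    ("t2", 92, Fraction(1, 2)),
    ("t3", 85, Fraction(1)),
    ("t4", 80, Fraction(1, 2)),
]


def rank_distributions(tuples):
    """Brute force over all 2^N possible worlds: {tid: {rank: probability}}."""
    dist = {tid: {} for tid, _, _ in tuples}
    for mask in product((False, True), repeat=len(tuples)):
        prob = Fraction(1)
        for present, (_, _, p) in zip(mask, tuples):
            prob *= p if present else 1 - p
        if prob == 0:
            continue  # impossible world, e.g. one missing a certain tuple
        world = {tid: s for present, (tid, s, _) in zip(mask, tuples) if present}
        for tid, score, _ in tuples:
            if tid in world:
                # Rank = number of present tuples with a higher score (0 = top).
                r = sum(1 for s in world.values() if s > score)
            else:
                # Assumed convention: a tuple absent from W gets rank |W|.
                r = len(world)
            dist[tid][r] = dist[tid].get(r, Fraction(0)) + prob
    return dist


def expected_rank(d):
    """Expectation of a rank distribution {rank: probability}."""
    return sum(r * p for r, p in d.items())


def quantile_rank(d, phi):
    """Smallest rank r with Pr[rank <= r] >= phi; phi = 1/2 gives the median rank."""
    acc = Fraction(0)
    for r in sorted(d):
        acc += d[r]
        if acc >= phi:
            return r
    raise ValueError("rank probabilities sum to less than phi")


def expected_ranks_fast(tuples):
    """Expected ranks in O(N log N): one sort, then a linear prefix-sum sweep."""
    total = sum(p for _, _, p in tuples)  # E[|W|]
    prefix = Fraction(0)  # total probability of tuples scored above the current one
    out = {}
    for tid, _, p in sorted(tuples, key=lambda t: t[1], reverse=True):
        # Present (prob p): rank counts higher-scored present tuples, mean `prefix`.
        # Absent (prob 1 - p): rank is |W| without this tuple, mean total - p.
        out[tid] = p * prefix + (1 - p) * (total - p)
        prefix += p
    return out


if __name__ == "__main__":
    fast = expected_ranks_fast(TUPLES)
    for tid, d in rank_distributions(TUPLES).items():
        print(tid, expected_rank(d), fast[tid], quantile_rank(d, Fraction(1, 2)))
    print("top-2 by expected rank:", sorted(fast, key=fast.get)[:2])
```

On this toy data the brute-force and sweep results agree. The expected ranks are 6/5, 23/20, 9/10, and 19/10 for t1 through t4, so the top-2 by expected rank is t3, t2. Note that t3 beats t1 despite its lower score because t3 is certain. The sweep works only because independence lets each conditional expectation be read off a prefix sum. Richer models, as well as median and quantile ranks at scale, need the more involved techniques the abstract refers to, such as dynamic programming.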

Published in:

IEEE Transactions on Knowledge and Data Engineering (Volume: PP, Issue: 99)