By Topic

A Disc-based Approach to Data Summarization and Privacy Preservation

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

4 Author(s)
Rong Ge ; Simon Fraser University ; M. Ester ; Wen Jin ; Zengjian Hu

Data summarization has been recognized as a fundamental operation in database systems and data mining with important applications such as data compression and privacy preservation. While the existing methods such as CF-values and DataBubbles may perform reasonably well, they cannot provide any guarantees on the quality of their results. In this paper, we introduce a summarization approach for numerical data based on discs formalizing the notion of quality. Our objective is to find a minimal set of discs, i.e. spheres satisfying a radius and a significance constraint, covering the given dataset. Since the proposed problem is NP-complete, we design two different approximation algorithms. These algorithms have a quality guarantee, but they do not scale well to large databases. However, the machinery from approximation algorithms allows a precise characterization of a further, heuristic algorithm. This heuristic, efficient algorithm exploits multi-dimensional index structures and can be well-integrated with database systems. The experiments show that our heuristic algorithm generates summaries that outperform the state-of-the-art data bubbles in terms of internal measures as well as in terms of external measures when using the data summaries as input for clustering methods

Published in:

18th International Conference on Scientific and Statistical Database Management (SSDBM'06)

Date of Conference:

0-0 0