A quality-threshold data summarization algorithm
Viet Ha-Thuc, Duc-Cuong Nguyen, P. Srinivasan
Computer Science Department, University of Iowa, Iowa City, IA

This paper appears in: IEEE International Conference on Research, Innovation and Vision for the Future (RIVF 2008)
Publication Date: 13-17 July 2008
On page(s): 240-246
Location: Ho Chi Minh City, Vietnam
ISBN: 978-1-4244-2379-8
INSPEC Accession Number: 10156877
Digital Object Identifier: 10.1109/RIVF.2008.4586362
Current Version Published: 2008-08-08

Abstract
As databases grow, semantic data summarization techniques have been developed so that data mining algorithms can be run on a summarized set for efficiency. Clustering algorithms such as K-Means have been widely used for semantic summarization, with the cluster centers forming the summarized set. The goal of semantic summarization is to provide a summarized view of the original dataset that maximizes the summarization ratio while minimizing the error (i.e., information loss). This paper presents a new clustering-based data summarization algorithm in which the quality of the summarized set can be controlled. The algorithm keeps partitioning a dataset into clusters until the distortion of every cluster is less than a given threshold, thus guaranteeing that the summarized set incurs less than a fixed amount of information loss. The number of clusters is determined automatically from the threshold. Unlike traditional K-Means, the proposed algorithm adjusts initial centers based on the information about the data space discovered so far, which significantly alleviates the local optimum effect. Our experiments show that the algorithm generates higher-quality clusters than K-Means and also guarantees an error bound, an essential criterion for data summarization.
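The abstract describes the algorithm only at a high level, so what follows is a minimal Python sketch of the quality-threshold idea, not the authors' implementation. It assumes that distortion is the mean squared distance of a cluster's points to its centroid, and it splits an over-threshold cluster with plain 2-means seeded by a farthest-point heuristic. That seeding is a hypothetical stand-in for the paper's center-adjustment step, whose details are not given here. All function names (`quality_threshold_summarize`, `two_means`, `summary_stats`) are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n, dim = len(points), len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dim))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distortion(points: Sequence[Point]) -> float:
    """ASSUMPTION: distortion = mean squared distance to the cluster centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points) / len(points)


def two_means(points: List[Point], iters: int = 20) -> Optional[List[List[Point]]]:
    """Split one cluster in two with Lloyd's 2-means.

    Hypothetical seeding: the point farthest from the centroid, then the point
    farthest from that one. Returns None if no non-degenerate split exists.
    """
    c = centroid(points)
    a = max(points, key=lambda p: sq_dist(p, c))
    b = max(points, key=lambda p: sq_dist(p, a))
    centers = [a, b]
    groups: Optional[List[List[Point]]] = None
    for _ in range(iters):
        new_groups: List[List[Point]] = [[], []]
        for p in points:
            idx = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            new_groups[idx].append(p)
        if not new_groups[0] or not new_groups[1]:
            break  # degenerate assignment: keep the last valid split
        groups = new_groups
        new_centers = [centroid(g) for g in groups]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return groups


def quality_threshold_summarize(points: Sequence[Point], threshold: float):
    """Keep splitting any cluster whose distortion is not below `threshold`.

    The number of clusters is not an input; it falls out of the threshold.
    Returns (centers, clusters); the centers form the summarized set.
    """
    pending: List[List[Point]] = [list(points)]
    done: List[List[Point]] = []
    while pending:
        cluster = pending.pop()
        d = distortion(cluster)
        if len(cluster) == 1 or d == 0.0 or d < threshold:
            done.append(cluster)
            continue
        parts = two_means(cluster)
        if parts is None:
            done.append(cluster)  # cannot be split further
        else:
            pending.extend(parts)  # both parts are non-empty and smaller
    return [centroid(c) for c in done], done


def summary_stats(n_points: int, clusters: List[List[Point]]) -> Tuple[float, float]:
    """Summarization ratio (points per center) and worst cluster distortion."""
    ratio = n_points / len(clusters)
    worst = max(distortion(c) for c in clusters)
    return ratio, worst


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, clusters = quality_threshold_summarize(data, threshold=1.0)
    ratio, worst = summary_stats(len(data), clusters)
    print(len(centers), ratio, round(worst, 3))  # 2 3.0 0.444
```

Because every cluster that is kept has distortion below the threshold, the worst-case error of the summary is bounded by construction, while the number of centers is whatever the data requires. Lowering `threshold` trades summarization ratio for a tighter bound: with `threshold=0.1`, the six points above end up as six singleton clusters.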
