By Topic

Clustering Heterogeneous Web Data using Clustering by Compression. Cluster Validity

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
Cernian, A. ; Fac. of Autom. Control & Comput. Sci., Politeh. Univ. of Bucharest, Bucharest, Romania ; Carstoiu, D. ; Olteanu, A.

The expansive nature of the Internet produced a vast quantity of unstructured data, compared to our conception of a conventional data base. The application of clustering on the World Wide Web is essential to get structured information from this sea of information. In this paper, we intend to test the results of a new clustering technique - clustering by compression - when applied to heterogeneous sets of data. The clustering by compression procedure is based on a parameter-free, universal, similarity distance, the normalized compression distance or NCD, computed from the lengths of compressed data files (singly and in pair-wise concatenation). In order to validate the results, we calculate some quality indices. If the values we obtain prove a high quality of the clustering, in the near future we plan to include the clustering by compression technique into a framework for clustering heterogeneous Web objects.

Published in:

Symbolic and Numeric Algorithms for Scientific Computing, 2008. SYNASC '08. 10th International Symposium on

Date of Conference:

26-29 Sept. 2008