By Topic

A General Framework for Fast Co-clustering on Large Datasets Using Matrix Decomposition

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
Feng Pan ; Dept. of Comput. Sci., Univ. of North Carolina, Chapel Hill, NC ; Xiang Zhang ; Wei Wang

Simultaneously clustering columns and rows (co- clustering) of large data matrix is an important problem with wide applications, such as document mining, microarray analysis, and recommendation systems. Several co-clustering algorithms have been shown effective in discovering hidden clustering structures in the data matrix. For a data matrix of m rows and n columns, the time complexity of these methods is usually in the order of mtimesn (if not higher). This limits their applicability to data matrices involving a large number of columns and rows. Moreover, an implicit assumption made by existing co-clustering methods is that the whole data matrix needs to be held in the main memory. In this paper, we propose a general framework, CRD, for co-clustering large datasets utilizing recently developed sampling- based matrix decomposition methods. The time complexity of our approach is linear in m and n. And it does not require the whole data matrix be in the main memory. Experimental results show that CRD achieves competitive accuracy to existing co-clustering methods but with much less computational cost.

Published in:

Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on

Date of Conference:

7-12 April 2008