A general framework for fast co-clustering on large datasets using matrix decomposition

Feng Pan, Xiang Zhang, Wei Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Scopus citations

Abstract

Simultaneously clustering the columns and rows of a large data matrix (co-clustering) is an important problem with wide applications, such as document mining, microarray analysis, and recommendation systems. Several co-clustering algorithms have been shown to be effective in discovering hidden clustering structures in the data matrix. For a data matrix of m rows and n columns, the time complexity of these methods is usually on the order of m × n (if not higher). This limits their applicability to data matrices involving a large number of columns and rows. Moreover, an implicit assumption made by existing co-clustering methods is that the whole data matrix needs to be held in main memory. In this paper, we propose a general framework, CRD, for co-clustering large datasets utilizing recently developed sampling-based matrix decomposition methods. The time complexity of our approach is linear in m and n, and it does not require the whole data matrix to be in main memory. Experimental results show that CRD achieves accuracy competitive with existing co-clustering methods at much lower computational cost.
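
The general idea of co-clustering through sampled columns and rows can be illustrated with a small sketch. This is not the CRD algorithm from the paper, whose details the abstract does not give. It assumes length-squared sampling (a common choice in sampling-based decompositions), omits the rescaling those decompositions normally apply, and uses a plain k-means with farthest-point initialization as a stand-in clustering step. All function names and parameters are hypothetical. The toy also keeps the matrix in memory and reads every entry once to compute norms, so it does not reproduce the paper's complexity or memory properties.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sample_by_squared_norm(vectors, count, rng):
    """Sample `count` indices (with replacement), weighted by squared L2 norm."""
    weights = [sum(x * x for x in v) for v in vectors]
    return rng.choices(range(len(vectors)), weights=weights, k=count)

def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members (if it has any).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cocluster_by_sampling(matrix, k_rows, k_cols,
                          n_sample_rows, n_sample_cols, seed=0):
    """Cluster rows using sampled columns, and columns using sampled rows."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*matrix)]
    col_idx = sample_by_squared_norm(columns, n_sample_cols, rng)
    row_idx = sample_by_squared_norm(matrix, n_sample_rows, rng)
    # Each row is described only by its entries in the sampled columns,
    # and each column only by its entries in the sampled rows.
    row_features = [[row[j] for j in col_idx] for row in matrix]
    col_features = [[col[i] for i in row_idx] for col in columns]
    return kmeans(row_features, k_rows), kmeans(col_features, k_cols)

if __name__ == "__main__":
    # Toy block-structured matrix: two row groups and two column groups.
    toy = [[1, 1, 1, 0, 0, 0]] * 3 + [[0, 0, 0, 1, 1, 1]] * 3
    print(cocluster_by_sampling(toy, 2, 2, n_sample_rows=2, n_sample_cols=2))
    # prints ([0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1])
```

In this sketch the clustering step touches only m × (sampled columns) and n × (sampled rows) entries, which is where a sampling-based approach would save work relative to methods that process all m × n entries on every iteration.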

Original language: English (US)
Title of host publication: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
Pages: 1337-1339
Number of pages: 3
State: Published - 2008
Event: 2008 IEEE 24th International Conference on Data Engineering, ICDE'08 - Cancun, Mexico
Duration: Apr 7 2008 to Apr 12 2008

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627

Other

Other: 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
Country/Territory: Mexico
City: Cancun
Period: 4/7/08 to 4/12/08

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Information Systems
