TY - GEN
T1 - Pairwise constrained clustering for sparse and high dimensional feature spaces
AU - Yan, Su
AU - Wang, Hai
AU - Lee, Dongwon
AU - Giles, C. Lee
PY - 2009
Y1 - 2009
N2 - Clustering high dimensional data with sparse features is challenging because pairwise distances between data items are not informative in high dimensional space. To address this challenge, we propose two novel semi-supervised clustering methods that incorporate prior knowledge in the form of pairwise cluster membership constraints. In particular, we project high-dimensional data onto a much reduced-dimension subspace, where rough clustering structure defined by the prior knowledge is strengthened. Metric learning is then performed on the subspace to construct more informative pairwise distances. We also propose to propagate constraints locally to improve the informativeness of pairwise distances. When the new methods are evaluated using two real benchmark data sets, they show substantial improvement using only limited prior knowledge.
AB - Clustering high dimensional data with sparse features is challenging because pairwise distances between data items are not informative in high dimensional space. To address this challenge, we propose two novel semi-supervised clustering methods that incorporate prior knowledge in the form of pairwise cluster membership constraints. In particular, we project high-dimensional data onto a much reduced-dimension subspace, where rough clustering structure defined by the prior knowledge is strengthened. Metric learning is then performed on the subspace to construct more informative pairwise distances. We also propose to propagate constraints locally to improve the informativeness of pairwise distances. When the new methods are evaluated using two real benchmark data sets, they show substantial improvement using only limited prior knowledge.
UR - http://www.scopus.com/inward/record.url?scp=67650705045&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67650705045&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-01307-2_61
DO - 10.1007/978-3-642-01307-2_61
M3 - Conference contribution
AN - SCOPUS:67650705045
SN - 3642013066
SN - 9783642013065
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 620
EP - 627
BT - 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009
T2 - 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009
Y2 - 27 April 2009 through 30 April 2009
ER -