TY - GEN
T1 - PENS
T2 - 1st International Conference on Scalable Information Systems, InfoScale '06
AU - Li, Mei
AU - Lee, Guanling
AU - Lee, Wang-chien
AU - Sivasubramaniam, Anand
PY - 2006
Y1 - 2006
N2 - Huge amounts of data are available in large-scale networks of autonomous data sources dispersed over a wide area. Data mining is an essential technology for obtaining hidden and valuable knowledge from these networked data sources. In this paper, we investigate clustering, one of the most important data mining tasks, in one such networked computing environment, i.e., peer-to-peer (P2P) systems. The lack of central control and the sheer size of P2P systems make existing clustering techniques inapplicable here. We propose a fully distributed clustering algorithm, called Peer dENsity-based cluStering (PENS), which overcomes the challenge raised in performing clustering in peer-to-peer environments, i.e., cluster assembly. The main idea of PENS is hierarchical cluster assembly, which enables peers to collaborate in forming a global clustering model without requiring central control or message flooding. The complexity analysis of the algorithm demonstrates that PENS can discover clusters and noise efficiently in P2P systems.
AB - Huge amounts of data are available in large-scale networks of autonomous data sources dispersed over a wide area. Data mining is an essential technology for obtaining hidden and valuable knowledge from these networked data sources. In this paper, we investigate clustering, one of the most important data mining tasks, in one such networked computing environment, i.e., peer-to-peer (P2P) systems. The lack of central control and the sheer size of P2P systems make existing clustering techniques inapplicable here. We propose a fully distributed clustering algorithm, called Peer dENsity-based cluStering (PENS), which overcomes the challenge raised in performing clustering in peer-to-peer environments, i.e., cluster assembly. The main idea of PENS is hierarchical cluster assembly, which enables peers to collaborate in forming a global clustering model without requiring central control or message flooding. The complexity analysis of the algorithm demonstrates that PENS can discover clusters and noise efficiently in P2P systems.
UR - http://www.scopus.com/inward/record.url?scp=34547380798&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547380798&partnerID=8YFLogxK
U2 - 10.1145/1146847.1146886
DO - 10.1145/1146847.1146886
M3 - Conference contribution
AN - SCOPUS:34547380798
SN - 1595934286
SN - 9781595934284
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the 1st International Conference on Scalable Information Systems, InfoScale '06
Y2 - 30 May 2006 through 1 June 2006
ER -