K-SVMeans: A hybrid clustering algorithm for multi-type interrelated datasets

Levent Bolelli, Seyda Ertekin, Ding Zhou, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

Identifying distinct clusters of documents in text collections has traditionally been approached by assuming that the data instances can be represented only by homogeneous and uniform features. Many real-world datasets, however, comprise multiple types of heterogeneous, interrelated components, such as web pages and hyperlinks, or online scientific publications together with their authors and publication venues. In this paper, we present K-SVMeans, a clustering algorithm for multi-type interrelated datasets that integrates the well-known K-Means clustering algorithm with the widely used Support Vector Machine (SVM). Experimental results on authorship analysis of two real-world web-based datasets show that K-SVMeans successfully discovers topical clusters of documents and achieves better clustering solutions than homogeneous data clustering.

Original language: English (US)
Title of host publication: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007
Pages: 198-204
Number of pages: 7
State: Published - 2007
Event: IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007 - Silicon Valley, CA, United States
Duration: Nov 2 2007 to Nov 5 2007

Publication series

Name: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007

Other

Other: IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007
Country/Territory: United States
City: Silicon Valley, CA
Period: 11/2/07 to 11/5/07

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
