Complementary ensemble clustering of biomedical data

Samah Jamal Fodeh, Cynthia Brandt, Thai Binh Luong, Ali Haddad, Martin Schultz, Terrence Murphy, Michael Krauthammer

Research output: Contribution to journal › Article › peer-review

26 Scopus citations

Abstract

The rapidly growing availability of electronic biomedical data has increased the need for innovative data mining methods. Clustering in particular has been an active area of research in many application areas, but existing clustering algorithms mostly focus on a single modality or representation of the data. Complementary ensemble clustering (CEC) is a recently introduced framework in which Kmeans is applied to a weighted linear combination of the coassociation matrices obtained from separate ensemble clustering of different data modalities. The strength of CEC is that it draws on information from multiple aspects of the data when forming the final clusters. This study assesses the utility of CEC for biomedical data, which often have multiple modalities (e.g., text and images), by applying CEC to two distinct biomedical datasets (PubMed images and radiology reports) that each have two modalities. Relative to five reference clustering approaches based on the Kmeans algorithm, CEC exhibited equal or better performance in micro-averaged precision and Normalized Mutual Information across both datasets. The reference methods included clustering of single modalities as well as ensemble clustering of separate and merged data modalities. Our experimental results suggest that CEC is equivalent to or more efficient than comparable Kmeans-based clustering methods that use either single or merged data modalities.
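
The CEC pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`kmeans`, `coassociation`, `cec`), the number of ensemble runs, the modality weights, and the synthetic two-modality data are all hypothetical, and the final step assumes that each row of the combined coassociation matrix is treated as a feature vector for Kmeans.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's K-means; returns one integer label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels

def coassociation(X, k, n_runs, rng):
    """Fraction of K-means runs in which each pair of samples shares a cluster."""
    n = len(X)
    co = np.zeros((n, n))
    for _ in range(n_runs):
        labels = kmeans(X, k, rng)
        co += labels[:, None] == labels[None, :]
    return co / n_runs

def cec(modalities, k, weights, n_runs=20, seed=0):
    """Assumed CEC sketch: weighted sum of per-modality coassociation
    matrices, then K-means on the rows of the combined matrix."""
    rng = np.random.default_rng(seed)
    combined = sum(
        w * coassociation(X, k, n_runs, rng)
        for w, X in zip(weights, modalities)
    )
    return kmeans(combined, k, rng), combined

# Hypothetical data: 60 samples, 3 groups, two modalities of different width.
data_rng = np.random.default_rng(42)
truth = np.repeat([0, 1, 2], 20)
text_feats = data_rng.normal(size=(60, 5)) + 4.0 * truth[:, None]
image_feats = data_rng.normal(size=(60, 8)) - 3.0 * truth[:, None]

# The weights 0.6 / 0.4 are illustrative, not values from the paper.
labels, combined = cec([text_feats, image_feats], k=3, weights=[0.6, 0.4])
```

Because each coassociation matrix has the same shape regardless of how many features its modality has, modalities with very different representations can be combined through a simple weighted sum, which is what lets the final clustering use both at once.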

Original language: English (US)
Pages (from-to): 436-443
Number of pages: 8
Journal: Journal of Biomedical Informatics
Volume: 46
Issue number: 3
State: Published - Jun 2013

All Science Journal Classification (ASJC) codes

  • Health Informatics
  • Computer Science Applications
