TY - JOUR
T1 - Complementary ensemble clustering of biomedical data
AU - Fodeh, Samah Jamal
AU - Brandt, Cynthia
AU - Luong, Thai Binh
AU - Haddad, Ali
AU - Schultz, Martin
AU - Murphy, Terrence
AU - Krauthammer, Michael
N1 - Funding Information:
This study was funded by the National Institutes of Health (NIH)/National Library of Medicine (NLM) 5R01LM009956 (MK, SF), VA grant HIR 08-374 HSR&D: Consortium for Healthcare Informatics (CB, SF, MK) and assisted by the Yale Claude D. Pepper Older Americans Independence Center P30AG21342 (TM) and by a grant from the National Institute on Aging (TM) (1R21AG033130-01A2).
PY - 2013/6
Y1 - 2013/6
AB - The rapidly growing availability of electronic biomedical data has increased the need for innovative data mining methods. Clustering in particular has been an active area of research in many different application areas, with existing clustering algorithms mostly focusing on one modality or representation of the data. Complementary ensemble clustering (CEC) is a recently introduced framework in which Kmeans is applied to a weighted, linear combination of the coassociation matrices obtained from separate ensemble clustering of different data modalities. The strength of CEC is its extraction of information from multiple aspects of the data when forming the final clusters. This study assesses the utility of CEC in biomedical data, which often have multiple data modalities, e.g., text and images, by applying CEC to two distinct biomedical datasets (PubMed images and radiology reports) that each have two modalities. Relative to five different Kmeans-based clustering approaches, CEC exhibited equal or better performance on the micro-averaged precision and Normalized Mutual Information metrics across both datasets. The reference methods included clustering of single modalities as well as ensemble clustering of separate and merged data modalities. Our experimental results suggest that CEC is equivalent to or more efficient than comparable Kmeans-based clustering methods using either single or merged data modalities.
UR - http://www.scopus.com/inward/record.url?scp=84878179672&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84878179672&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2013.02.001
DO - 10.1016/j.jbi.2013.02.001
M3 - Article
C2 - 23454721
AN - SCOPUS:84878179672
SN - 1532-0464
VL - 46
SP - 436
EP - 443
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
IS - 3
ER -