TY - JOUR
T1 - Tissue Classification Using Landmark and Non-Landmark Gene Sets for Feature Selection
AU - Clayman, Carly L.
AU - Mani, Alakesh
AU - Bondugula, Suraj
AU - Srinivasan, Satish M.
N1 - Publisher Copyright:
© 2021 Elsevier B.V. All rights reserved.
PY - 2021
Y1 - 2021
N2 - The L1000 dataset, containing gene microarray data from 978 landmark genes, has been previously shown to accurately predict expression of ~81% of the remaining 21,290 target genes. Microarray data was utilized to characterize groups of tissue types within the L1000 dataset to assess whether the 978 landmark genes, compared to non-landmark genes, would better differentiate samples into clusters containing distinct tissue types. Landmark genes better differentiated k-means clusters, compared to non-landmark genes. These results suggest that landmark genes better characterize heterogeneous samples in their comprehensive genetic profile. Our previous studies showed that categorical separation of samples based on clinical or biological groups generally improves for heterogeneous sample types when landmark genes, rather than non-landmark genes, are used as features. However, the present work indicates that non-landmark genes may also be utilized to separate samples in clustering when a large sample size is available for training k-means clustering models. In contrast, when studying a small sample size of the same set of heterogeneous samples, landmark genes as features improve clustering. This study has implications for assessing various tissue types, as landmark genes may be directly measured to predict categorical sample qualities as well as expression of remaining target genes.
AB - The L1000 dataset, containing gene microarray data from 978 landmark genes, has been previously shown to accurately predict expression of ~81% of the remaining 21,290 target genes. Microarray data was utilized to characterize groups of tissue types within the L1000 dataset to assess whether the 978 landmark genes, compared to non-landmark genes, would better differentiate samples into clusters containing distinct tissue types. Landmark genes better differentiated k-means clusters, compared to non-landmark genes. These results suggest that landmark genes better characterize heterogeneous samples in their comprehensive genetic profile. Our previous studies showed that categorical separation of samples based on clinical or biological groups generally improves for heterogeneous sample types when landmark genes, rather than non-landmark genes, are used as features. However, the present work indicates that non-landmark genes may also be utilized to separate samples in clustering when a large sample size is available for training k-means clustering models. In contrast, when studying a small sample size of the same set of heterogeneous samples, landmark genes as features improve clustering. This study has implications for assessing various tissue types, as landmark genes may be directly measured to predict categorical sample qualities as well as expression of remaining target genes.
UR - http://www.scopus.com/inward/record.url?scp=85112680245&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85112680245&partnerID=8YFLogxK
U2 - 10.1016/j.procs.2021.05.027
DO - 10.1016/j.procs.2021.05.027
M3 - Conference article
AN - SCOPUS:85112680245
SN - 1877-0509
VL - 185
SP - 256
EP - 263
JO - Procedia Computer Science
JF - Procedia Computer Science
T2 - 2021 Complex Adaptive Systems Conference
Y2 - 16 June 2021 through 18 June 2021
ER -