TY - GEN
T1 - Clustering analysis of brain protein expression levels in trisomic and control mice
AU - Clayman, Carly L.
AU - Clayman, Scott N.
AU - Mukherjee, Partha
N1 - Funding Information:
We acknowledge Pennsylvania State University - Great Valley - School of Professional Studies - Data Analytics program for training in data mining methods.
Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019/4/6
Y1 - 2019/4/6
AB - In this paper, we describe a clustering analysis of the expression levels of 77 distinct brain proteins in trisomic and control mice. Hierarchical clustering based on Euclidean distance yields clusters that partially coincide with the experimental treatment groups of mice, as shown in the dendrograms. Normalization decreases both the within- and between-cluster sums of squares, and also decreases the ratio of between- to within-cluster sum of squares. The optimal number of clusters ranges from 1 to 4, depending on whether it is determined by the gap statistic or by direct methods (silhouette width or the elbow of the total within-cluster sum of squares). Principal component analysis shows separation of the groups generated by k-means clustering. When the clustered groups are plotted against the first two principal components, the clusters are more distinct after z-score normalization of protein expression levels than without normalization.
UR - http://www.scopus.com/inward/record.url?scp=85068680101&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85068680101&partnerID=8YFLogxK
U2 - 10.1145/3325917.3325932
DO - 10.1145/3325917.3325932
M3 - Conference contribution
AN - SCOPUS:85068680101
T3 - ACM International Conference Proceeding Series
SP - 114
EP - 118
BT - Proceedings of 3rd International Conference on Information System and Data Mining, ICISDM 2019
PB - Association for Computing Machinery
T2 - 3rd International Conference on Information System and Data Mining, ICISDM 2019
Y2 - 6 April 2019 through 8 April 2019
ER -