TY - JOUR
T1 - Probabilistic K-means with Local Alignment for Clustering and Motif Discovery in Functional Data
AU - Cremona, Marzia A.
AU - Chiaromonte, Francesca
N1 - Publisher Copyright: © 2023 American Statistical Association and Institute of Mathematical Statistics.
PY - 2023
Y1 - 2023
N2 - We develop a new method to locally cluster curves and discover functional motifs, that is, typical shapes that may recur several times along and across the curves capturing important local characteristics. In order to identify these shared curve portions, our method leverages ideas from functional data analysis (joint clustering and alignment of curves), bioinformatics (local alignment through the extension of high similarity seeds) and fuzzy clustering (curves belonging to more than one cluster, if they contain more than one typical shape). It can employ various dissimilarity measures and incorporate derivatives in the discovery process, thus exploiting complex facets of shapes. We demonstrate the performance of our method with an extensive simulation study, and show how it generalizes other clustering methods for functional data. Finally, we provide real data applications to Italian Covid-19 death curves and Omics data related to mutagenesis. Supplementary materials for this article are available online.
UR - http://www.scopus.com/inward/record.url?scp=85147782117&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147782117&partnerID=8YFLogxK
U2 - 10.1080/10618600.2022.2156522
DO - 10.1080/10618600.2022.2156522
M3 - Article
AN - SCOPUS:85147782117
SN - 1061-8600
VL - 32
SP - 1119
EP - 1130
JO - Journal of Computational and Graphical Statistics
JF - Journal of Computational and Graphical Statistics
IS - 3
ER -