TY - JOUR
T1 - EMDUniFrac
T2 - exact linear time computation of the UniFrac metric and identification of differentially abundant organisms
AU - McClelland, Jason
AU - Koslicki, David
N1 - Publisher Copyright:
© 2018, Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2018/10/1
Y1 - 2018/10/1
AB - Both the weighted and unweighted UniFrac distances have been employed very successfully to assess whether two communities differ, but they give no information about how two communities differ. We take advantage of recent observations that the UniFrac metric is equivalent to the so-called earth mover’s distance (also known as the Kantorovich–Rubinstein metric) to develop an algorithm that not only computes the UniFrac distance in linear time and space, but also simultaneously finds which operational taxonomic units are responsible for the observed differences between samples. This allows the algorithm, called EMDUniFrac, to determine why given samples are different, not just whether they are different, with no added computational burden. EMDUniFrac can be applied to any distribution on a tree, and so is particularly suitable for analyzing both operational taxonomic units derived from amplicon sequencing and community profiles resulting from classifying whole-genome shotgun metagenomes. The EMDUniFrac source code (written in Python) is freely available at: https://github.com/dkoslicki/EMDUniFrac.
UR - http://www.scopus.com/inward/record.url?scp=85045879316&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85045879316&partnerID=8YFLogxK
DO - 10.1007/s00285-018-1235-9
M3 - Article
C2 - 29691633
AN - SCOPUS:85045879316
SN - 0303-6812
VL - 77
SP - 935
EP - 949
JO - Journal of Mathematical Biology
JF - Journal of Mathematical Biology
IS - 4
ER -