Phylogenetic trees are commonly used to describe the evolutionary history of a group of species, but they may also be used to study an evolving virus such as HIV. These trees are high-dimensional, non-real-valued data objects with a specific pattern of built-in dependencies that violate the assumptions of many traditional statistical methodologies. We have found that these problems can often be overcome by defining (i) an appropriate measure of correlation applicable to phylogenetic trees, (ii) an appropriate distance metric on trees, and (iii) an appropriate way to describe the probability distribution of phylogenetic trees. This paper describes these statistical tools and applies them to a variety of HIV-related examples of phylogenetic tree data. (C) 2000 Elsevier Science Ltd.
All Science Journal Classification (ASJC) codes
- Modeling and Simulation
- Computer Science Applications