The human microbiome is a dynamic system that changes in response to disease, medication, diet, and other factors. A paired design is a common approach for evaluating microbial changes while controlling for inherent differences between individuals. For example, microbiome data may be collected from the same individuals before and after a treatment. Analyzing this type of data poses two challenges. First, microbiome data are compositional: the reads for all taxa in each sample are constrained to sum to a constant. Second, the number of taxa can be much larger than the sample size. Apart from methods that test one taxon at a time, few statistical methods exist for analyzing such data. In this paper, we propose first applying a log-ratio transformation to the compositions and then using a generalized Hotelling's test (GHT) to evaluate whether the average microbiome compositions are equal in the paired samples. We replace the sample covariance matrix in the standard Hotelling's statistic with a shrinkage-based covariance, calculated as a weighted average of the sample covariance and a positive definite target matrix. The optimal weighting can be obtained for many commonly used target matrices. We develop a permutation procedure to assess statistical significance. Extensive simulations show that our proposed method has well-controlled type I error and better power than several ad hoc approaches. We apply our method to examine vaginal microbiome changes in response to treatments for menopausal hot flashes. An R package, “GHT”, is freely available at https://github.com/zhaoni153/GHT.
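The sketch below illustrates the general pipeline described in the abstract: log-ratio transform, paired differences, a shrinkage covariance inside a Hotelling-type statistic, and a permutation p-value. It is a minimal illustration under stated assumptions and does not reproduce the GHT package or its API. The function names are hypothetical. The centered log-ratio (clr) transform with a pseudocount of 0.5 is an assumed choice of log-ratio. The target matrix is assumed to be the diagonal of the sample covariance. The weight `lam` is a fixed, user-supplied value rather than the optimal weighting derived in the paper. Significance is assessed by randomly flipping the signs of the paired differences, which is one standard paired permutation scheme and may differ from the authors' procedure.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of each row (one sample) of a count matrix.

    The pseudocount is an assumed device for handling zero counts.
    """
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)


def shrinkage_t2(d, lam=0.5):
    """Hotelling-type statistic n * dbar' Sigma^{-1} dbar on paired differences d (n x p).

    Sigma = lam * T + (1 - lam) * S, where S is the sample covariance and the
    (assumed) target T is diag(S). For 0 < lam <= 1 and positive variances,
    Sigma is positive definite even when p > n.
    """
    n = d.shape[0]
    dbar = d.mean(axis=0)
    s = np.cov(d, rowvar=False)          # p x p sample covariance (requires p >= 2)
    target = np.diag(np.diag(s))         # diagonal target matrix
    sigma = lam * target + (1.0 - lam) * s
    return n * dbar @ np.linalg.solve(sigma, dbar)


def paired_permutation_test(before, after, lam=0.5, n_perm=999, seed=0):
    """Permutation p-value for equal mean log-ratio compositions in paired samples.

    Rows of `before` and `after` are matched subjects; columns are taxa.
    Each permutation swaps before/after within a random subset of pairs,
    which flips the sign of those rows of the difference matrix.
    """
    rng = np.random.default_rng(seed)
    d = clr(after) - clr(before)
    observed = shrinkage_t2(d, lam)
    n = d.shape[0]
    exceed = 0
    for _ in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=n)
        if shrinkage_t2(d * signs[:, None], lam) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

Sign flipping is valid under the null hypothesis when the paired differences are symmetric about zero, which holds if the before and after labels are exchangeable within each subject. Shrinking toward a diagonal target keeps the covariance invertible when the number of taxa exceeds the number of pairs. This matters here because the clr sample covariance is always singular.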