Composition-on-composition regression analysis for multi-omics integration of metagenomic data

  • Nicholas Rios
  • Yuke Shi
  • Jun Chen
  • Xiang Zhan
  • Lingzhou Xue
  • Qizhai Li

Research output: Contribution to journal › Article › peer-review

Abstract

Motivation: Compositional data are frequently encountered in many disciplines, for example in the next-generation sequencing experiments widely used in biomedical studies. Regression analysis with compositional data as either responses or predictors has been well studied. However, when both responses and predictors are compositional, the inventory of analysis tools is surprisingly limited, especially in the high-dimensional setting. Most of the few existing methods rely on a log-ratio transformation to move compositional data from the simplex to the real numbers. A serious weakness of these methods is their failure to handle the substantial fraction of zeroes observed in data collected from next-generation sequencing experiments, because the logarithm of zero is undefined.

Results: To investigate associations between two high-dimensional multi-omics compositions, we propose a composition-on-composition (COC) regression analysis method that does not require log-ratio transformations and hence can handle zeroes in the data. To account for high dimensionality, we estimate the regression coefficients using a penalized estimating equation approach. We also propose inference procedures for COC regression. The superior performance of COC is demonstrated through both comprehensive numerical simulations and case studies.

Availability and implementation: Source R code implementing the COC method is available at https://github.com/nrios4/COC.
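To make the zero problem concrete, the following minimal Python sketch contrasts a log-ratio transform with one transformation-free way of mapping a composition to a composition. It is not the COC estimator and not the authors' R implementation. The centered log-ratio (clr) transform is a standard log-ratio choice; the function names clr and simplex_map, the coefficient matrix B (rows nonnegative and summing to 1), and the toy numbers are all hypothetical and chosen purely for illustration.

```python
import math

# Illustrative sketch only: not the COC estimator and not the authors' R code.


def clr(x):
    """Centered log-ratio (clr) transform of a single composition.

    Returns None if any part is zero or negative, since log(0) is undefined;
    this is the failure mode of log-ratio methods on sequencing data.
    """
    if any(v <= 0 for v in x):
        return None
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [lv - mean_log for lv in logs]


def simplex_map(x, B, tol=1e-9):
    """Map a D-part composition x to a K-part composition as the product x B.

    B is a hypothetical D x K coefficient matrix whose rows are nonnegative
    and sum to 1. Each output part is sum_j x[j] * B[j][k], so the result is
    a convex combination of the rows of B and stays on the simplex, with no
    logarithm taken and no special handling needed for zero parts of x.
    """
    if len(x) != len(B):
        raise ValueError("x and B must have the same number of parts")
    for row in B:
        if any(b < 0 for b in row) or abs(sum(row) - 1.0) > tol:
            raise ValueError("each row of B must be nonnegative and sum to 1")
    K = len(B[0])
    return [sum(x[j] * B[j][k] for j in range(len(x))) for k in range(K)]


# Toy predictor composition with a zero part (made-up numbers).
x = [0.5, 0.3, 0.0, 0.2]
B = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.0, 0.5, 0.5],
]

print(clr(x))  # None: the log-ratio transform breaks on the zero
y_hat = simplex_map(x, B)
print([round(v, 3) for v in y_hat], round(sum(y_hat), 10))  # [0.38, 0.44, 0.18] 1.0
```

In this toy setting the mapped response is a valid composition even though the predictor contains a zero, whereas the clr transform is undefined. How COC actually parameterizes and estimates its coefficients in high dimensions, including the penalized estimating equations and inference procedures, is described in the paper and its repository, not in this sketch.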

Original language: English (US)
Article number: btaf387
Journal: Bioinformatics
Volume: 41
Issue number: 7
State: Published - Jul 1 2025

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
