Segmented correspondence curve regression for quantifying covariate effects on the reproducibility of high-throughput experiments

Feipeng Zhang, Qunhua Li

Research output: Contribution to journal › Article › peer-review

Abstract

High-throughput biological experiments are essential tools for identifying biologically interesting candidates in large-scale omics studies. The results of a high-throughput biological experiment rely heavily on the operational factors chosen in its experimental and data-analytic procedures. Understanding how these operational factors influence the reproducibility of the experimental outcome is critical for selecting the optimal parameter settings and designing reliable high-throughput workflows. However, the influence of an operational factor may differ between strong and weak candidates in a high-throughput experiment, complicating the selection of parameter settings. To address this issue, we propose a novel segmented regression model, called segmented correspondence curve regression, to assess the influence of operational factors on the reproducibility of high-throughput experiments. Our model dissects the heterogeneous effects of operational factors on strong and weak candidates, providing a principled way to select operational parameters. Based on this framework, we also develop a sup-likelihood ratio test for the existence of heterogeneity. Simulation studies show that our estimation and testing procedures yield well-calibrated type I errors and are substantially more powerful in detecting and locating the differences in reproducibility across workflows than the existing method. Using this model, we investigated an important design question for ChIP-seq experiments: How many reads should one sequence to obtain reliable results in a cost-effective way? Our results reveal new insights into the impact of sequencing depth on the binding-site identification reproducibility, helping biologists determine the most cost-effective sequencing depth to achieve sufficient reproducibility for their study goals.

Original language: English (US)
Pages (from-to): 2272-2285
Number of pages: 14
Journal: Biometrics
Volume: 79
Issue number: 3
State: Published - Sep 2023

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry, Genetics and Molecular Biology (all)
  • Immunology and Microbiology (all)
  • Agricultural and Biological Sciences (all)
  • Applied Mathematics
