Direct prediction of regulatory elements from partial data without imputation

Yu Zhang, Shaun Mahony

Research output: Contribution to journalArticlepeer-review

6 Scopus citations


Genome segmentation approaches allow us to characterize regulatory states in a given cell type using combinatorial patterns of histone modifications and other regulatory signals. In order to analyze regulatory state differences across cell types, current genome segmentation approaches typically require that the same regulatory genomics assays have been performed in all analyzed cell types. This necessarily limits both the numbers of cell types that can be analyzed and the complexity of the resulting regulatory states, as only a small number of histone modifications have been profiled across many cell types. Data imputation approaches that aim to estimate missing regulatory signals have been applied before genome segmentation. However, this approach is computationally costly and propagates any errors in imputation to produce incorrect genome segmentation results downstream. We present an extension to the IDEAS genome segmentation platform which can perform genome segmentation on incomplete regulatory genomics dataset collections without using imputation. Instead of relying on imputed data, we use an expectation-maximization approach to estimate marginal density functions within each regulatory state. We demonstrate that our genome segmentation results compare favorably with approaches based on imputation or other strategies for handling missing data. We further show that our approach can accurately impute missing data after genome segmentation, reversing the typical order of imputation/genome segmentation pipelines. Finally, we present a new 2D genome segmentation analysis of 127 human cell types studied by the Roadmap Epigenomics Consortium. By using an expanded set of chromatin marks that have been profiled in subsets of these cell types, our new segmentation results capture a more complex picture of combinatorial regulatory patterns that appear on the human genome.

Original languageEnglish (US)
Article numbere1007399
JournalPLoS computational biology
Issue number11
StatePublished - 2019

All Science Journal Classification (ASJC) codes

  • Ecology, Evolution, Behavior and Systematics
  • Modeling and Simulation
  • Ecology
  • Molecular Biology
  • Genetics
  • Cellular and Molecular Neuroscience
  • Computational Theory and Mathematics


Dive into the research topics of 'Direct prediction of regulatory elements from partial data without imputation'. Together they form a unique fingerprint.

Cite this