Quantitative comparison of epigenomic data across multiple cell types or experimental conditions is a promising way to understand the biological functions of epigenetic modifications. However, differences in sequencing depth and signal-to-noise ratio between experiments can make it difficult to distinguish real biological variation from technical variation in raw epigenomic data. Proper normalization is therefore required before analysis to obtain meaningful insights. Most existing normalization methods rescale signals based on either background regions or peak regions, assuming that the same scale factor applies to both. While such methods adjust for differences in sequencing depth, they do not address differences in signal-to-noise ratio across experiments. We developed a new normalization method, called S3norm, that normalizes sequencing depth and signal-to-noise ratio across data sets simultaneously through a monotonic nonlinear transformation. We show empirically that, compared with existing methods, epigenomic data normalized by S3norm better capture real biological variation, such as effects on gene expression regulation.
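To illustrate why one scale factor cannot correct both sequencing depth and signal-to-noise ratio, the sketch below fits a two-parameter monotonic transform so that a target data set matches a reference in both peak and background regions. It is a minimal sketch under stated assumptions, not the published S3norm implementation: it assumes a power-law form `alpha * x ** beta` with `beta > 0`, assumes the shared ("common") peak and background regions are already known as boolean masks, and fits `alpha` and `beta` on the region means themselves, which is a simplification. The function names `fit_power_transform`, `mean_over`, and `power_normalize` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_power_transform(
    target_peak_mean: float,
    target_bg_mean: float,
    ref_peak_mean: float,
    ref_bg_mean: float,
) -> Tuple[float, float]:
    """Solve alpha * t**beta = r at both the peak mean and the background mean.

    Assumption: a power-law transform; this is an illustrative choice.
    """
    if min(target_peak_mean, target_bg_mean, ref_peak_mean, ref_bg_mean) <= 0:
        raise ValueError("all region means must be positive")
    if target_peak_mean == target_bg_mean:
        raise ValueError("target peak and background means must differ")
    # In log space the two conditions are linear in (log alpha, beta):
    #   log(alpha) + beta * log(t_peak) = log(r_peak)
    #   log(alpha) + beta * log(t_bg)   = log(r_bg)
    beta = math.log(ref_peak_mean / ref_bg_mean) / math.log(
        target_peak_mean / target_bg_mean
    )
    alpha = ref_bg_mean / target_bg_mean ** beta
    return alpha, beta


def mean_over(signal: Sequence[float], mask: Sequence[bool]) -> float:
    """Mean of the signal over bins where the mask is True."""
    values = [s for s, keep in zip(signal, mask) if keep]
    if not values:
        raise ValueError("mask selects no bins")
    return sum(values) / len(values)


def power_normalize(
    target: Sequence[float],
    reference: Sequence[float],
    common_peak: Sequence[bool],
    common_bg: Sequence[bool],
) -> List[float]:
    """Map target signal onto the reference scale with alpha * x**beta."""
    alpha, beta = fit_power_transform(
        mean_over(target, common_peak),
        mean_over(target, common_bg),
        mean_over(reference, common_peak),
        mean_over(reference, common_bg),
    )
    # beta > 0 keeps the transform monotonically increasing, so bin ranks are preserved.
    if beta <= 0:
        raise ValueError("fitted beta is not positive; transform would not be increasing")
    return [alpha * x ** beta for x in target]


# Usage: background bins (first two) and peak bins (last two).
target = [1.0, 1.0, 4.0, 4.0]       # signal-to-noise ratio 4
reference = [2.0, 2.0, 32.0, 32.0]  # signal-to-noise ratio 16
print(power_normalize(target, reference,
                      [False, False, True, True],
                      [True, True, False, False]))
# approximately [2.0, 2.0, 32.0, 32.0]
```

Here the fitted transform has `beta` close to 2 and `alpha` close to 2, so both the background and peak means of the target match the reference. A single scale factor chosen from total signal (68 / 10 = 6.8) would instead give 6.8 in background bins and 27.2 in peak bins, matching neither region, which is the limitation of one-factor rescaling described above.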