Abstract
For the same scientific question, there often exist different techniques or experiments that measure the same numerical quantity. Historically, various methods have been developed to exploit the information within each type of data independently. However, statistical data fusion methods that can effectively integrate multisource data under a unified framework are lacking. In this paper we propose a novel data fusion method, called B-scaling, for integrating multisource data. Consider K measurements that are generated from different sources but measure the same latent variable through linear or nonlinear mappings. We seek a representation of the latent variable, named the B-mean, which captures the common information contained in the K measurements while taking into account the nonlinear mappings between them and the latent variable. We also establish the asymptotic property of the B-mean and apply the proposed method to integrate multiple histone modifications and DNA methylation levels for characterizing the epigenomic landscape. Both numerical and empirical studies show that B-scaling is a powerful data fusion method with broad applications.
Original language | English (US) |
---|---|
Pages (from-to) | 1292-1312 |
Number of pages | 21 |
Journal | Annals of Applied Statistics |
Volume | 16 |
Issue number | 3 |
State | Published - Sep 2022 |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Modeling and Simulation
- Statistics, Probability and Uncertainty