Project Details
Description
This project will develop a novel paradigm of generative modeling for decentralized data. In the big data era, the enormous volume, variety, and velocity of data raise new technical challenges. Statistical learning under the limitations imposed by distributed data collections, communication networks, and decentralized computing platforms can be substantially strengthened. For example, in many engineering applications the data are so large that a single computer cannot handle them, yet typical learning methods assume that training data are static and can be processed by one computer. The generative modeling framework has been shown to be effective at incorporating prior knowledge and capturing statistical dependence among data residing on structured domains, e.g., time sequences for signals and spatial grids for images. These advantages suit data arising from natural phenomena and the needs of engineering systems well. The project addresses constraints in storage and communication capacity, as well as the speed requirements of real-time analysis, by advancing multi-scale statistical modeling consisting of a layer of data-level learning and a layer of model-level learning. Two doctoral students will be supported to conduct research at the interface of engineering and statistics. They will develop core methodologies, as well as practical algorithms and tools useful in a wide range of engineering disciplines.
The goal of this project is to develop new approaches in statistical learning for distributed and dynamic data subject to the constraints of communication networks and the decentralized architecture of computing platforms. In particular, multi-scale statistical modeling for learning from distributed and dynamic data will be advanced. At the data level, modeling is performed at decentralized computing sites. These models serve as a highly compact description of the data, retaining the key information for learning. To consolidate the models acquired at distributed sites, only the models are communicated to a primary computer node. At the primary node, learning is performed directly on the models without regenerating data. An integrated investigation will be conducted on trade-offs between data and various computing resources such as CPU and storage. This project is transformative because of the fundamental nature of the problems, the unusual formulation of the problems, and the interdisciplinary approaches. The usual paradigm of learning directly from data is transformed into multi-scale learning, where statistical models become learning objects themselves. A suite of tools integrating methodologies in statistics and engineering will be developed and made available.
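The two-layer scheme described above can be illustrated with a minimal sketch. This is not the project's actual method; it only shows the communication pattern under a simplifying assumption that each site summarizes its shard as a Gaussian (count, mean, variance), so the primary node can consolidate the site models from their sufficient statistics alone, without regenerating raw data. The function and variable names (`local_model`, `consolidate`, `sites`) are hypothetical.

```python
import random

random.seed(0)

# Simulate three decentralized sites, each holding its own data shard.
sites = [[random.gauss(5.0, 2.0) for _ in range(1000)] for _ in range(3)]

def local_model(data):
    """Data-level learning at a site: summarize the shard as a compact
    Gaussian model (count, mean, variance) instead of shipping raw data."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return {"n": n, "mean": mean, "var": var}

def consolidate(models):
    """Model-level learning at the primary node: combine the site models
    into a global Gaussian using only their sufficient statistics."""
    n_total = sum(m["n"] for m in models)
    mean = sum(m["n"] * m["mean"] for m in models) / n_total
    # Law of total variance: within-site spread plus between-site spread.
    var = sum(m["n"] * (m["var"] + (m["mean"] - mean) ** 2)
              for m in models) / n_total
    return {"n": n_total, "mean": mean, "var": var}

site_models = [local_model(shard) for shard in sites]  # only models travel
global_model = consolidate(site_models)  # no raw data at the primary node
print(global_model["mean"], global_model["var"])
```

For this toy summary the consolidation is exact: the pooled mean and variance match what a single computer would obtain from the concatenated data, while each site transmits three numbers instead of a thousand, which is the storage/communication trade-off the abstract highlights.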
| Status | Finished |
| --- | --- |
| Effective start/end date | 9/1/15 → 8/31/18 |
Funding
- National Science Foundation: $250,000.00