Project Details
Description
The development of high-throughput data acquisition technologies, together with advances in computing and communications, has resulted in explosive growth in the number, size, and diversity of potentially useful information sources. However, the massive size, heterogeneity, autonomy, and distributed nature of the data repositories present significant hurdles to extracting knowledge from this data. This research seeks to overcome these hurdles through the design, analysis, and implementation of:
a) Efficient distributed and cumulative learning algorithms with provable performance guarantees (relative to their centralized or batch counterparts) for knowledge acquisition from distributed data sources;
b) Customizable information extraction agents that exploit user-supplied domain- or context-specific ontologies to extract the information needed for learning (e.g., statistics) from distributed data sources, despite differences in query capabilities, interfaces, ontologies, and access restrictions, so that heterogeneous distributed data can be analyzed from different perspectives;
c) INDUS - a test-bed for knowledge acquisition from heterogeneous distributed data in computational molecular biology (e.g., characterization of protein sequence-structure-function relationships using diverse sources of biological data).
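As an illustration of what goal (a) means by a guarantee relative to a centralized counterpart, here is a minimal sketch, not the project's actual algorithm. It assumes the data are split by rows across sites and that the learner is a naive Bayes classifier, which needs only counts. Each site then reports its own counts, and summing them reproduces the counts a centralized learner would compute. The function names and the toy protein-like records are hypothetical.

```python
from collections import Counter

def local_counts(rows):
    """Compute naive Bayes sufficient statistics at a single data source.

    rows: iterable of (features, label), where features is a tuple of
    discrete values. Only these counts, not raw rows, leave the site.
    """
    counts = Counter()
    for features, label in rows:
        counts[("class", label)] += 1
        for i, value in enumerate(features):
            counts[("feat", i, value, label)] += 1
    return counts

def merge_counts(per_source_counts):
    """Combine the counts reported by each source by simple addition."""
    total = Counter()
    for counts in per_source_counts:
        total.update(counts)  # Counter.update adds counts key by key
    return total

# Hypothetical toy records held at two separate sites.
site_a = [(("A", "helix"), "binds"), (("G", "sheet"), "no_bind")]
site_b = [(("A", "sheet"), "binds"), (("A", "helix"), "binds")]

distributed = merge_counts([local_counts(site_a), local_counts(site_b)])
centralized = local_counts(site_a + site_b)

# The merged statistics are identical to those from the pooled data.
assert distributed == centralized
```

Because the classifier is a function of these counts alone, the model built from the merged counts is the same as the one built from the pooled data. Learners that do not decompose this way would need other strategies.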
The resulting algorithms and software can accelerate, potentially by an order of magnitude, the rate of scientific
| Status | Finished |
|---|---|
| Effective start/end date | 8/15/02 → 12/31/06 |
Funding
- National Science Foundation