ITR: Algorithms and Software for Knowledge Acquisition from Heterogeneous Distributed Data

Project: Research project

Project Details

Description

The development of high-throughput data acquisition technologies, together with advances in computing and communications, has resulted in explosive growth in the number, size, and diversity of potentially useful information sources. However, the massive size, heterogeneity, autonomy, and distributed nature of the data repositories present significant hurdles to extracting knowledge from these data. This research seeks to overcome these hurdles through the design, analysis, and implementation of:

a) Efficient distributed and cumulative learning algorithms with provable performance guarantees (relative to their centralized or batch counterparts) for knowledge acquisition from distributed data sources;

b) Customizable information extraction agents that can effectively exploit domain- or context-specific ontologies supplied by users to extract the information needed for learning (e.g., statistics) from distributed data sources, despite differences in query capabilities, interfaces, ontologies, and access restrictions, so as to facilitate analysis of heterogeneous distributed data from different perspectives;

c) INDUS - a test-bed for knowledge acquisition from heterogeneous distributed data in computational molecular biology (e.g., characterization of protein sequence-structure-function relationships using diverse sources of biological data).
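As an illustration of what a guarantee "relative to the centralized counterpart" in (a) can look like, the following is a minimal sketch and not the project's actual algorithm or the INDUS code. It assumes the data are split by rows across sites and that the learner needs only count statistics (as a naive Bayes classifier does). The function names `local_counts` and `merge_counts` and the toy records are hypothetical.

```python
from collections import Counter

def local_counts(records):
    """Compute count statistics at a single data source.

    Each record is a (features, label) pair, where features is a tuple.
    Only these counts, not the raw records, need to leave the site.
    """
    class_counts = Counter()
    feature_counts = Counter()
    for features, label in records:
        class_counts[label] += 1
        for position, value in enumerate(features):
            feature_counts[(position, value, label)] += 1
    return class_counts, feature_counts

def merge_counts(per_site_stats):
    """Sum the statistics returned by every site."""
    total_class, total_feature = Counter(), Counter()
    for class_counts, feature_counts in per_site_stats:
        total_class.update(class_counts)
        total_feature.update(feature_counts)
    return total_class, total_feature

# Hypothetical toy records held at two separate sites.
site_a = [(("A", "helix"), "enzyme"), (("G", "sheet"), "binding")]
site_b = [(("A", "sheet"), "enzyme"), (("A", "helix"), "enzyme")]

distributed = merge_counts([local_counts(site_a), local_counts(site_b)])
centralized = local_counts(site_a + site_b)

# The merged statistics match those computed on the pooled data.
assert distributed == centralized
```

Because counts are additive over disjoint sets of records, any model estimated from the merged counts is identical to the one estimated from the pooled data under these assumptions.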

The resulting algorithms and software can accelerate, potentially by an order of magnitude, the rate of scientific

Status: Finished
Effective start/end date: 8/15/02 to 12/31/06

Funding

  • National Science Foundation
