
Abstract

The emergence of many interlinked, physically distributed, and autonomously maintained RDF stores offers unprecedented opportunities for predictive modeling and knowledge discovery from such data. However, existing machine learning approaches are limited in their applicability because it is neither desirable nor feasible to gather all of the data in a centralized location for analysis, owing to access, memory, bandwidth, and computational restrictions, and sometimes to privacy and confidentiality constraints. Against this background, we consider the problem of learning predictive models from multiple interlinked RDF stores. Specifically, we: (i) introduce statistical-query-based formulations of several representative algorithms for learning classifiers from RDF data; (ii) introduce a distributed learning framework to learn classifiers from multiple interlinked RDF stores that form a chain; (iii) identify three special cases of RDF data fragmentation and describe effective strategies for learning predictive models in each case; (iv) consider a novel application of a matrix reconstruction technique from the field of Computerized Tomography [1] to approximate the statistics needed by the learning algorithm from projections obtained using count queries, thus dramatically reducing the amount of information transmitted from the remote data sources to the learner; and (v) report results of experiments with a real-world social network data set (Last.fm), which demonstrate the feasibility of the proposed approach.
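To make item (i) concrete, here is a minimal Python sketch of what a statistical query formulation of a simple classifier can look like: a Naive Bayes learner that touches the data only through count queries. The in-memory `DATA` list, the `count_query` oracle, the attribute names, and the Laplace smoothing constant `alpha` are hypothetical stand-ins; in the setting described above, each count would instead be answered by a (possibly remote) RDF store, for example through a SPARQL `COUNT` aggregate. Naive Bayes is used only for illustration and is not necessarily one of the algorithms formulated in the paper.

```python
import math

# Hypothetical toy instances: nominal attribute values plus a class label.
DATA = [
    ({"genre": "rock", "country": "US"}, "active"),
    ({"genre": "rock", "country": "UK"}, "active"),
    ({"genre": "jazz", "country": "US"}, "inactive"),
    ({"genre": "pop", "country": "UK"}, "active"),
    ({"genre": "jazz", "country": "UK"}, "inactive"),
]


def count_query(attr=None, value=None, label=None):
    """Hypothetical count oracle: the learner's only access to the data.

    Returns how many instances satisfy the optional constraints
    attr == value and class == label.
    """
    n = 0
    for features, y in DATA:
        if label is not None and y != label:
            continue
        if attr is not None and features.get(attr) != value:
            continue
        n += 1
    return n


def learn_naive_bayes(values, labels, alpha=1.0):
    """Estimate Naive Bayes parameters purely from count queries.

    values maps each attribute to its list of possible values;
    alpha is the Laplace smoothing constant.
    """
    total = count_query()
    priors, cond = {}, {}
    for c in labels:
        n_c = count_query(label=c)
        priors[c] = n_c / total
        for attr, domain in values.items():
            for v in domain:
                # Smoothed P(attr = v | class = c); unseen values keep nonzero mass.
                cond[(attr, v, c)] = (count_query(attr, v, c) + alpha) / (n_c + alpha * len(domain))
    return priors, cond


def predict(model, x):
    """Return the label with the highest log posterior for instance x."""
    priors, cond = model
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior) + sum(math.log(cond[(a, v, c)]) for a, v in x.items())
        if score > best_score:
            best, best_score = c, score
    return best


VALUES = {"genre": ["rock", "jazz", "pop"], "country": ["US", "UK"]}
model = learn_naive_bayes(VALUES, ["active", "inactive"])
print(predict(model, {"genre": "jazz", "country": "US"}))  # inactive
print(predict(model, {"genre": "rock", "country": "UK"}))  # active
```

Because the learner needs only these counts (one per attribute value and class, plus the class totals), the raw instances never have to leave the store that holds them.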
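Item (iv) can be illustrated with a classic tomographic method. The sketch below uses the Multiplicative Algebraic Reconstruction Technique (MART), which is not necessarily the technique of [1]: a hidden nonnegative count matrix is observed only through its ray sums along rows, columns, diagonals, and anti-diagonals, and MART repeatedly rescales the cells on each ray until that ray's sum matches its measurement. The matrix `M`, the four projection directions, the helper names, and the sweep count are assumptions made for illustration; in a distributed setting, each ray sum would stand for one count query answered by a remote store.

```python
import numpy as np


def ray_families(m, n):
    """Cell lists for every row, column, diagonal and anti-diagonal of an m x n grid."""
    rows = [[(i, j) for j in range(n)] for i in range(m)]
    cols = [[(i, j) for i in range(m)] for j in range(n)]
    diags = [[(i, i + d) for i in range(m) if 0 <= i + d < n] for d in range(-(m - 1), n)]
    antis = [[(i, s - i) for i in range(m) if 0 <= s - i < n] for s in range(m + n - 1)]
    return rows + cols + diags + antis


def ray_sums(X, rays):
    """Projections of X: the sum of the cells on each ray."""
    return np.array([sum(X[i, j] for i, j in cells) for cells in rays])


def mart(rays, measured, shape, sweeps=500):
    """Multiplicative ART: start from a uniform positive guess and, ray by ray,
    rescale the cells on that ray so that its sum equals the measured projection."""
    X = np.ones(shape)
    for _ in range(sweeps):
        for cells, target in zip(rays, measured):
            ii, jj = (list(t) for t in zip(*cells))
            current = X[ii, jj].sum()
            if current > 0:
                X[ii, jj] *= target / current
    return X


# Hypothetical hidden count matrix held by the remote store.
M = np.array([[4.0, 0.0, 1.0],
              [2.0, 3.0, 0.0],
              [0.0, 1.0, 5.0]])
rays = ray_families(*M.shape)
measured = ray_sums(M, rays)  # in practice, one count query per ray
X = mart(rays, measured, M.shape)
print(np.round(X, 3))
```

For an m × n matrix, these four directions need m + n + 2(m + n − 1) count queries instead of mn cell counts, so the saving appears only for larger matrices. In this tiny example the ray sums happen to determine every cell; in general, MART returns one nonnegative matrix approximately consistent with the projections, that is, an approximation of the true statistics.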

Original language: English (US)
Title of host publication: Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013
Pages: 94-101
Number of pages: 8
State: Published - 2013
Event: 2013 IEEE International Congress on Big Data, BigData 2013 - Santa Clara, CA, United States
Duration: Jun 27 2013 to Jul 2 2013

Publication series

Name: Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013

Other

Conference: 2013 IEEE International Congress on Big Data, BigData 2013
Country/Territory: United States
City: Santa Clara, CA
Period: 6/27/13 to 7/2/13

All Science Journal Classification (ASJC) codes

  • Computer Science Applications

Title: Learning classifiers from chains of multiple interlinked RDF data stores
