Learning classifiers from remote RDF data stores augmented with RDFS subclass hierarchies

Harris T. Lin, Ngot Bui, Vasant Honavar

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Scopus citations

Abstract

Rapid growth of RDF data in the Linked Open Data (LOD) cloud offers unprecedented opportunities for analyzing such data using machine learning algorithms. The massive size and distributed nature of LOD cloud present a challenging machine learning problem where the data can only be accessed remotely, i.e. through a query interface such as the SPARQL end-point of the data store. Existing approaches to learning classifiers from RDF data in such a setting fail to take advantage of RDF schema (RDFS) associated with the data store that asserts subclass hierarchies which provide information that can potentially be exploited by the learner. Against this background, we present a general approach that augments an existing directed graphical model with hidden variables that encode subclass hierarchies via probabilistic constraints. We also present an algorithm ProbAVT that adopts the variational Bayesian expectation maximization approach to efficiently learn parameters in such settings. Our experiments with several synthetic and real world datasets show that: (i) ProbAVT matches or outperforms its counterpart that does not incorporate background knowledge in the form of subclass hierarchies; (ii) ProbAVT remains competitive compared to other state-of-art models that incorporate subclass hierarchies, and is able to scale up to large hierarchies consisting of over tens of thousands of nodes.

Original languageEnglish (US)
Title of host publicationProceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015
EditorsFeng Luo, Kemafor Ogan, Mohammed J. Zaki, Laura Haas, Beng Chin Ooi, Vipin Kumar, Sudarsan Rachuri, Saumyadipta Pyne, Howard Ho, Xiaohua Hu, Shipeng Yu, Morris Hui-I Hsiao, Jian Li
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1807-1813
Number of pages7
ISBN (Electronic)9781479999255
DOIs
StatePublished - Dec 22 2015
Event3rd IEEE International Conference on Big Data, IEEE Big Data 2015 - Santa Clara, United States
Duration: Oct 29 2015Nov 1 2015

Publication series

NameProceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015

Other

Other3rd IEEE International Conference on Big Data, IEEE Big Data 2015
Country/TerritoryUnited States
CitySanta Clara
Period10/29/1511/1/15

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Software

Fingerprint

Dive into the research topics of 'Learning classifiers from remote RDF data stores augmented with RDFS subclass hierarchies'. Together they form a unique fingerprint.

Cite this