Towards provenance-based anomaly detection in MapReduce

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

21 Scopus citations

Abstract

MapReduce enables parallel and distributed processing of vast amounts of data on a cluster of machines. However, this computing paradigm is subject to threats posed by malicious and cheating nodes or by compromised user-submitted code that could tamper with data and computation, since users maintain little control once the computation is carried out in a distributed fashion. In this paper, we focus on the analysis and detection of anomalies during MapReduce computation. Accordingly, we develop a computational provenance system that captures provenance data related to MapReduce computation within the MapReduce framework in Hadoop. In particular, we identify a set of invariants against aggregated provenance information, which are later analyzed to uncover anomalies indicating possible tampering with data and computation. We conduct a series of experiments to show the efficiency and effectiveness of our proposed provenance system.
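
As a rough illustration of what checking an invariant against aggregated provenance information might look like, the Python sketch below sums per-task record counts and compares them across the shuffle boundary. The abstract does not list the paper's actual invariants, so everything here is an assumption made for illustration: the `TaskRecord` structure, the `check_shuffle_invariant` function, and the invariant itself (for a job without a combiner, the total number of records emitted by map tasks should equal the total number of records consumed by reduce tasks).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TaskRecord:
    """Hypothetical per-task provenance record (not the paper's schema)."""
    task_id: str
    phase: str        # "map" or "reduce"
    records_in: int   # records read by the task
    records_out: int  # records emitted by the task


def check_shuffle_invariant(records: Iterable[TaskRecord]) -> List[str]:
    """Return a list of anomaly messages; an empty list means the check passed.

    Assumed invariant (illustrative only): with no combiner configured,
    total map output records == total reduce input records.
    """
    records = list(records)
    map_out = sum(r.records_out for r in records if r.phase == "map")
    reduce_in = sum(r.records_in for r in records if r.phase == "reduce")
    if map_out != reduce_in:
        return [
            f"shuffle mismatch: maps emitted {map_out} records, "
            f"reducers consumed {reduce_in}"
        ]
    return []


# Consistent provenance: 250 + 150 map outputs feed 400 reduce inputs.
consistent = [
    TaskRecord("m0", "map", 100, 250),
    TaskRecord("m1", "map", 100, 150),
    TaskRecord("r0", "reduce", 400, 10),
]

# Inconsistent provenance: ten records went missing before the reducer.
inconsistent = [
    TaskRecord("m0", "map", 100, 250),
    TaskRecord("m1", "map", 100, 150),
    TaskRecord("r0", "reduce", 390, 10),
]

consistent_result = check_shuffle_invariant(consistent)
inconsistent_result = check_shuffle_invariant(inconsistent)
```

A mismatch reported by such a check would only flag a possible anomaly for further analysis; it would not by itself identify which node or piece of code caused it.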

Original language: English (US)
Title of host publication: Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 647-656
Number of pages: 10
ISBN (Electronic): 9781479980062
State: Published - Jul 7 2015
Event: 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015 - Shenzhen, China
Duration: May 4 2015 to May 7 2015

Publication series

Name: Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015

Other

Other: 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015
Country/Territory: China
City: Shenzhen
Period: 5/4/15 to 5/7/15

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • Computer Networks and Communications
  • Software
