TY - GEN
T1 - Towards provenance-based anomaly detection in MapReduce
AU - Liao, Cong
AU - Squicciarini, Anna
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/7/7
Y1 - 2015/7/7
N2 - MapReduce enables parallel and distributed processing of vast amounts of data on a cluster of machines. However, such a computing paradigm is subject to threats posed by malicious and cheating nodes or compromised user-submitted code that could tamper with data and computation, since users maintain little control as the computation is carried out in a distributed fashion. In this paper, we focus on the analysis and detection of anomalies during the process of MapReduce computation. Accordingly, we develop a computational provenance system that captures provenance data related to MapReduce computation within the MapReduce framework in Hadoop. In particular, we identify a set of invariants against aggregated provenance information, which are later analyzed to uncover anomalies indicating possible tampering of data and computation. We conduct a series of experiments to show the efficiency and effectiveness of our proposed provenance system.
UR - http://www.scopus.com/inward/record.url?scp=84941242786&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84941242786&partnerID=8YFLogxK
U2 - 10.1109/CCGrid.2015.16
DO - 10.1109/CCGrid.2015.16
M3 - Conference contribution
AN - SCOPUS:84941242786
T3 - Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015
SP - 647
EP - 656
BT - Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015
Y2 - 4 May 2015 through 7 May 2015
ER -