Graph stream summarization: From big bang to big crunch

Nan Tang, Qing Chen, Prasenjit Mitra

Research output: Chapter in Book/Report/Conference proceedingConference contribution

77 Scopus citations

Abstract

A graph stream, which refers to the graph with edges being updated sequentially in a form of a stream, has important applications in cyber security and social networks. Due to the sheer volume and highly dynamic nature of graph streams, the practical way of handling them is by summarization. Given a graph stream G, directed or undirected, the problem of graph stream summarization is to summarize G as SG with a much smaller (sublinear) space, linear construction time and constant maintenance cost for each edge update, such that SG allows many queries over G to be approximately conducted efficiently. The widely used practice of summarizing data streams is to treat each stream element independently by e.g., hash-or sample-based methods, without maintaining the connections (or relationships) between elements. Hence, existing methods can only solve ad-hoc problems, without supporting diversified and complicated analytics over graph streams. We present TCM, a novel graph stream summary. Given an incoming edge, it summarizes both node and edge information in constant time. Consequently, the summary forms a graphical sketch where edges capture the connections inside elements, and nodes maintain relationships across elements. We discuss a wide range of supported queries and establish some error bounds. In addition, we experimentally show that TCM can effectively and efficiently support analytics over graph streams beyond the power of existing sketches, which demonstrates its potential to start a new line of research and applications in graph stream management.

Original languageEnglish (US)
Title of host publicationSIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data
PublisherAssociation for Computing Machinery
Pages1481-1496
Number of pages16
ISBN (Electronic)9781450335317
DOIs
StatePublished - Jun 26 2016
Event2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016 - San Francisco, United States
Duration: Jun 26 2016Jul 1 2016

Publication series

NameProceedings of the ACM SIGMOD International Conference on Management of Data
Volume26-June-2016
ISSN (Print)0730-8078

Other

Other2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016
Country/TerritoryUnited States
CitySan Francisco
Period6/26/167/1/16

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Fingerprint

Dive into the research topics of 'Graph stream summarization: From big bang to big crunch'. Together they form a unique fingerprint.

Cite this