TY - GEN
T1 - Graph stream summarization
T2 - 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016
AU - Tang, Nan
AU - Chen, Qing
AU - Mitra, Prasenjit
PY - 2016/6/26
Y1 - 2016/6/26
N2 - A graph stream, which refers to the graph with edges being updated sequentially in a form of a stream, has important applications in cyber security and social networks. Due to the sheer volume and highly dynamic nature of graph streams, the practical way of handling them is by summarization. Given a graph stream G, directed or undirected, the problem of graph stream summarization is to summarize G as SG with a much smaller (sublinear) space, linear construction time and constant maintenance cost for each edge update, such that SG allows many queries over G to be approximately conducted efficiently. The widely used practice of summarizing data streams is to treat each stream element independently by e.g., hash-or sample-based methods, without maintaining the connections (or relationships) between elements. Hence, existing methods can only solve ad-hoc problems, without supporting diversified and complicated analytics over graph streams. We present TCM, a novel graph stream summary. Given an incoming edge, it summarizes both node and edge information in constant time. Consequently, the summary forms a graphical sketch where edges capture the connections inside elements, and nodes maintain relationships across elements. We discuss a wide range of supported queries and establish some error bounds. In addition, we experimentally show that TCM can effectively and efficiently support analytics over graph streams beyond the power of existing sketches, which demonstrates its potential to start a new line of research and applications in graph stream management.
AB - A graph stream, which refers to the graph with edges being updated sequentially in a form of a stream, has important applications in cyber security and social networks. Due to the sheer volume and highly dynamic nature of graph streams, the practical way of handling them is by summarization. Given a graph stream G, directed or undirected, the problem of graph stream summarization is to summarize G as SG with a much smaller (sublinear) space, linear construction time and constant maintenance cost for each edge update, such that SG allows many queries over G to be approximately conducted efficiently. The widely used practice of summarizing data streams is to treat each stream element independently by e.g., hash-or sample-based methods, without maintaining the connections (or relationships) between elements. Hence, existing methods can only solve ad-hoc problems, without supporting diversified and complicated analytics over graph streams. We present TCM, a novel graph stream summary. Given an incoming edge, it summarizes both node and edge information in constant time. Consequently, the summary forms a graphical sketch where edges capture the connections inside elements, and nodes maintain relationships across elements. We discuss a wide range of supported queries and establish some error bounds. In addition, we experimentally show that TCM can effectively and efficiently support analytics over graph streams beyond the power of existing sketches, which demonstrates its potential to start a new line of research and applications in graph stream management.
UR - http://www.scopus.com/inward/record.url?scp=84979656018&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84979656018&partnerID=8YFLogxK
U2 - 10.1145/2882903.2915223
DO - 10.1145/2882903.2915223
M3 - Conference contribution
AN - SCOPUS:84979656018
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 1481
EP - 1496
BT - SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data
PB - Association for Computing Machinery
Y2 - 26 June 2016 through 1 July 2016
ER -