TY - GEN
T1 - Complex network analysis using parallel approximate motif counting
AU - Slota, George M.
AU - Madduri, Kamesh
N1 - Copyright:
Copyright 2014 Elsevier B.V., All rights reserved.
PY - 2014
Y1 - 2014
N2 - Subgraph counting forms the basis of many complex network analysis metrics, including motif and anti-motif finding, relative graph let frequency distance, and graph let degree distribution agreements. Determining exact subgraph counts is computationally very expensive. In recent work, we present FASCIA, a shared-memory parallel algorithm and implementation for approximate subgraph counting. FASCIA uses a dynamic programming-based approach and is significantly faster than exhaustive enumeration, while generating high-quality approximations of subgraph counts. However, the memory usage of the dynamic programming step prohibits us from applying FASCIA to very large graphs. In this paper, we introduce a distributed-memory parallelization of FASCIA by partitioning the graph and the dynamic programming table. We discuss a new collective communication scheme to make the dynamic programming step memory-efficient. These optimizations enable scaling to much larger networks than before. We also present a simple parallelization strategy for distributed subgraph counting on smaller networks. The new additions let us use subgraph counts as graph signatures for a large network collection, and we analyze this collection using various subgraph count-based graph analytics.
AB - Subgraph counting forms the basis of many complex network analysis metrics, including motif and anti-motif finding, relative graph let frequency distance, and graph let degree distribution agreements. Determining exact subgraph counts is computationally very expensive. In recent work, we present FASCIA, a shared-memory parallel algorithm and implementation for approximate subgraph counting. FASCIA uses a dynamic programming-based approach and is significantly faster than exhaustive enumeration, while generating high-quality approximations of subgraph counts. However, the memory usage of the dynamic programming step prohibits us from applying FASCIA to very large graphs. In this paper, we introduce a distributed-memory parallelization of FASCIA by partitioning the graph and the dynamic programming table. We discuss a new collective communication scheme to make the dynamic programming step memory-efficient. These optimizations enable scaling to much larger networks than before. We also present a simple parallelization strategy for distributed subgraph counting on smaller networks. The new additions let us use subgraph counts as graph signatures for a large network collection, and we analyze this collection using various subgraph count-based graph analytics.
UR - http://www.scopus.com/inward/record.url?scp=84906674349&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84906674349&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2014.50
DO - 10.1109/IPDPS.2014.50
M3 - Conference contribution
AN - SCOPUS:84906674349
SN - 9780769552071
T3 - Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS
SP - 405
EP - 414
BT - Proceedings - IEEE 28th International Parallel and Distributed Processing Symposium, IPDPS 2014
PB - IEEE Computer Society
T2 - 28th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2014
Y2 - 19 May 2014 through 23 May 2014
ER -