TY - GEN
T1 - Fast approximate subgraph counting and enumeration
AU - Slota, George M.
AU - Madduri, Kamesh
PY - 2013
Y1 - 2013
N2 - We present a new shared-memory parallel algorithm and implementation called FASCIA for the problems of approximate subgraph counting and subgraph enumeration. The problem of subgraph counting refers to determining the frequency of occurrence of a given subgraph (or template) within a large network. This is a key graph analytic with applications in various domains. In bioinformatics, subgraph counting is used to detect and characterize local structure (motifs) in protein interaction networks. Exhaustive enumeration and exact counting is extremely compute-intensive, with running time growing exponentially with the number of vertices in the template. In this work, we apply the color coding technique to determine approximate counts of non-induced occurrences of the subgraph in the original network. Color coding gives a fixed-parameter algorithm for this problem, using a dynamic programming-based counting approach. Our new contributions are a multilevel shared-memory parallelization of the counting scheme and several optimizations to reduce the memory footprint. We show that approximate counts can be obtained for templates with up to 12 vertices, on networks with up to millions of vertices and edges. Prior work on this problem has only considered out-of-core parallelization on distributed platforms. With our new counting scheme, data layout optimizations, and multicore parallelism, we demonstrate a significant speedup over the current state-of-the-art for subgraph counting.
AB - We present a new shared-memory parallel algorithm and implementation called FASCIA for the problems of approximate subgraph counting and subgraph enumeration. The problem of subgraph counting refers to determining the frequency of occurrence of a given subgraph (or template) within a large network. This is a key graph analytic with applications in various domains. In bioinformatics, subgraph counting is used to detect and characterize local structure (motifs) in protein interaction networks. Exhaustive enumeration and exact counting is extremely compute-intensive, with running time growing exponentially with the number of vertices in the template. In this work, we apply the color coding technique to determine approximate counts of non-induced occurrences of the subgraph in the original network. Color coding gives a fixed-parameter algorithm for this problem, using a dynamic programming-based counting approach. Our new contributions are a multilevel shared-memory parallelization of the counting scheme and several optimizations to reduce the memory footprint. We show that approximate counts can be obtained for templates with up to 12 vertices, on networks with up to millions of vertices and edges. Prior work on this problem has only considered out-of-core parallelization on distributed platforms. With our new counting scheme, data layout optimizations, and multicore parallelism, we demonstrate a significant speedup over the current state-of-the-art for subgraph counting.
UR - http://www.scopus.com/inward/record.url?scp=84893250713&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893250713&partnerID=8YFLogxK
U2 - 10.1109/ICPP.2013.30
DO - 10.1109/ICPP.2013.30
M3 - Conference contribution
AN - SCOPUS:84893250713
SN - 9780769551173
T3 - Proceedings of the International Conference on Parallel Processing
SP - 210
EP - 219
BT - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 42nd Annual International Conference on Parallel Processing, ICPP 2013
Y2 - 1 October 2013 through 4 October 2013
ER -