TY - JOUR
T1 - TopoImb
T2 - 1st Learning on Graphs Conference, LOG 2022
AU - Zhao, Tianxiang
AU - Luo, Dongsheng
AU - Zhang, Xiang
AU - Wang, Suhang
N1 - Funding Information:
This material is based upon work supported by, or in part by, the National Science Foundation under grant numbers IIS-1707548 and IIS-1909702, the Army Research Office under grant number W911NF21-1-0198, and the Department of Homeland Security CINA under grant number E205949D. The findings and conclusions in this paper do not necessarily reflect the views of the funding agencies.
Publisher Copyright:
© 2022 Proceedings of Machine Learning Research. All rights reserved.
PY - 2022
Y1 - 2022
N2 - Graphs serve as a powerful tool for modeling data with an underlying structure in non-Euclidean space, encoding entities as nodes and relations as edges. Despite years of progress in learning from graph-structured data, one obstacle persists: graph imbalance. Although several attempts have been made to address this problem, they consider only class-level imbalance. We argue that for graphs, imbalance is also likely to exist at the sub-class level, in the form of infrequent topological motifs. Because topology structures are flexible, graphs can be highly diverse, which makes a generalizable classification boundary difficult to learn. As a result, a few majority topology groups may dominate the learning process and leave the others under-represented. To address this problem, we propose a new framework, TopoImb, and design (1) a topology extractor, which automatically identifies the topology group of each instance using explicit memory cells, and (2) a training modulator, which modulates the learning process of the target GNN model to prevent under-representation of any topology group. TopoImb can be used as a key component in GNN models to improve their performance under data imbalance. We provide theoretical analyses of both topology-level imbalance and the proposed TopoImb, and we empirically verify its effectiveness on both node-level and graph-level classification tasks.
UR - http://www.scopus.com/inward/record.url?scp=85164535664&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85164535664&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85164535664
SN - 2640-3498
VL - 198
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
Y2 - 9 December 2022 through 12 December 2022
ER -