TY - GEN
T1 - Active Learning for Graphs with Noisy Structures
AU - Chi, Hongliang
AU - Qi, Cong
AU - Wang, Suhang
AU - Ma, Yao
N1 - Publisher Copyright: Copyright © 2024 by SIAM.
PY - 2024
Y1 - 2024
N2 - Graph Neural Networks (GNNs) have seen significant success in tasks such as node classification, but that success is largely contingent on the availability of sufficient labeled nodes. The high cost of labeling large-scale graphs has led to a focus on active learning on graphs, which aims to select data effectively so as to maximize downstream model performance. Notably, most existing methods assume a reliable graph topology, whereas real-world graphs are often noisy. Designing a successful active learning framework for noisy graphs is therefore much needed but challenging, because selecting data for labeling and obtaining a clean graph are naturally interdependent tasks: selecting high-quality data requires a clean graph structure, while cleaning a noisy graph structure requires sufficient labeled data. To address this interdependence, we propose an active learning framework, GALClean, which iteratively performs data selection and graph purification together, each using the best information learned in the prior iteration. Importantly, we cast GALClean as an instance of the Expectation-Maximization algorithm, which provides a theoretical understanding of its design and mechanisms. This theory naturally leads to an enhanced version, GALClean+. Extensive experiments demonstrate the effectiveness and robustness of the proposed method across various types and levels of graph noise.
UR - http://www.scopus.com/inward/record.url?scp=85193475001&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85193475001&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85193475001
T3 - Proceedings of the 2024 SIAM International Conference on Data Mining, SDM 2024
SP - 262
EP - 270
BT - Proceedings of the 2024 SIAM International Conference on Data Mining, SDM 2024
A2 - Shekhar, Shashi
A2 - Papalexakis, Vagelis
A2 - Gao, Jing
A2 - Jiang, Zhe
A2 - Riondato, Matteo
PB - Society for Industrial and Applied Mathematics Publications
T2 - 2024 SIAM International Conference on Data Mining, SDM 2024
Y2 - 18 April 2024 through 20 April 2024
ER -