TY - GEN
T1 - Distribution Consistency based Self-Training for Graph Neural Networks with Sparse Labels
AU - Wang, Fali
AU - Zhao, Tianxiang
AU - Wang, Suhang
N1 - Publisher Copyright:
© 2024 ACM.
PY - 2024/3/4
Y1 - 2024/3/4
N2 - Few-shot node classification poses a significant challenge for Graph Neural Networks (GNNs) due to insufficient supervision and potential distribution shifts between labeled and unlabeled nodes. Self-training has emerged as a widely popular framework to leverage the abundance of unlabeled data, expanding the training set by assigning pseudo-labels to selected unlabeled nodes. Efforts have been made to develop various selection strategies based on confidence, information gain, etc. However, none of these methods takes into account the distribution shift between the training and testing node sets. The pseudo-labeling step may amplify this shift and even introduce new ones, hindering the effectiveness of self-training. Therefore, in this work, we explore the potential of explicitly bridging the distribution shift between the expanded training set and the test set during self-training. To this end, we propose a novel Distribution-Consistent Graph Self-Training (DC-GST) framework to identify pseudo-labeled nodes that are both informative and capable of redeeming the distribution discrepancy, and we formulate this as a differentiable optimization task. A distribution-shift-aware edge predictor is further adopted to augment the graph and increase the model's generalizability in assigning pseudo-labels. We evaluate our proposed method on four publicly available benchmark datasets, and extensive experiments demonstrate that our framework consistently outperforms state-of-the-art baselines.
AB - Few-shot node classification poses a significant challenge for Graph Neural Networks (GNNs) due to insufficient supervision and potential distribution shifts between labeled and unlabeled nodes. Self-training has emerged as a widely popular framework to leverage the abundance of unlabeled data, expanding the training set by assigning pseudo-labels to selected unlabeled nodes. Efforts have been made to develop various selection strategies based on confidence, information gain, etc. However, none of these methods takes into account the distribution shift between the training and testing node sets. The pseudo-labeling step may amplify this shift and even introduce new ones, hindering the effectiveness of self-training. Therefore, in this work, we explore the potential of explicitly bridging the distribution shift between the expanded training set and the test set during self-training. To this end, we propose a novel Distribution-Consistent Graph Self-Training (DC-GST) framework to identify pseudo-labeled nodes that are both informative and capable of redeeming the distribution discrepancy, and we formulate this as a differentiable optimization task. A distribution-shift-aware edge predictor is further adopted to augment the graph and increase the model's generalizability in assigning pseudo-labels. We evaluate our proposed method on four publicly available benchmark datasets, and extensive experiments demonstrate that our framework consistently outperforms state-of-the-art baselines.
UR - http://www.scopus.com/inward/record.url?scp=85186372571&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85186372571&partnerID=8YFLogxK
U2 - 10.1145/3616855.3635793
DO - 10.1145/3616855.3635793
M3 - Conference contribution
AN - SCOPUS:85186372571
T3 - WSDM 2024 - Proceedings of the 17th ACM International Conference on Web Search and Data Mining
SP - 712
EP - 720
BT - WSDM 2024 - Proceedings of the 17th ACM International Conference on Web Search and Data Mining
PB - Association for Computing Machinery, Inc
T2 - 17th ACM International Conference on Web Search and Data Mining, WSDM 2024
Y2 - 4 March 2024 through 8 March 2024
ER -