TY - GEN
T1 - ALLIE
T2 - 31st ACM Web Conference, WWW 2022
AU - Cui, Limeng
AU - Tang, Xianfeng
AU - Katariya, Sumeet
AU - Rao, Nikhil
AU - Agrawal, Pallav
AU - Subbian, Karthik
AU - Lee, Dongwon
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/4/25
Y1 - 2022/4/25
AB - Human labeling is time-consuming and costly. This problem is further exacerbated in scenarios with extremely imbalanced class labels, such as detecting fraudsters on websites. Active learning selects the most relevant examples for human labelers in order to improve model performance at a lower cost. However, existing active learning methods for graph data often assume that both data and label distributions are balanced. These assumptions fail in extreme rare-class classification scenarios, such as classifying abusive reviews on an e-commerce website. We propose a novel framework, ALLIE, to address the challenge of active learning on large-scale imbalanced graph data. In our approach, we efficiently sample from both majority and minority classes using a reinforcement learning agent with an imbalance-aware reward function. We employ focal loss in the node classification model to focus more on the rare class and improve the accuracy of the downstream model. Finally, we use a graph coarsening strategy to reduce the search space of the reinforcement learning agent. We conduct extensive experiments on benchmark graph datasets and real-world e-commerce datasets. ALLIE significantly outperforms state-of-the-art graph-based active learning methods, with up to a 10% improvement in F1 score for the positive class. We also validate ALLIE on a proprietary e-commerce graph dataset by tasking it with detecting abuse. Our coarsening strategy reduces computation time by up to 38% on both proprietary and public datasets.
UR - http://www.scopus.com/inward/record.url?scp=85129852545&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85129852545&partnerID=8YFLogxK
U2 - 10.1145/3485447.3512229
DO - 10.1145/3485447.3512229
M3 - Conference contribution
AN - SCOPUS:85129852545
T3 - WWW 2022 - Proceedings of the ACM Web Conference 2022
SP - 690
EP - 698
BT - WWW 2022 - Proceedings of the ACM Web Conference 2022
PB - Association for Computing Machinery, Inc
Y2 - 25 April 2022 through 29 April 2022
ER -