TY - GEN
T1 - Learning on the border
T2 - 16th ACM Conference on Information and Knowledge Management, CIKM 2007
AU - Ertekin, Seyda
AU - Huang, Jian
AU - Bottou, Léon
AU - Giles, C. Lee
PY - 2007
Y1 - 2007
N2 - This paper is concerned with the class imbalance problem, which is known to hinder the learning performance of classification algorithms. The problem occurs when there are significantly fewer observations of the target concept. Various real-world classification tasks, such as medical diagnosis, text categorization and fraud detection, suffer from this phenomenon. Standard machine learning algorithms yield better prediction performance with balanced datasets. In this paper, we demonstrate that active learning is capable of solving the class imbalance problem by providing the learner with more balanced classes. We also propose an efficient way of selecting informative instances from a smaller pool of samples for active learning, which does not necessitate a search through the entire dataset. The proposed method yields an efficient querying system and allows active learning to be applied to very large datasets. Our experimental results show that with an early stopping criterion, active learning achieves a fast solution with competitive prediction performance in imbalanced data classification.
AB - This paper is concerned with the class imbalance problem, which is known to hinder the learning performance of classification algorithms. The problem occurs when there are significantly fewer observations of the target concept. Various real-world classification tasks, such as medical diagnosis, text categorization and fraud detection, suffer from this phenomenon. Standard machine learning algorithms yield better prediction performance with balanced datasets. In this paper, we demonstrate that active learning is capable of solving the class imbalance problem by providing the learner with more balanced classes. We also propose an efficient way of selecting informative instances from a smaller pool of samples for active learning, which does not necessitate a search through the entire dataset. The proposed method yields an efficient querying system and allows active learning to be applied to very large datasets. Our experimental results show that with an early stopping criterion, active learning achieves a fast solution with competitive prediction performance in imbalanced data classification.
UR - http://www.scopus.com/inward/record.url?scp=63449090301&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=63449090301&partnerID=8YFLogxK
U2 - 10.1145/1321440.1321461
DO - 10.1145/1321440.1321461
M3 - Conference contribution
AN - SCOPUS:63449090301
SN - 9781595938039
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 127
EP - 136
BT - CIKM 2007 - Proceedings of the 16th ACM Conference on Information and Knowledge Management
Y2 - 6 November 2007 through 9 November 2007
ER -