TY - GEN
T1 - Classification of Breast Cancer Risk Factors Using Several Resampling Approaches
AU - Kabir, Md Faisal
AU - Ludwig, Simone
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
N2 - Breast cancer is the most common cancer in women worldwide and the second most common cancer overall. Predicting the risk of breast cancer occurrence is an important challenge for clinical oncologists because it directly influences daily practice and clinical service. Classification is a supervised learning approach that is applied in medical domains. Achieving better performance on real data with imbalanced classes is very challenging. Machine learning researchers have used various techniques to obtain higher accuracy, generally by correctly identifying majority class samples while ignoring instances of the minority class. In most cases, however, the minority class is of greater interest than the majority class. In this research, we applied three different classification techniques to a real-world breast cancer risk factors data set. First, we applied the classification techniques to the breast cancer data without any resampling. Second, because the data is imbalanced, meaning that the classes are unequally distributed, we applied several resampling methods before applying the classifiers in order to improve performance. The experimental results show a significant improvement from resampling compared with no resampling, particularly for the minority class.
UR - http://www.scopus.com/inward/record.url?scp=85062233293&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062233293&partnerID=8YFLogxK
U2 - 10.1109/ICMLA.2018.00202
DO - 10.1109/ICMLA.2018.00202
M3 - Conference contribution
AN - SCOPUS:85062233293
T3 - Proceedings - 17th IEEE International Conference on Machine Learning and Applications, ICMLA 2018
SP - 1243
EP - 1248
BT - Proceedings - 17th IEEE International Conference on Machine Learning and Applications, ICMLA 2018
A2 - Wani, M. Arif
A2 - Kantardzic, Mehmed
A2 - Sayed-Mouchaweh, Moamar
A2 - Gama, Joao
A2 - Lughofer, Edwin
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 17th IEEE International Conference on Machine Learning and Applications, ICMLA 2018
Y2 - 17 December 2018 through 20 December 2018
ER -