TY - GEN
T1 - Extending Variability-Aware Model Selection with Bias Detection in Machine Learning Projects
AU - Tavares, Cristina
AU - Nascimento, Nathalia
AU - Alencar, Paulo
AU - Cowan, Donald
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Data science projects often involve various machine learning (ML) methods that depend on data, code, and models. One of the key activities in these projects is the selection of a model or algorithm that is appropriate for the data analysis at hand. ML model selection depends on several factors, which include data-related attributes such as sample size, functional requirements such as the prediction algorithm type, and nonfunctional requirements such as performance and bias. However, the factors that influence such selection are often not well understood or explicitly represented. This paper describes ongoing work on extending an adaptive variability-aware model selection method with bias detection in ML projects. The method involves: (i) modeling the variability of the factors that affect model selection using feature models based on heuristics proposed in the literature; (ii) instantiating our variability model with added features related to bias (e.g., bias-related metrics); and (iii) conducting experiments that illustrate the method in a case study based on a heart failure prediction project. The proposed approach aims to advance the state of the art by making explicit the factors that influence model selection, particularly those related to bias, as well as their interactions. The provided representations can transform model selection in ML projects into a non-ad-hoc, adaptive, and explainable process.
AB - Data science projects often involve various machine learning (ML) methods that depend on data, code, and models. One of the key activities in these projects is the selection of a model or algorithm that is appropriate for the data analysis at hand. ML model selection depends on several factors, which include data-related attributes such as sample size, functional requirements such as the prediction algorithm type, and nonfunctional requirements such as performance and bias. However, the factors that influence such selection are often not well understood or explicitly represented. This paper describes ongoing work on extending an adaptive variability-aware model selection method with bias detection in ML projects. The method involves: (i) modeling the variability of the factors that affect model selection using feature models based on heuristics proposed in the literature; (ii) instantiating our variability model with added features related to bias (e.g., bias-related metrics); and (iii) conducting experiments that illustrate the method in a case study based on a heart failure prediction project. The proposed approach aims to advance the state of the art by making explicit the factors that influence model selection, particularly those related to bias, as well as their interactions. The provided representations can transform model selection in ML projects into a non-ad-hoc, adaptive, and explainable process.
UR - http://www.scopus.com/inward/record.url?scp=85184976261&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85184976261&partnerID=8YFLogxK
U2 - 10.1109/BigData59044.2023.10386105
DO - 10.1109/BigData59044.2023.10386105
M3 - Conference contribution
AN - SCOPUS:85184976261
T3 - Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023
SP - 2441
EP - 2449
BT - Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023
A2 - He, Jingrui
A2 - Palpanas, Themis
A2 - Hu, Xiaohua
A2 - Cuzzocrea, Alfredo
A2 - Dou, Dejing
A2 - Slezak, Dominik
A2 - Wang, Wei
A2 - Gruca, Aleksandra
A2 - Lin, Jerry Chun-Wei
A2 - Agrawal, Rakesh
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE International Conference on Big Data, BigData 2023
Y2 - 15 December 2023 through 18 December 2023
ER -