TY - GEN
T1 - Adaptive Method for Machine Learning Model Selection in Data Science Projects
AU - Tavares, Cristina
AU - Nascimento, Nathalia
AU - Alencar, Paulo
AU - Cowan, Donald
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Data science projects involve a machine learning (ML) process based on data, code, and models that change over time. For example, the datasets may increase in size and allow an ML model that requires larger datasets to be applied. However, the dynamic factors that influence model selection are not well understood and explicitly represented. This paper presents ongoing work on an adaptive method for ML model selection in big data science projects. The proposed method involves (i) identifying the factors that affect model selection based on heuristics proposed in the literature; and (ii) modeling the variability of these factors using a feature diagram and constraints that trigger adaptive reconfiguration, that is, changes in model selection due to changes in the variability factors. The applicability of the method is demonstrated through an illustrative use case. The proposed method can lead to an improved understanding of dynamic factors that influence model selection, how these factors explicitly affect the selection, and how the adaptive factors can be represented and automated. This improved understanding can result in a project model selection process that is less implicit and more efficient, more adaptive and explainable, and ultimately constitute a foundation for the creation of novel dynamic software product lines to support this process.
UR - http://www.scopus.com/inward/record.url?scp=85147950225&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147950225&partnerID=8YFLogxK
U2 - 10.1109/BigData55660.2022.10020386
DO - 10.1109/BigData55660.2022.10020386
M3 - Conference contribution
AN - SCOPUS:85147950225
T3 - Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022
SP - 2682
EP - 2688
BT - Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022
A2 - Tsumoto, Shusaku
A2 - Ohsawa, Yukio
A2 - Chen, Lei
A2 - Van den Poel, Dirk
A2 - Hu, Xiaohua
A2 - Motomura, Yoichi
A2 - Takagi, Takuya
A2 - Wu, Lingfei
A2 - Xie, Ying
A2 - Abe, Akihiro
A2 - Raghavan, Vijay
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE International Conference on Big Data, Big Data 2022
Y2 - 17 December 2022 through 20 December 2022
ER -