TY - GEN
T1 - A Data-Driven Digital Twin for Student Engagement Prediction in e-Learning Systems
AU - Kumi, Sandra
AU - Lomotey, Richard K.
AU - Ray, Madhurima
AU - Cunningham, Emma
AU - Milovich, Stephanie
AU - Deters, Ralph
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Machine Learning (ML) models are increasingly applied to Learning Management System (LMS) data to predict student engagement and performance. LMS data often contain missing values that can be informative. However, existing modeling approaches in education remove or impute missing values, which can lead to inaccurate or biased models. In this paper, we propose the use of digital twins to model students' engagement based on their learning activities in the LMS while preserving the missingness patterns. We leveraged synthetic data generators such as Conditional Tabular Generative Adversarial Network (CTGAN), Tabular Variational Autoencoder (TVAE), and RealTabFormer with reversible data transformations to create a virtual replica of students' data. CTGAN and TVAE generated balanced synthetic data that accurately captured the meaningful patterns of the real data. Moreover, XGBoost trained on a balanced virtual replica of the students' learning activities data achieved an F1-score above 80% in predicting the students' engagement levels when evaluated on real data with both complete and incomplete entries. Our findings demonstrate how digital twins can be used to address the complexities of data in the education sector, improve the generalization of models, and reduce bias in real-world performance.
UR - https://www.scopus.com/pages/publications/105015429681
UR - https://www.scopus.com/pages/publications/105015429681#tab=citedBy
U2 - 10.1109/AIIoT65859.2025.11105257
DO - 10.1109/AIIoT65859.2025.11105257
M3 - Conference contribution
AN - SCOPUS:105015429681
T3 - 2025 IEEE 6th Annual World AI IoT Congress, AIIoT 2025
SP - 560
EP - 566
BT - 2025 IEEE 6th Annual World AI IoT Congress, AIIoT 2025
A2 - Paul, Rajashree
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 6th IEEE Annual World AI IoT Congress, AIIoT 2025
Y2 - 28 May 2025 through 30 May 2025
ER -