Natural Language Processing for Theoretical Framework Selection in Engineering Education Research

Catherine G.P. Berdanier, Christopher M. McComb, Weiwei Zhu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations


This research paper presents recent work exploring the power of natural language processing (NLP) methods applied to qualitative engineering education data. As NLP and other machine learning methods are developed for qualitative data, it is important to prioritize the role that theory plays in rigorous qualitative research, where the selection of a theoretical framework serves as the lens by which the research project is framed, results are analyzed, and findings are brought to light. Indeed, viewing the same data through a different theoretical lens can highlight novel findings. In this work, we explore the viability of NLP methods for helping researchers select appropriate frameworks. We present our method to train a Python-based NLP algorithm to analyze an existing set of interview data using one theoretical lens: Community of Practice theory, an oft-used theory in the literature on graduate education, the topic of the interview corpus. We present and test two methods for developing dictionaries by which to train the algorithm: an expert-curated dictionary and a machine-generated dictionary compiled by mining the theoretical framework sections of published literature employing Community of Practice theory. We apply these two dictionaries to analyze a corpus of 54 interview transcripts investigating graduate engineering attrition. The high-dimensional data from NLP can be compared using Principal Component Analysis (PCA) visualization and pairwise distance plots to determine which method results in the most well-defined structure, indicating agreement between the dictionary and the corpus of interview transcripts. In the discussion, we highlight opportunities for using these automated methods to help researchers with qualitative data analysis and warn against potential dangers and ethical ramifications of using machine learning and NLP for social science data.
This work will have an impact on the disciplinary communities working to embed computational language-based methods into engineering education research, and on the qualitative methods communities across social science and education disciplines.
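
The abstract does not say how the dictionaries are applied to the transcripts, so the following is only a minimal sketch of one plausible reading: each transcript is scored by the relative frequency of each dictionary term, and the resulting vectors are compared by pairwise cosine distance. The dictionary terms, the sample transcripts, and the function names are all hypothetical and are not taken from the paper. The PCA visualization step is not shown.

```python
import math
import re
from collections import Counter


def dictionary_vector(text, dictionary):
    """Relative frequency of each dictionary term in one transcript.

    Assumption: terms are single lowercase words; multi-word terms
    would need phrase matching, which this sketch does not do.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[term] / total for term in dictionary]


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_distances(vectors):
    """Full symmetric matrix of cosine distances between transcripts."""
    n = len(vectors)
    return [
        [cosine_distance(vectors[i], vectors[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical dictionary and transcripts, for illustration only.
dictionary = ["community", "practice", "mentor", "belonging"]
transcripts = [
    "My mentor helped me feel part of the lab community.",
    "The community in my lab and my mentor shaped my practice.",
    "I mostly worried about funding and deadlines.",
]

vectors = [dictionary_vector(t, dictionary) for t in transcripts]
distances = pairwise_distances(vectors)
```

Under this reading, transcripts that use the framework's vocabulary in similar proportions end up close together, while a transcript with no dictionary terms sits at the maximum distance from the others. Whether such structure reflects real theoretical fit is the question the paper's discussion raises.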

Original language: English (US)
Title of host publication: 2020 IEEE Frontiers in Education Conference, FIE 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728189611
State: Published - Oct 21 2020
Event: 2020 IEEE Frontiers in Education Conference, FIE 2020 - Uppsala, Sweden
Duration: Oct 21 2020 - Oct 24 2020

Publication series

Name: Proceedings - Frontiers in Education Conference, FIE
ISSN (Print): 1539-4565


Conference: 2020 IEEE Frontiers in Education Conference, FIE 2020

All Science Journal Classification (ASJC) codes

  • Software
  • Education
  • Computer Science Applications

