TY - JOUR
T1 - Developing and testing an automated qualitative assistant (AQUA) to support qualitative analysis
AU - Lennon, Robert P.
AU - Fraleigh, Robbie
AU - van Scoy, Lauren J.
AU - Keshaviah, Aparna
AU - Hu, Xindi C.
AU - Snyder, Bethany L.
AU - Miller, Erin L.
AU - Calo, William A.
AU - Zgierska, Aleksandra E.
AU - Griffin, Christopher
N1 - Funding Information:
Funding: The dataset used in this work was developed with the support of the Huck Institutes of the Life Sciences (grant number 7601); the Social Science Research Institute at Penn State University (grant number 7601); and the Department of Family and Community Medicine at Penn State College of Medicine (grant number 7601-M). CG’s and RF’s work was supported by the Huck Institutes of the Life Sciences. Portions of CG’s work were supported by the Defense Advanced Research Projects Agency SCORE programme (Cooperative Agreement W911NF-19-0272). Competing interests: None declared. Patient consent for publication: Not applicable.
Publisher Copyright:
© 2021 BMJ Publishing Group. All rights reserved.
PY - 2021/11/25
Y1 - 2021/11/25
AB - Qualitative research remains underused, in part due to the time and cost of annotating qualitative data (coding). Artificial intelligence (AI) has been suggested as a means to reduce those burdens, and has been used in exploratory studies to reduce the burden of coding. However, methods to date use AI analytical techniques that lack transparency, potentially limiting acceptance of results. We developed an automated qualitative assistant (AQUA) using a semiclassical approach, replacing Latent Semantic Indexing/Latent Dirichlet Allocation with a more transparent graph-theoretic topic extraction and clustering method. Applied to a large dataset of free-text survey responses, AQUA generated unsupervised topic categories and circle hierarchical representations of free-text responses, enabling rapid interpretation of data. When tasked with coding a subset of free-text data into user-defined qualitative categories, AQUA demonstrated intercoder reliability in several multicategory combinations with a Cohen’s kappa comparable to human coders (0.62–0.72), enabling researchers to automate coding on those categories for the entire dataset. The aim of this manuscript is to describe pertinent components of best practices of AI/machine learning (ML)-assisted qualitative methods, illustrating how primary care researchers may use AQUA to rapidly and accurately code large text datasets. The contribution of this article is providing guidance that should increase AI/ML transparency and reproducibility.
UR - http://www.scopus.com/inward/record.url?scp=85120654779&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85120654779&partnerID=8YFLogxK
U2 - 10.1136/fmch-2021-001287
DO - 10.1136/fmch-2021-001287
M3 - Article
C2 - 34824135
AN - SCOPUS:85120654779
SN - 2305-6983
VL - 9
JO - Family Medicine and Community Health
JF - Family Medicine and Community Health
M1 - e001287
ER -