TY - JOUR
T1 - Analysis of Patterns and Trends in COVID-19 Research
AU - Dornick, Christopher
AU - Kumar, Amit
AU - Seidenberger, Scott
AU - Seidle, Elizabeth
AU - Mukherjee, Partha
N1 - Publisher Copyright:
© 2021 Elsevier B.V. All rights reserved.
PY - 2021
Y1 - 2021
AB - News and information surrounding the COVID-19 pandemic are ever-evolving and accumulating. Because of the pandemic's global relevance and importance, it is critical to be able to parse the available information in an efficient and reliable manner to gauge scientific progress and understanding of COVID-19. In this research, abstracts from a corpus of scientific articles are evaluated using different Natural Language Processing (NLP) techniques, including Term Frequency-Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA), Bidirectional Encoder Representations from Transformers (BERT), and sentiment analysis, to better understand the breadth of extant literature. Results from the analyses show that in the very large corpus datasets, a large group of documents encompasses the overall or dominant general theme, whereas the smaller clusters of documents reveal very precise and niche themes. Generalized COVID-19 is the dominant theme in the largest clusters. Smaller clusters include more specific terms (e.g., popular drugs, popular terms, key features/impacts related to COVID). On the resulting clusters, sentiment analysis was run and showed slight fluctuations over time depending on the cluster, with an overall relatively neutral sentiment. Overall, the precision of the BERT clusters distinguishes niche topics within the large corpus of literature and enables interesting and meaningful text analytics.
UR - http://www.scopus.com/inward/record.url?scp=85112700545&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85112700545&partnerID=8YFLogxK
U2 - 10.1016/j.procs.2021.05.032
DO - 10.1016/j.procs.2021.05.032
M3 - Conference article
AN - SCOPUS:85112700545
SN - 1877-0509
VL - 185
SP - 302
EP - 310
JO - Procedia Computer Science
JF - Procedia Computer Science
T2 - 2021 Complex Adaptive Systems Conference
Y2 - 16 June 2021 through 18 June 2021
ER -