Analysis of Patterns and Trends in COVID-19 Research

Christopher Dornick, Amit Kumar, Scott Seidenberger, Elizabeth Seidle, Partha Mukherjee

Research output: Contribution to journal › Conference article › peer-review

5 Scopus citations

Abstract

News and information surrounding the COVID-19 pandemic are ever-evolving and accumulating. Given the pandemic's global relevance and importance, it is critical to be able to parse the available information in an efficient and reliable manner to gauge scientific progress and understanding surrounding COVID-19. In this research, abstracts from a corpus of scientific articles are evaluated using different Natural Language Processing (NLP) techniques, including Term Frequency-Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA), Bidirectional Encoder Representations from Transformers (BERT), and sentiment analysis, to better understand the breadth of extant literature. Results from the analyses show that in very large corpora, a large group of documents encompasses the overall, dominant theme, whereas the smaller clusters of documents reveal very precise, niche themes. Generalized COVID-19 is the dominant theme in the largest clusters. Smaller clusters include more specific terms (e.g., popular drugs, popular terms, key features/impacts related to COVID). Sentiment analysis was then run on the resulting clusters, revealing slight fluctuations over time depending on the cluster, with a relatively neutral sentiment overall. Overall, the precision of the BERT clusters distinguishes niche topics within the large corpus of literature and enables interesting and meaningful text analytics.
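As a rough illustration of the first technique named in the abstract, the sketch below computes TF-IDF weights in plain Python. It is not the authors' implementation: the `tfidf` function, the whitespace tokenizer, the unsmoothed `log(N / df)` weighting, and the three toy abstracts are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Assumption: lowercase whitespace tokenization, no stop-word removal.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency times unsmoothed inverse document frequency.
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy abstracts invented for illustration; not data from the paper.
abstracts = [
    "covid vaccine trial results",
    "covid transmission in schools",
    "covid remdesivir treatment trial",
]
scores = tfidf(abstracts)
```

In this toy corpus "covid" appears in every document, so its weight is zero, while terms confined to a single document receive the largest weights. That loosely mirrors the abstract's observation that a general COVID-19 theme dominates the corpus while smaller groupings carry the specific terms.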

Original language: English (US)
Pages (from-to): 302-310
Number of pages: 9
Journal: Procedia Computer Science
Volume: 185
State: Published - 2021
Event: 2021 Complex Adaptive Systems Conference - Malvern, United States
Duration: Jun 16 2021 to Jun 18 2021

All Science Journal Classification (ASJC) codes

  • General Computer Science
