Can machine learning algorithms predict publication outcomes? A case study of COVID-19 preprints

Sai Koneru, Xin Wei, Jian Wu, Sarah Rajtmajer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


The COVID-19 pandemic catalyzed a large body of scientific work, much of which was completed and disseminated with groundbreaking speed. A significant portion of COVID-related work was posted to preprint servers, and COVID-related preprints were more widely cited than their counterparts. This work leverages information retrieval, natural language processing, and supervised learning to predict whether COVID-related papers posted to preprint servers are subsequently published in peer-reviewed venues within a year. The study is inspired by prior work that surveyed human experts on the same task. We compare the performance of ML and human predictions and discuss the implications of our findings for scientific publishing. The Multi-Layer Perceptron yielded the highest performance, achieving a macro F1 score of 0.674 on the held-out set. This underscores the challenge of accurately predicting the outcomes of the human peer review process. The data and code are available at
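For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a macro F1 score is computed: the F1 score is calculated separately for each class and the per-class scores are averaged with equal weight, so a minority class counts as much as the majority class. The function name `macro_f1`, the label encoding (1 = published, 0 = not published), and the toy labels are assumptions made for illustration; they are not the paper's data or code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy labels, invented for illustration: 1 = published, 0 = not published.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

score = macro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"macro F1 = {score:.3f}, accuracy = {accuracy:.3f}")
```

On imbalanced toy labels like these, macro F1 falls below plain accuracy because the weaker minority-class score pulls the average down, which is one reason it is a common choice for tasks with uneven class sizes.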

Original language: English (US)
Title of host publication: Proceedings - 23rd IEEE International Conference on Data Mining Workshops, ICDMW 2023
Editors: Jihe Wang, Yi He, Thang N. Dinh, Christan Grant, Meikang Qiu, Witold Pedrycz
Publisher: IEEE Computer Society
Number of pages: 8
ISBN (Electronic): 9798350381641
State: Published - 2023
Event: 23rd IEEE International Conference on Data Mining Workshops, ICDMW 2023 - Shanghai, China
Duration: Dec 1 2023 → Dec 4 2023

Publication series

Name: IEEE International Conference on Data Mining Workshops, ICDMW
ISSN (Print): 2375-9232
ISSN (Electronic): 2375-9259


Conference: 23rd IEEE International Conference on Data Mining Workshops, ICDMW 2023

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Software
