Understanding and predicting retractions of published work

Research output: Contribution to journal › Conference article › peer-review


Recent increases in the number of retractions of published papers reflect heightened attention to, and increased scrutiny of, the scientific process, motivated in part by the replication crisis. These trends motivate computational tools for understanding and assessing the scholarly record. Here, we sketch the landscape of retracted papers in the Retraction Watch database, a collection of 19k records of published scholarly articles that have been retracted for various reasons (e.g., plagiarism, data error). Using metadata as well as features derived from the full text of a subset of retracted papers in the social and behavioral sciences, we develop a random forest classifier that predicts retraction in new samples with 73% accuracy and an F1-score of 71%. We believe this study to be the first of its kind to demonstrate the utility of machine learning as a tool for the assessment of retracted work.
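For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how accuracy and F1-score are computed for a binary retracted / not-retracted prediction task. The function name binary_metrics and the toy labels are illustrative assumptions; they are not taken from the paper or from the Retraction Watch data, and the numbers they produce are unrelated to the reported 73% and 71%.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) for binary labels.

    Hypothetical helper for illustration; 1 = retracted, 0 = not retracted.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1


# Toy labels, invented purely for illustration (not real data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

accuracy, precision, recall, f1 = binary_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f} f1={f1:.2f}")
```

Accuracy counts all correct predictions, while F1 balances precision and recall on the retracted class, which is why the two figures can differ when the classes are imbalanced or the errors are asymmetric.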

Original language: English (US)
Journal: CEUR Workshop Proceedings
State: Published - 2021
Event: 2021 Workshop on Scientific Document Understanding, SDU 2021 - Virtual, Online
Duration: Feb 9 2021 → …

All Science Journal Classification (ASJC) codes

  • General Computer Science