On the use of similarity search to detect fake scientific papers

Kyle Williams, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

9 Scopus citations

Abstract

Fake scientific papers have recently attracted interest within the academic community following the identification of fake papers in the digital libraries of major academic publishers [8]. Detecting and removing these papers is important for many reasons. We investigate the use of similarity search for detecting fake scientific papers, comparing several methods for signature construction and similarity scoring, and describe a pseudo-relevance feedback technique that can be used to improve the effectiveness of these methods. Experiments on a dataset of 40,000 computer science papers show that precision, recall and MAP scores of 0.96, 0.99 and 0.99, respectively, can be achieved, demonstrating that similarity search is useful both for detecting fake scientific papers and for ranking them highly.
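
The abstract does not say which signatures or similarity scores were compared, so the following is only an illustrative sketch of the general pipeline it describes. It assumes word-shingle signatures, Jaccard similarity, and a simple feedback rule that folds the top-ranked results into the query signature before re-ranking. All function names and parameters (`shingles`, `rank_with_feedback`, `k`, `top_n`) are hypothetical and not taken from the paper.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (an assumed signature type)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank(query_sig, signatures):
    """Return (doc_id, score) pairs sorted by descending similarity."""
    scored = [(doc_id, jaccard(query_sig, sig))
              for doc_id, sig in signatures.items()]
    # Ties are broken by doc_id so the ordering is deterministic.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def rank_with_feedback(query_text, corpus, k=3, top_n=1):
    """Rank once, then re-rank with the top_n results merged into the query.

    This is a simplified stand-in for pseudo-relevance feedback: the
    top-ranked documents are assumed to be relevant and their shingles
    are added to the query signature.
    """
    signatures = {doc_id: shingles(text, k) for doc_id, text in corpus.items()}
    query_sig = shingles(query_text, k)
    first_pass = rank(query_sig, signatures)
    expanded = set(query_sig)
    for doc_id, score in first_pass[:top_n]:
        if score > 0:  # never expand with a document that shares nothing
            expanded |= signatures[doc_id]
    return rank(expanded, signatures)


def average_precision(ranked_ids, relevant):
    """Average precision of one ranked list; MAP is its mean over queries."""
    hits, total = 0, 0.0
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0
```

In this sketch a known fake paper would serve as the query, and documents that share many shingles with it rise to the top of the ranking. The feedback step can raise the score of a second fake paper that overlaps more with the first result than with the original query.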

Original language: English (US)
Title of host publication: Similarity Search and Applications - 8th International Conference, SISAP 2015, Proceedings
Editors: Richard Connor, Giuseppe Amato, Fabrizio Falchi, Claudio Gennaro
Publisher: Springer Verlag
Pages: 332-338
Number of pages: 7
ISBN (Print): 9783319250861
State: Published - 2015
Event: 8th International Conference on Similarity Search and Applications, SISAP 2015 - Glasgow, United Kingdom
Duration: Oct 12 2015 - Oct 14 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9371
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 8th International Conference on Similarity Search and Applications, SISAP 2015
Country/Territory: United Kingdom
City: Glasgow
Period: 10/12/15 - 10/14/15

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
