An analysis of form and function of a research article between and within publishers and journals

Sarah Nathan, Leah Haynes, Jessica Meyer, Josh Sumner, Cynthia Hudson-Vitale, Leslie D. McIntosh

Research output: Contribution to journal › Article › peer-review


Identifying and analyzing research articles for machine learning and natural language processing is complicated by the lack of consistent organizing principles and heading naming conventions across publishers and journals. Understanding how research articles follow a common organizational function, and how they use various heading terms, or forms, is therefore a critical step in applying machine learning techniques to data and information mining across a corpus of articles. To address this need, we developed and implemented an article heading form and function analysis across 12 publishers, covering both research articles and nonresearch articles. Our aim was to (a) identify each of the labeled sections used by research articles, define these sections based on their rhetorical function, and determine their frequency of use; (b) determine, within the given data set, all of the alternative labels used to identify these sections; and (c) determine whether these sections can be used to consistently establish whether an article is, or is not, a true research article. The results indicated wide variability in the organization of research articles, with 24 common sections known by 186 different names both within and across publishing houses.
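The distinction between a heading's form (its label) and its function (the section it represents) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `SECTION_FORMS` table, the `normalize_heading` and `looks_like_research` helpers, and the rule that a research article needs both a methods and a results section are hypothetical, and they are not the 24 sections, the 186 labels, or the classification procedure reported in the article.

```python
import re

# Hypothetical table: canonical section (function) -> heading variants (forms).
# Illustrative only; not the mapping derived in the article.
SECTION_FORMS = {
    "introduction": {"introduction", "background"},
    "methods": {"methods", "materials and methods", "methodology"},
    "results": {"results", "findings"},
    "discussion": {"discussion", "conclusions"},
}

# Invert the table so each form points to its canonical section.
FORM_TO_SECTION = {
    form: section
    for section, forms in SECTION_FORMS.items()
    for form in forms
}


def normalize_heading(heading):
    """Return the canonical section for a raw heading, or None if unknown."""
    text = heading.strip().lower()
    # Drop leading numbering such as "2.", "3.1" or "ii."
    text = re.sub(r"^\s*(?:\d+(?:\.\d+)*[.)]?|[ivxlc]+[.)])\s+", "", text)
    # Replace anything that is not a letter or space, then collapse spaces.
    text = re.sub(r"[^a-z ]+", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return FORM_TO_SECTION.get(text)


def looks_like_research(headings, required=("methods", "results")):
    """Assumed heuristic: all required canonical sections must be present."""
    found = {normalize_heading(h) for h in headings}
    return all(section in found for section in required)


article = ["1. Background", "2. Materials and Methods", "3. Findings", "4. Conclusions"]
editorial = ["Introduction", "Acknowledgements"]
print([normalize_heading(h) for h in article])
print(looks_like_research(article), looks_like_research(editorial))
```

A lookup table of this kind only covers labels seen in advance, which is why an inventory of alternative labels across publishers matters before any downstream text mining.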

Original language: English (US)
Pages (from-to): 643-661
Number of pages: 19
Journal: Quantitative Science Studies
Issue number: 2
State: Published - Jul 15 2021

All Science Journal Classification (ASJC) codes

  • Cultural Studies
  • Analysis
  • Library and Information Sciences
  • Numerical Analysis

