Automatic analysis of thematic structure in written English

Kwanghyun Park, Xiaofei Lu

Research output: Contribution to journal › Article › peer-review

3 Scopus citations


This paper proposes and describes a computational system for the automatic analysis of thematic structure, as defined in Systemic Functional Linguistics, in written English. The system takes an English text as input and produces as output an analysis of the thematic structure of each sentence in the text. The system is evaluated using data from The Wall Street Journal section of the Penn Treebank (Marcus et al. 1993) and the British Academic Written English corpus (Gardner & Nesi 2013). An experiment using these data shows that the system achieves a high degree of reliability in regard to both identifying theme-rheme boundaries and determining several of the linguistic properties of the identified themes, including syntactic nodes, theme function, markedness, mood types, and theme roles. To illustrate how the system is used, we describe an example application designed to compare collections of novice and expert academic writing in terms of thematic structure.
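The abstract does not spell out the system's rules. As a rough illustration of the kind of analysis involved, the sketch below identifies a theme-rheme boundary under strong simplifying assumptions. It takes a single declarative clause already parsed in Penn Treebank bracket notation. It treats coordinating conjunctions and a small, hypothetical list of conjunctive adverbs as textual Themes, and it takes the first remaining constituent as the topical Theme, which in SFL closes the Theme. It labels the Theme unmarked when that constituent carries the Treebank `SBJ` function tag (the Subject) and marked otherwise. The names `parse_ptb`, `analyze_theme`, and `CONJUNCTIVE` are illustrative assumptions, not the authors' implementation. The sketch ignores interrogatives, imperatives, interpersonal Themes, and clause complexes.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, illustrative list of conjunctive adverbs treated as textual Themes.
CONJUNCTIVE = {"however", "therefore", "moreover", "thus", "furthermore"}
PUNCT = {",", ":", ".", "``", "''"}


@dataclass
class Tree:
    label: str
    children: list = field(default_factory=list)  # Tree nodes or word strings


def parse_ptb(text: str) -> Tree:
    """Parse one Penn Treebank-style bracketed tree into Tree nodes."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node() -> Tree:
        nonlocal pos
        pos += 1  # consume "("
        if tokens[pos] == "(":  # unlabeled wrapper, as in "( (S ...))"
            label = ""
        else:
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return Tree(label, children)

    root = node()
    # Unwrap the empty outer bracket that Treebank files often use.
    return root.children[0] if root.label == "" and len(root.children) == 1 else root


def base_label(label: str) -> str:
    """Strip function tags and indices: 'NP-SBJ-1' -> 'NP'."""
    if label.startswith("-"):  # e.g. -NONE-, -LRB-
        return label
    return re.split(r"[-=]", label)[0]


def words(t) -> list:
    """Collect the words under a node, skipping empty elements (traces)."""
    if isinstance(t, str):
        return [t]
    if t.label == "-NONE-":
        return []
    return [w for c in t.children for w in words(c)]


def analyze_theme(clause: Tree) -> dict:
    """Split a declarative clause into Theme and Rheme (simplified SFL rules)."""
    kids = clause.children
    textual = []
    for i, child in enumerate(kids):
        if isinstance(child, str):
            continue
        lab = base_label(child.label)
        if lab in PUNCT:
            continue
        text = " ".join(words(child))
        if lab == "CC" or (lab == "ADVP" and text.lower() in CONJUNCTIVE):
            textual.append(text)  # textual Theme; keep looking for the topical one
            continue
        # First experiential constituent: the topical Theme ends the Theme.
        theme = [w for c in kids[: i + 1] for w in words(c)]
        rheme = [w for c in kids[i + 1 :] for w in words(c)]
        return {
            "theme": " ".join(theme),
            "rheme": " ".join(rheme),
            "textual": textual,
            "topical": text,
            "marked": "SBJ" not in child.label.split("-")[1:],
        }
    # No topical Theme found: treat the whole clause as Theme.
    return {"theme": " ".join(words(clause)), "rheme": "", "textual": textual,
            "topical": None, "marked": False}


if __name__ == "__main__":
    tree = parse_ptb("(S (CC But) (, ,) (PP-TMP (IN in) (NP (CD 1990))) "
                     "(, ,) (NP-SBJ (NNS profits)) (VP (VBD fell)) (. .))")
    print(analyze_theme(tree))
```

For the sample clause, "But" is returned as a textual Theme and the temporal prepositional phrase "in 1990" as a marked topical Theme, so the boundary falls before ", profits fell .". A clause that opens with its Subject, such as "The system works.", would instead receive an unmarked Theme ending after "The system".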

Original language: English (US)
Pages (from-to): 81-101
Number of pages: 21
Journal: International Journal of Corpus Linguistics
Issue number: 1
State: Published - Jan 1 2015

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language


