Discourse Segmentation by Human and Automated Means

Rebecca J. Passonneau, Diane J. Litman

Research output: Contribution to journal › Article › peer-review

133 Scopus citations


The need to model the relation between discourse structure and linguistic features of utterances is almost universally acknowledged in the literature on discourse. However, there is only weak consensus on what the units of discourse structure are, or the criteria for recognizing and generating them. We present quantitative results of a two-part study using a corpus of spontaneous, narrative monologues. The first part of our paper presents a method for empirically validating multiutterance units referred to as discourse segments. We report highly significant results of segmentations performed by naive subjects, where a commonsense notion of speaker intention is the segmentation criterion. In the second part of our study, data abstracted from the subjects' segmentations serve as a target for evaluating two sets of algorithms that use utterance features to perform segmentation. On the first algorithm set, we evaluate and compare the correlation of discourse segmentation with three types of linguistic cues (referential noun phrases, cue words, and pauses). We then develop a second set using two methods: error analysis and machine learning. Testing the new algorithms on a new data set shows that when multiple sources of linguistic knowledge are used concurrently, algorithm performance improves.
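The evaluation setup the abstract describes, scoring an algorithm's proposed boundaries against a target abstracted from the subjects' segmentations, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the agreement threshold, the use of precision and recall as the scores, and the toy data.

```python
from typing import Dict, List, Set, Tuple


def target_boundaries(subject_boundaries: List[Set[int]], threshold: int) -> Set[int]:
    """Return the boundary sites marked by at least `threshold` subjects.

    Each element of `subject_boundaries` is the set of site indices one
    subject marked as a segment boundary (hypothetical representation).
    """
    counts: Dict[int, int] = {}
    for marked in subject_boundaries:
        for site in marked:
            counts[site] = counts.get(site, 0) + 1
    return {site for site, n in counts.items() if n >= threshold}


def precision_recall(predicted: Set[int], target: Set[int]) -> Tuple[float, float]:
    """Score predicted boundaries against the target boundaries."""
    hits = len(predicted & target)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(target) if target else 0.0
    return precision, recall


# Toy data (invented): four subjects, boundary sites given as integer indices.
subjects = [{2, 5, 9}, {2, 5}, {2, 9}, {5, 9, 11}]
target = target_boundaries(subjects, threshold=3)

# A hypothetical algorithm output to score against the target.
predicted = {2, 5, 7}
precision, recall = precision_recall(predicted, target)
print(sorted(target), round(precision, 3), round(recall, 3))
```

With this toy input, sites 2, 5, and 9 each reach the threshold, and the hypothetical algorithm recovers two of the three target boundaries while proposing one spurious boundary.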

Original language: English (US)
Pages (from-to): 103-139
Number of pages: 37
Journal: Computational Linguistics
Issue number: 1
State: Published - Mar 1997

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language
  • Computer Science Applications
  • Artificial Intelligence

