Optimizing sentence modeling and selection for document summarization

Wenpeng Yin, Yulong Pei

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

89 Scopus citations

Abstract

Extractive document summarization aims to summarize given documents by extracting their most salient sentences. It typically faces two challenges: (1) how to model the information redundancy among candidate sentences, and (2) how to select the most appropriate sentences. This paper builds a strong summarizer, DivSelect+CNNLM, by presenting new algorithms that optimize each of these two steps. Concretely, it proposes CNNLM, a novel neural network language model (NNLM) based on a convolutional neural network (CNN), which projects sentences into dense distributed representations; sentence redundancy is then modeled by cosine similarity between these representations. Next, it formulates sentence selection as an optimization problem and constructs a diversified selection process (DivSelect) that aims to select sentences which have high prestige and, at the same time, are dissimilar to each other. Experimental results on the DUC2002 and DUC2004 benchmark data sets demonstrate the effectiveness of our approach.
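
The abstract names two components: cosine-similarity redundancy over dense sentence vectors, and a selection step that favors high-prestige, mutually dissimilar sentences. The following is a minimal sketch of that pipeline under loudly stated assumptions, not the paper's implementation. The sentence vectors are hypothetical placeholders rather than CNNLM outputs. "Prestige" is approximated with a LexRank-style PageRank over the similarity graph (an assumption; the abstract does not define how prestige is computed). DivSelect's optimization is replaced by a greedy, MMR-style trade-off between prestige and redundancy. The names `prestige`, `diversified_select`, `damping`, and `lam` are illustrative and do not come from the paper.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense sentence vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def similarity_matrix(vectors: Sequence[Sequence[float]]) -> Matrix:
    """Pairwise cosine similarities, used here as the sentence-redundancy model."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def prestige(sim: Matrix, damping: float = 0.85, iters: int = 100) -> List[float]:
    """ASSUMPTION: LexRank-style PageRank on the similarity graph stands in for 'prestige'.

    Negative similarities are clipped to 0 and self-loops are dropped.
    """
    n = len(sim)
    if n == 0:
        return []
    w = [[max(sim[i][j], 0.0) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]  # outgoing edge weight of each sentence node
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = [
            (1.0 - damping) / n
            + damping * sum(scores[i] * w[i][j] / out[i] for i in range(n) if out[i] > 0)
            for j in range(n)
        ]
        total = sum(new)
        scores = [s / total for s in new]  # renormalize: isolated nodes would otherwise leak mass
    return scores


def diversified_select(sim: Matrix, scores: Sequence[float], k: int, lam: float = 0.5) -> List[int]:
    """ASSUMPTION: greedy MMR-style selection, not the paper's DivSelect optimization.

    Each step picks the candidate maximizing
        lam * normalized_prestige - (1 - lam) * max similarity to already-selected sentences.
    """
    top = max(scores) if scores else 0.0
    norm = [s / top for s in scores] if top > 0 else list(scores)  # rescale prestige to [0, 1]
    selected: List[int] = []
    candidates = list(range(len(scores)))

    def gain(i: int) -> float:
        redundancy = max((sim[i][j] for j in selected), default=0.0)
        return lam * norm[i] - (1.0 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical toy vectors standing in for CNNLM sentence embeddings;
    # sentences 0 and 1 are near-duplicates.
    vecs = [[1.0, 0.0, 0.0], [1.0, 0.05, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
    sim = similarity_matrix(vecs)
    print(diversified_select(sim, prestige(sim), k=2))
```

Setting `lam = 1.0` reduces the selection to ranking by prestige alone; lowering `lam` penalizes redundancy more heavily, which is the "high prestige but mutually dissimilar" behavior the abstract describes.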

Original language: English (US)
Title of host publication: IJCAI 2015 - Proceedings of the 24th International Joint Conference on Artificial Intelligence
Editors: Michael Wooldridge, Qiang Yang
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 1383-1389
Number of pages: 7
ISBN (Electronic): 9781577357384
State: Published - 2015
Event: 24th International Joint Conference on Artificial Intelligence, IJCAI 2015 - Buenos Aires, Argentina
Duration: Jul 25 2015 to Jul 31 2015

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
Volume: 2015-January
ISSN (Print): 1045-0823

Other

Other: 24th International Joint Conference on Artificial Intelligence, IJCAI 2015
Country/Territory: Argentina
City: Buenos Aires
Period: 7/25/15 to 7/31/15

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
