Joint event detection and description in continuous video streams

Huijuan Xu, Boyang Li, Vasili Ramanishka, Leonid Sigal, Kate Saenko

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

33 Scopus citations

Abstract

Dense video captioning involves first localizing events in a video and then generating captions for the identified events. We present the Joint Event Detection and Description Network (JEDDi-Net), which solves this task end to end: it encodes the input video stream with three-dimensional convolutional layers, proposes variable-length temporal events based on pooled features, and then uses a two-level hierarchical LSTM module with context modeling to transcribe the event proposals into captions. We show the effectiveness of the proposed JEDDi-Net on the large-scale ActivityNet Captions dataset.
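The abstract says events of variable length are proposed from pooled features but gives no layer details. As a minimal sketch under our own assumptions, the Python below shows one common way to map proposals of any length onto a fixed-size representation: max pooling the encoder features inside each proposal into a fixed number of temporal bins, similar in spirit to region-of-interest pooling applied along the time axis. The function pool_segment, the T x D feature layout, and the half-open proposal indices are hypothetical illustrations, not the paper's implementation.

```python
from typing import List, Sequence

# One feature vector per encoder time step (assumed T x D layout).
Feature = List[float]


def pool_segment(
    features: Sequence[Feature], start: int, end: int, num_bins: int
) -> List[Feature]:
    """Max-pool the time steps in [start, end) into num_bins fixed-size bins.

    Hypothetical helper, not from the paper: assumes features is a T x D
    sequence of per-time-step encoder activations (e.g. from 3D conv layers)
    and that a proposal is given as half-open indices on that time axis.
    Returns a num_bins x D list regardless of the proposal's length.
    """
    if not 0 <= start < end <= len(features):
        raise ValueError("proposal must satisfy 0 <= start < end <= T")
    if num_bins <= 0:
        raise ValueError("num_bins must be positive")

    length = end - start
    dim = len(features[0])
    pooled: List[Feature] = []
    for b in range(num_bins):
        # Split the segment evenly; integer division keeps bins contiguous.
        lo = start + (b * length) // num_bins
        hi = start + ((b + 1) * length) // num_bins
        # Short proposals (length < num_bins) reuse a time step so no bin is empty.
        hi = max(hi, lo + 1)
        pooled.append(
            [max(features[t][d] for t in range(lo, hi)) for d in range(dim)]
        )
    return pooled


if __name__ == "__main__":
    # Toy encoder output: T = 6 time steps, D = 2 channels.
    feats = [[0.0, 1.0], [2.0, 0.0], [1.0, 5.0], [3.0, 3.0], [0.0, 0.0], [4.0, 1.0]]
    # Proposals of different lengths all map to 2 x D outputs.
    print(pool_segment(feats, 1, 5, num_bins=2))  # [[2.0, 5.0], [3.0, 3.0]]
    print(pool_segment(feats, 2, 3, num_bins=2))  # [[1.0, 5.0], [1.0, 5.0]]
```

Because every proposal yields the same num_bins x D shape, the pooled outputs can be flattened and passed to a fixed-width caption decoder. Max pooling is our choice for the sketch; average pooling over each bin would fit the same interface.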

Original language: English (US)
Title of host publication: Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision Workshops, WACVW 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 25-26
Number of pages: 2
ISBN (Electronic): 9781728113920
State: Published - Feb 8 2019
Event: 19th IEEE Winter Conference on Applications of Computer Vision Workshops, WACVW 2019 - Waikoloa Village, United States
Duration: Jan 7 2019 to Jan 11 2019

Publication series

Name: Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision Workshops, WACVW 2019

Conference

Conference: 19th IEEE Winter Conference on Applications of Computer Vision Workshops, WACVW 2019
Country/Territory: United States
City: Waikoloa Village
Period: 1/7/19 to 1/11/19

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Computer Vision and Pattern Recognition
