SaGCN: Semantic-Aware Graph Calibration Network for Temporal Sentence Grounding

  • Tongbao Chen
  • Wenmin Wang
  • Kangrui Han
  • Huijuan Xu

Research output: Contribution to journal › Article › peer-review

15 Scopus citations

Abstract

Temporal sentence grounding is a challenging task that aims to localize the semantically corresponding segment in an untrimmed video according to a given natural-language query. Existing methods either use a cross-modal matching architecture that follows a scan-and-rank pipeline or directly predict, for each frame, the probability of being a boundary of the target segment based on the entire video content. However, because such methods ignore query semantic concepts and local and global cross-modal context, they are weak when some of the critical semantic concepts in the query are relevant to multiple video segments or when the desired video segment contains a query-irrelevant scene. In this paper, we propose a novel semantic-aware graph calibration network (SaGCN) to address these issues. Specifically, we first introduce a semantic-aware local relational graph module that captures the inherent relationships among the local contextual information relevant to each specific semantic concept, enabling fine-grained cross-modal interaction. Then, a semantic-aware global relational graph module is derived to integrate global contextual information and achieve cross-modal alignment. Finally, an attention-based calibration module is designed to eliminate the irrelevant information retained in the visual modality under the guidance of the query description. Extensive experiments verify the effectiveness of the proposed SaGCN on two widely used datasets (Charades-STA and TACoS), on which we achieve significant and consistent improvement over state-of-the-art approaches.
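
The abstract does not give the formulation of the calibration module, so the following is only a minimal sketch of the general idea of query-guided attention gating, not the authors' method. It assumes each video frame and the query are already encoded as vectors of the same dimension; the function name `calibrate` and the gate `sigmoid(<frame, query> / sqrt(d))` are hypothetical choices made for illustration.

```python
import math

def calibrate(frames, query):
    """Scale each frame feature by a query-conditioned gate in (0, 1).

    Hypothetical illustration: the gate is sigmoid(<frame, query> / sqrt(d)).
    This is an assumed formulation, not the one used in the SaGCN paper.
    """
    d = len(query)
    calibrated, gates = [], []
    for frame in frames:
        if len(frame) != d:
            raise ValueError("frame and query must share a dimension")
        # Scaled dot product measures how well the frame matches the query.
        score = sum(f * q for f, q in zip(frame, query)) / math.sqrt(d)
        # Sigmoid turns the score into a soft keep/suppress weight.
        gate = 1.0 / (1.0 + math.exp(-score))
        gates.append(gate)
        calibrated.append([gate * f for f in frame])
    return calibrated, gates

# Toy features (assumed values): frame 0 agrees with the query,
# frame 1 is orthogonal to it, frame 2 points the opposite way.
frames = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
query = [2.0, 0.0]
calibrated, gates = calibrate(frames, query)
```

In this toy setting the gate is largest for the frame aligned with the query and smallest for the opposing one, so query-irrelevant visual content is attenuated rather than removed outright. A learned module would typically replace the raw dot product with trained projections.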

Original language: English (US)
Pages (from-to): 3003-3016
Number of pages: 14
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 33
Issue number: 6
State: Published - Jun 1 2023

All Science Journal Classification (ASJC) codes

  • Media Technology
  • Electrical and Electronic Engineering
