Be flexible! learn to debias by sampling and prompting for robust visual question answering

Jin Liu, Chong Feng Fan, Fengyu Zhou, Huijuan Xu

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

Recent studies point out that VQA models tend to rely on the language prior in the training data to answer questions, which prevents them from generalizing to out-of-distribution test data. To address this problem, approaches have been designed to reduce the effect of the language prior by constructing negative image–question pairs, but they cannot provide proper visual reasons for answering the questions. In this paper, we present a new debiasing framework for VQA by Learning to Sample paired image–question data and Prompt for the given question (LSP). Specifically, we construct negative image–question pairs with a certain sampling rate to prevent the model from overly relying on visual shortcut content. Notably, question types provide a strong hint for answering questions. We utilize the question type to constrain the sampling process for negative question–image pairs, and further learn a question-type-guided prompt for better question comprehension. Extensive experiments on two public benchmarks, VQA-CP v2 and VQA v2, demonstrate that our model achieves new state-of-the-art overall accuracy, i.e., 61.95% and 65.26%, respectively.
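As a rough illustration of the sampling idea described in the abstract, the sketch below pairs each question with a negative image drawn from the same question type at a fixed sampling rate. The data layout, field names, and the 0.3 rate are assumptions made for illustration, not the paper's actual LSP implementation.

```python
import random
from collections import defaultdict

def sample_negative_pairs(examples, sampling_rate=0.3, seed=0):
    """Illustrative sketch of question-type-constrained negative sampling.

    `examples` is assumed to be a list of dicts with keys 'image',
    'question', and 'question_type'; the sampling rate and the pairing
    strategy are hypothetical stand-ins for the paper's details.
    """
    rng = random.Random(seed)

    # Group examples by question type so negative images are drawn from
    # the same question-type pool, keeping the question type as a hint.
    by_type = defaultdict(list)
    for ex in examples:
        by_type[ex['question_type']].append(ex)

    pairs = []
    for ex in examples:
        if rng.random() < sampling_rate:
            # Negative pair: keep the question, swap in another image of
            # the same question type.
            pool = [o for o in by_type[ex['question_type']] if o is not ex]
            if pool:
                neg = rng.choice(pool)
                pairs.append({'image': neg['image'],
                              'question': ex['question'],
                              'label': 0})  # mismatched pair
                continue
        # Positive (original) pair.
        pairs.append({'image': ex['image'],
                      'question': ex['question'],
                      'label': 1})
    return pairs
```

In this sketch, constraining the negative pool to the same question type keeps the mismatched image plausible for the question, which is the intuition the abstract gives for using question types to guide sampling.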

Original language: English (US)
Article number: 103296
Journal: Information Processing and Management
Volume: 60
Issue number: 3
DOIs
State: Published - May 2023

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Media Technology
  • Computer Science Applications
  • Management Science and Operations Research
  • Library and Information Sciences
