Cross-Lingual Training of Dense Retrievers for Document Retrieval

Peng Shi, Rui Zhang, He Bai, Jimmy Lin

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

9 Scopus citations

Abstract

Dense retrieval has shown great success for passage ranking in English. However, its effectiveness for non-English languages remains unexplored because training resources are limited. In this work, we explore different techniques for transferring document ranking from English annotations to non-English languages. Our experiments reveal that zero-shot model-based transfer using mBERT improves search quality. We find that weakly supervised target-language transfer is competitive with generation-based target-language transfer, which requires translation models.

Original language: English (US)
Title of host publication: MRL 2021 - 1st Workshop on Multilingual Representation Learning, Proceedings of the Conference
Editors: Duygu Ataman, Alexandra Birch, Alexis Conneau, Orhan Firat, Sebastian Ruder, Gozde Gul Sahin
Publisher: Association for Computational Linguistics (ACL)
Pages: 251-253
Number of pages: 3
ISBN (Electronic): 9781954085961
State: Published - 2021
Event: 1st Workshop on Multilingual Representation Learning, MRL 2021 - Punta Cana, Dominican Republic
Duration: Nov 11 2021 → …

Publication series

Name: MRL 2021 - 1st Workshop on Multilingual Representation Learning, Proceedings of the Conference

Conference

Conference: 1st Workshop on Multilingual Representation Learning, MRL 2021
Country/Territory: Dominican Republic
City: Punta Cana
Period: 11/11/21 → …

All Science Journal Classification (ASJC) codes

  • Linguistics and Language
  • Language and Linguistics
