TY - GEN
T1 - Improving low-resource cross-lingual document retrieval by reranking with deep bilingual representations
AU - Zhang, Rui
AU - Westerfield, Caitlin
AU - Shim, Sungrok
AU - Bingham, Garrett
AU - Fabbri, Alexander
AU - Hu, William
AU - Verma, Neha
AU - Radev, Dragomir
N1 - Funding Information:
We thank Petra Galuščáková, Douglas W. Oard, Efsun Kayi, Suraj Nair, Han-Chin Shing, and Joseph Barrow for their helpful discussion and feedback. This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via contract # FA8650-17-C-9117. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.
Publisher Copyright:
© 2019 Association for Computational Linguistics
PY - 2020
Y1 - 2020
N2 - In this paper, we propose to boost low-resource cross-lingual document retrieval performance with deep bilingual query-document representations. We match queries and documents in both source and target languages with four components, each of which is implemented as a term interaction-based deep neural network with cross-lingual word embeddings as input. By including query likelihood scores as extra features, our model effectively learns to rerank the retrieved documents by using a small number of relevance labels for low-resource language pairs. Due to the shared cross-lingual word embedding space, the model can also be directly applied to another language pair without any training label. Experimental results on the MATERIAL dataset show that our model outperforms the competitive translation-based baselines on English-Swahili, English-Tagalog, and English-Somali cross-lingual information retrieval tasks.
AB - In this paper, we propose to boost low-resource cross-lingual document retrieval performance with deep bilingual query-document representations. We match queries and documents in both source and target languages with four components, each of which is implemented as a term interaction-based deep neural network with cross-lingual word embeddings as input. By including query likelihood scores as extra features, our model effectively learns to rerank the retrieved documents by using a small number of relevance labels for low-resource language pairs. Due to the shared cross-lingual word embedding space, the model can also be directly applied to another language pair without any training label. Experimental results on the MATERIAL dataset show that our model outperforms the competitive translation-based baselines on English-Swahili, English-Tagalog, and English-Somali cross-lingual information retrieval tasks.
UR - http://www.scopus.com/inward/record.url?scp=85084089866&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084089866&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85084089866
T3 - ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
SP - 3173
EP - 3179
BT - ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
T2 - 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019
Y2 - 28 July 2019 through 2 August 2019
ER -