This paper describes our submission to the ARQMath track at CLEF 2020. Our primary run for the main task, Task 1 (Question Answering), uses a two-stage retrieval technique: the first stage fuses traditional BM25 scoring with tf-idf cosine-similarity retrieval, and the second stage applies a finer re-ranking using contextualized embeddings. For re-ranking we use a pre-trained roberta-base model (110 million parameters) to make the language model more math-aware. Our approach achieves a higher NDCG′ score than the baseline, while our MAP and P@10 scores are competitive, performing better than the best submission (MathDowsers) on text-dependent and text+formula-dependent topics.
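The two-stage idea can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not say how the BM25 and tf-idf scores are fused, so the min-max normalisation, the weighted sum, and the weight `alpha` are assumptions, as are the whitespace tokenizer and all function names. The second stage takes a hypothetical `rerank_fn` callable standing in for a contextualized-embedding scorer such as a RoBERTa-based model.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenizer, for illustration only.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of every document against the query."""
    toks = [tokenize(d) for d in docs]
    n = len(toks)
    avgdl = sum(len(d) for d in toks) / n
    df = Counter(t for d in toks for t in set(d))
    scores = []
    for d in toks:
        tf = Counter(d)
        s = 0.0
        for t in tokenize(query):
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def tfidf_cosine_scores(query, docs):
    """Cosine similarity between tf-idf vectors of the query and each document."""
    toks = [tokenize(d) for d in docs]
    n = len(toks)
    df = Counter(t for d in toks for t in set(d))

    def vec(tokens):
        # Smoothed idf so unseen query terms do not divide by zero.
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                for t, c in Counter(tokens).items()}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    q = vec(tokenize(query))
    return [cosine(q, vec(d)) for d in toks]


def min_max(xs):
    lo, hi = min(xs), max(xs)
    return [0.0] * len(xs) if hi == lo else [(x - lo) / (hi - lo) for x in xs]


def first_stage(query, docs, alpha=0.5):
    """Assumed fusion: weighted sum of min-max normalised scores."""
    bm = min_max(bm25_scores(query, docs))
    tf = min_max(tfidf_cosine_scores(query, docs))
    fused = [alpha * x + (1 - alpha) * y for x, y in zip(bm, tf)]
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)


def two_stage(query, docs, rerank_fn, k=10, alpha=0.5):
    """Re-rank the top-k first-stage candidates with rerank_fn(query, doc)."""
    candidates = first_stage(query, docs, alpha)[:k]
    return sorted(candidates, key=lambda i: rerank_fn(query, docs[i]), reverse=True)
```

In this sketch only the `k` candidates that survive the first stage are passed to `rerank_fn`, which is the usual reason for a two-stage design: the expensive contextualized scorer runs on a short list rather than on the whole collection.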
CEUR Workshop Proceedings
Published - 2020
11th Conference and Labs of the Evaluation Forum, CLEF 2020 - Thessaloniki, Greece
Duration: Sep 22 2020 → Sep 25 2020
All Science Journal Classification (ASJC) codes
- General Computer Science