TY - GEN
T1 - Can You Answer This? - Exploring Zero-Shot QA Generalization Capabilities in Large Language Models
AU - Sengupta, Saptarshi
AU - Ghosh, Shreya
AU - Nakov, Preslav
AU - Mitra, Prasenjit
N1 - Publisher Copyright:
Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2023/6/27
Y1 - 2023/6/27
N2 - The buzz around Transformer-based Language Models (TLMs) such as BERT and RoBERTa is well-founded owing to their impressive results on an array of tasks. However, when applied to areas needing specialized knowledge (closed-domain), such as medicine or finance, their performance drops drastically, sometimes below that of their older recurrent/convolutional counterparts. In this paper, we explore the zero-shot capabilities of large language models for extractive Question Answering. Our objective is to examine the performance change in the face of domain drift, i.e., when the target domain data differs vastly in semantic and statistical properties from the source domain, in an attempt to explain the resulting behavior. To this end, we present two studies in this paper and outline further experiments as future work. Our findings indicate flaws in the current generation of TLMs that limit their performance on closed-domain tasks.
AB - The buzz around Transformer-based Language Models (TLMs) such as BERT and RoBERTa is well-founded owing to their impressive results on an array of tasks. However, when applied to areas needing specialized knowledge (closed-domain), such as medicine or finance, their performance drops drastically, sometimes below that of their older recurrent/convolutional counterparts. In this paper, we explore the zero-shot capabilities of large language models for extractive Question Answering. Our objective is to examine the performance change in the face of domain drift, i.e., when the target domain data differs vastly in semantic and statistical properties from the source domain, in an attempt to explain the resulting behavior. To this end, we present two studies in this paper and outline further experiments as future work. Our findings indicate flaws in the current generation of TLMs that limit their performance on closed-domain tasks.
UR - https://www.scopus.com/pages/publications/85168255654
U2 - 10.1609/aaai.v37i13.27019
DO - 10.1609/aaai.v37i13.27019
M3 - Conference contribution
AN - SCOPUS:85168255654
T3 - Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023
SP - 16318
EP - 16319
BT - AAAI-23 Special Programs, IAAI-23, EAAI-23, Student Papers and Demonstrations
A2 - Williams, Brian
A2 - Chen, Yiling
A2 - Neville, Jennifer
PB - AAAI Press
T2 - 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Y2 - 7 February 2023 through 14 February 2023
ER -