TY - GEN
T1 - LLMs Assist NLP Researchers
T2 - 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024
AU - Du, Jiangshu
AU - Wang, Yibo
AU - Zhao, Wenting
AU - Deng, Zhongfen
AU - Liu, Shuaiqi
AU - Lou, Renze
AU - Zou, Henry Peng
AU - Venkit, Pranav Narayanan
AU - Zhang, Nan
AU - Srinath, Mukund
AU - Zhang, Ranran Haoran
AU - Gupta, Vipul
AU - Li, Yinghui
AU - Li, Tao
AU - Wang, Fei
AU - Liu, Qin
AU - Liu, Tianlin
AU - Gao, Pengzhi
AU - Xia, Congying
AU - Xing, Chen
AU - Jiayang, Cheng
AU - Wang, Zhaowei
AU - Su, Ying
AU - Shah, Raj Sanjay
AU - Guo, Ruohao
AU - Gu, Jing
AU - Li, Haoran
AU - Wei, Kangda
AU - Wang, Zihao
AU - Cheng, Lu
AU - Ranathunga, Surangika
AU - Fang, Meng
AU - Fu, Jie
AU - Liu, Fei
AU - Huang, Ruihong
AU - Blanco, Eduardo
AU - Cao, Yixin
AU - Zhang, Rui
AU - Yu, Philip S.
AU - Yin, Wenpeng
N1 - Publisher Copyright: © 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
N2 - This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as they have to spend more time reading, writing, and reviewing papers. This raises the question: how can LLMs potentially assist researchers in alleviating their heavy workload? This study focuses on the topic of LLMs Assist NLP Researchers, particularly examining the effectiveness of LLMs in assisting paper (meta-)reviewing and its recognizability. To address this, we constructed the ReviewCritique dataset, which includes two types of information: (i) NLP papers (initial submissions rather than camera-ready) with both human-written and LLM-generated reviews, and (ii) “deficiency” labels and corresponding explanations for individual segments of each review, annotated by experts. Using ReviewCritique, this study explores two threads of research questions: (i) “LLMs as Reviewers”: how do reviews generated by LLMs compare with those written by humans in terms of quality and distinguishability? (ii) “LLMs as Metareviewers”: how effectively can LLMs identify potential issues, such as deficient or unprofessional review segments, within individual paper reviews? To our knowledge, this is the first work to provide such a comprehensive analysis. Our dataset is available at https://github.com/jiangshdd/ReviewCritique.
UR - https://www.scopus.com/pages/publications/85208157857
UR - https://www.scopus.com/pages/publications/85208157857#tab=citedBy
U2 - 10.18653/v1/2024.emnlp-main.292
DO - 10.18653/v1/2024.emnlp-main.292
M3 - Conference contribution
AN - SCOPUS:85208157857
T3 - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
SP - 5081
EP - 5099
BT - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
A2 - Al-Onaizan, Yaser
A2 - Bansal, Mohit
A2 - Chen, Yun-Nung
PB - Association for Computational Linguistics (ACL)
Y2 - 12 November 2024 through 16 November 2024
ER -