TY - GEN
T1 - JOPA
T2 - 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
AU - Chang, Yurui
AU - Cao, Bochuan
AU - Wang, Yujia
AU - Chen, Jinghui
AU - Lin, Lu
N1 - Publisher Copyright:
© 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
N2 - Large Language Models (LLMs) have demonstrated impressive performances in complex text generation tasks. However, the contribution of the input prompt to the generated content still remains obscure to humans, underscoring the necessity of understanding the causality between input and output pairs. Existing works for providing prompt-specific explanation often confine model output to be classification or next-word prediction. Few initial attempts aiming to explain the entire language generation often treat input prompt texts independently, ignoring their combinatorial effects on the followup generation. In this study, we introduce a counterfactual explanation framework based on Joint Prompt Attribution, JoPA, which aims to explain how a few prompt texts collaboratively influences the LLM's complete generation. Particularly, we formulate the task of prompt attribution for generation interpretation as a combinatorial optimization problem, and introduce a probabilistic algorithm to search for the casual input combination in the discrete space. We define and utilize multiple metrics to evaluate the produced explanations, demonstrating both the faithfulness and efficiency of our framework.
AB - Large Language Models (LLMs) have demonstrated impressive performances in complex text generation tasks. However, the contribution of the input prompt to the generated content still remains obscure to humans, underscoring the necessity of understanding the causality between input and output pairs. Existing works for providing prompt-specific explanation often confine model output to be classification or next-word prediction. Few initial attempts aiming to explain the entire language generation often treat input prompt texts independently, ignoring their combinatorial effects on the followup generation. In this study, we introduce a counterfactual explanation framework based on Joint Prompt Attribution, JoPA, which aims to explain how a few prompt texts collaboratively influences the LLM's complete generation. Particularly, we formulate the task of prompt attribution for generation interpretation as a combinatorial optimization problem, and introduce a probabilistic algorithm to search for the casual input combination in the discrete space. We define and utilize multiple metrics to evaluate the produced explanations, demonstrating both the faithfulness and efficiency of our framework.
UR - https://www.scopus.com/pages/publications/105021024164
UR - https://www.scopus.com/pages/publications/105021024164#tab=citedBy
U2 - 10.18653/v1/2025.acl-long.1074
DO - 10.18653/v1/2025.acl-long.1074
M3 - Conference contribution
AN - SCOPUS:105021024164
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 22106
EP - 22122
BT - Long Papers
A2 - Che, Wanxiang
A2 - Nabende, Joyce
A2 - Shutova, Ekaterina
A2 - Pilehvar, Mohammad Taher
PB - Association for Computational Linguistics (ACL)
Y2 - 27 July 2025 through 1 August 2025
ER -