TY - JOUR
T1 - Finding AI's Faults with AAR/AI
T2 - An Empirical Study
AU - Khanna, Roli
AU - Dodge, Jonathan
AU - Anderson, Andrew
AU - Dikkala, Rupika
AU - Irvine, Jed
AU - Shureih, Zeyad
AU - Lam, Kin Ho
AU - Matthews, Caleb R.
AU - Lin, Zhengxian
AU - Kahng, Minsuk
AU - Fern, Alan
AU - Burnett, Margaret
N1 - Publisher Copyright:
© 2022 Association for Computing Machinery.
PY - 2022/3
Y1 - 2022/3
N2 - Would you allow an AI agent to make decisions on your behalf? If the answer is "not always," the next question becomes "in what circumstances?" Answering this question requires human users to be able to assess an AI agent, and not just with overall pass/fail assessments or statistics. Here users need to be able to localize an agent's bugs so that they can determine when they are willing to rely on the agent and when they are not. After-Action Review for AI (AAR/AI), a new AI assessment process for integration with Explainable AI systems, aims to support human users in this endeavor, and in this article we empirically investigate AAR/AI's effectiveness with domain-knowledgeable users. Our results show that AAR/AI participants not only located significantly more bugs than non-AAR/AI participants did (i.e., showed greater recall) but also located them more precisely (i.e., with greater precision). In fact, AAR/AI participants outperformed non-AAR/AI participants on every bug and were, on average, almost six times as likely as non-AAR/AI participants to find any particular bug. Finally, evidence suggests that incorporating labeling into the AAR/AI process may encourage domain-knowledgeable users to abstract above individual instances of bugs; we hypothesize that doing so may have contributed further to AAR/AI participants' effectiveness.
AB - Would you allow an AI agent to make decisions on your behalf? If the answer is "not always," the next question becomes "in what circumstances?" Answering this question requires human users to be able to assess an AI agent, and not just with overall pass/fail assessments or statistics. Here users need to be able to localize an agent's bugs so that they can determine when they are willing to rely on the agent and when they are not. After-Action Review for AI (AAR/AI), a new AI assessment process for integration with Explainable AI systems, aims to support human users in this endeavor, and in this article we empirically investigate AAR/AI's effectiveness with domain-knowledgeable users. Our results show that AAR/AI participants not only located significantly more bugs than non-AAR/AI participants did (i.e., showed greater recall) but also located them more precisely (i.e., with greater precision). In fact, AAR/AI participants outperformed non-AAR/AI participants on every bug and were, on average, almost six times as likely as non-AAR/AI participants to find any particular bug. Finally, evidence suggests that incorporating labeling into the AAR/AI process may encourage domain-knowledgeable users to abstract above individual instances of bugs; we hypothesize that doing so may have contributed further to AAR/AI participants' effectiveness.
UR - http://www.scopus.com/inward/record.url?scp=85123751789&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123751789&partnerID=8YFLogxK
U2 - 10.1145/3487065
DO - 10.1145/3487065
M3 - Article
AN - SCOPUS:85123751789
SN - 2160-6455
VL - 12
JO - ACM Transactions on Interactive Intelligent Systems
JF - ACM Transactions on Interactive Intelligent Systems
IS - 1
M1 - 1
ER -