TY - GEN
T1 - InsectAgent
T2 - 28th IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2025
AU - Zhao, Shu
AU - Sridhar, Ajay Narayanan
AU - Patch, Harland
AU - Narayanan, Vijaykrishnan
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Insect recognition remains a critical challenge for biodiversity monitoring, conservation efforts, and agricultural sustainability. Current computer vision approaches struggle with accurate species identification due to subtle morphological differences. Our analysis reveals that while vision classifiers frequently fail to predict the correct species as their top choice, they consistently include the true species within top candidate predictions. This indicates that expert entomological knowledge is required to resolve ambiguities when vision classifiers fail. We present InsectAgent, a novel two-stage framework that enhances insect recognition through dynamic information augmentation using Multimodal Large Language Models (MLLMs). In the first stage, a vision classifier generates candidate species predictions with confidence scores. When confidence falls below a threshold, the second stage activates, retrieving relevant taxonomic knowledge from an expert knowledge base and invoking an MLLM for further analysis. This conditional MLLM invocation strategy significantly reduces computational costs by avoiding expensive model calls for high-confidence predictions while ensuring expert-level reasoning for ambiguous cases. The information-augmented reasoning process combines visual cues with domain expertise, mirroring expert entomologists' workflow. Experimental results demonstrate that InsectAgent significantly outperforms standalone vision classifiers, achieving an average relative improvement of 14.24% in accuracy for insect identification tasks.
AB - Insect recognition remains a critical challenge for biodiversity monitoring, conservation efforts, and agricultural sustainability. Current computer vision approaches struggle with accurate species identification due to subtle morphological differences. Our analysis reveals that while vision classifiers frequently fail to predict the correct species as their top choice, they consistently include the true species within top candidate predictions. This indicates that expert entomological knowledge is required to resolve ambiguities when vision classifiers fail. We present InsectAgent, a novel two-stage framework that enhances insect recognition through dynamic information augmentation using Multimodal Large Language Models (MLLMs). In the first stage, a vision classifier generates candidate species predictions with confidence scores. When confidence falls below a threshold, the second stage activates, retrieving relevant taxonomic knowledge from an expert knowledge base and invoking an MLLM for further analysis. This conditional MLLM invocation strategy significantly reduces computational costs by avoiding expensive model calls for high-confidence predictions while ensuring expert-level reasoning for ambiguous cases. The information-augmented reasoning process combines visual cues with domain expertise, mirroring expert entomologists' workflow. Experimental results demonstrate that InsectAgent significantly outperforms standalone vision classifiers, achieving an average relative improvement of 14.24% in accuracy for insect identification tasks.
UR - https://www.scopus.com/pages/publications/105016167635
UR - https://www.scopus.com/pages/publications/105016167635#tab=citedBy
U2 - 10.1109/ISVLSI65124.2025.11130343
DO - 10.1109/ISVLSI65124.2025.11130343
M3 - Conference contribution
AN - SCOPUS:105016167635
T3 - Proceedings of IEEE Computer Society Annual Symposium on VLSI, ISVLSI
BT - IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2025 - Conference Proceedings
PB - IEEE Computer Society
Y2 - 6 July 2025 through 9 July 2025
ER -